From bjourne at gmail.com Tue May 1 00:01:37 2007 From: bjourne at gmail.com (=?ISO-8859-1?Q?BJ=F6rn_Lindqvist?=) Date: Mon, 30 Apr 2007 22:01:37 +0000 Subject: [Python-3000] Traits/roles instead of ABCs In-Reply-To: <-3456230403858254882@unknownmsgid> References: <43aa6ff70704291840s3384824et44ebfd360c15eda@mail.gmail.com> <014201c78adc$ca70d960$f101a8c0@RaymondLaptop1> <1d36917a0704300816ma3bf9c2o4dd674cfcefa9172@mail.gmail.com> <-3456230403858254882@unknownmsgid> Message-ID: <740c3aec0704301501u7df7b5a6uaea854d4716eb87e@mail.gmail.com> On 4/30/07, Bill Janssen wrote: > > On 4/30/07, Raymond Hettinger wrote: > > > I'm concerned that the current ABC proposal will quickly evolve from optional > > > to required and create a somewhat java-esque landscape where > > > inheritance and full-specification are the order of the day. > > > > +1 for preferring simple solutions to complex ones > > Me, too. But which is the simple solution? I tend to think ABCs are. Neither is. They are both an order of magnitude more complex than the problem they are designed to solve. Raymond Hettinger's small list of three example problems earlier in the thread is the most concrete description of what the problem really is all about. And I would honestly rather sort them under "minor annoyances" than "really critical stuff, needs to be fixed asap." One really wise person wrote a long while ago (I'm paraphrasing) that each new feature should have to prove itself against the standard library. That is, a diff should be produced proving that real-world Python code reads better with the proposed feature than without. If no such diff can be created, the feature probably isn't that useful. 
-- mvh Björn From brett at python.org Tue May 1 00:31:20 2007 From: brett at python.org (Brett Cannon) Date: Mon, 30 Apr 2007 15:31:20 -0700 Subject: [Python-3000] Traits/roles instead of ABCs In-Reply-To: <740c3aec0704301501u7df7b5a6uaea854d4716eb87e@mail.gmail.com> References: <43aa6ff70704291840s3384824et44ebfd360c15eda@mail.gmail.com> <014201c78adc$ca70d960$f101a8c0@RaymondLaptop1> <1d36917a0704300816ma3bf9c2o4dd674cfcefa9172@mail.gmail.com> <-3456230403858254882@unknownmsgid> <740c3aec0704301501u7df7b5a6uaea854d4716eb87e@mail.gmail.com> Message-ID: On 4/30/07, BJörn Lindqvist wrote: [SNIP] > One really wise person wrote a long while ago (I'm paraphrasing) that > each new feature should have to prove itself against the standard > library. That is, a diff should be produced proving that real world > Python code reads better with the proposed feature than without. If no > such diff can be created, the feature probably isn't that useful. I think it would be a little difficult in this situation, since a similar mechanism does not currently exist in the stdlib, and so most code is not written so that ABCs or roles are needed. Plus you would have to find places using both LBYL and EAFP idioms if you did go with this. I guess you could look for files that use isinstance or catch AttributeError, respectively, but still. And thanks for calling Raymond "really wise"; it gave me a chuckle (not because Raymond isn't smart, but because he is not some old-timer who tells "back in the day" stories and thus doesn't fit the stereotypical "wise man" look). 
-Brett From l.mastrodomenico at gmail.com Tue May 1 00:36:07 2007 From: l.mastrodomenico at gmail.com (Lino Mastrodomenico) Date: Tue, 1 May 2007 00:36:07 +0200 Subject: [Python-3000] super() PEP In-Reply-To: <014901c78b6e$d2d66d80$0201a8c0@ryoko> References: <5.1.1.6.0.20070430142844.03c96240@sparrow.telecommunity.com> <011b01c78b6a$72098810$0201a8c0@ryoko> <43aa6ff70704301403q5bf557c2wf43148f7a339353d@mail.gmail.com> <014901c78b6e$d2d66d80$0201a8c0@ryoko> Message-ID: 2007/4/30, Tim Delaney : > Fine with me. Calvin - want to send me your latest draft, and I'll do some > modifications? I think we've got to the point now where we can take this > off-list. One more thing: what do people think of modifying super so that, when it doesn't find a method, instead of raising AttributeError it returns something like "lambda *args, **kwargs: None"? Optionally this could be a constant (e.g. default_method) defined somewhere so that, if necessary, it's still possible to detect whether the value of super.meth is a real method or the "fake" default_method. I think this can be useful when a method *doesn't know* if it's the last in the MRO, because that may depend on the inheritance hierarchy of its subclasses: you can always simply call super.meth(...) and, if the current method is the last, this will be a NOP. -- Lino Mastrodomenico E-mail: l.mastrodomenico at gmail.com From jimjjewett at gmail.com Tue May 1 00:37:57 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Mon, 30 Apr 2007 18:37:57 -0400 Subject: [Python-3000] Revised PEPs 30XZ: remove implicit string concatenation and backslash continuation Message-ID: On 4/30/07, Guido van Rossum wrote: > I think these should be two separate proposals, with more specific > names (e.g. "remove implicit string concatenation" and "remove > backslash continuation"). There's no need to mention the octal thing > if it's already a separate PEP. Revised versions attached, as David Goodger seemed to prefer attachments. 
-jJ -------------- next part -------------- PEP: 30XZA Title: Remove Backslash Continuation Version: $Revision$ Last-Modified: $Date$ Author: Jim J. Jewett Status: Draft Type: Standards Track Content-Type: text/plain Created: 29-Apr-2007 Post-History: 29-Apr-2007, 30-Apr-2007 Abstract Python initially inherited its parsing from C. While this has been generally useful, there are some remnants which have been less useful for Python, and should be eliminated. This PEP proposes elimination of terminal "\" as a marker for line continuation. Rationale for Removing Explicit Line Continuation A terminal "\" indicates that the logical line is continued on the following physical line (after whitespace). Note that a non-terminal "\" does not have this meaning, even if the only additional characters are invisible whitespace. (Python depends heavily on *visible* whitespace at the beginning of a line; it does not otherwise depend on *invisible* terminal whitespace.) Adding whitespace after a "\" will typically cause a syntax error rather than a silent bug, but it still isn't desirable. The reason to keep "\" is that occasionally code looks better with a "\" than with a () pair. assert True, ( "This Paren is goofy") But realistically, that parenthesis is no worse than a "\". The only advantage of "\" is that it is slightly more familiar to users of C-based languages. These same languages all also support line continuation with (), so reading code will not be a problem, and there will be one less rule to learn for people entirely new to programming. Alternate proposal Several people have suggested alternative ways of marking the line end. Most of these were rejected for not actually simplifying things. 
The one exception was to let any unfinished expression signify a line continuation, possibly in conjunction with increased indentation assert True, # comma implies tuple implies continue "No goofy parens" The objections to this are: - The amount of whitespace may be contentious; expression continuation should not be confused with opening a new suite. - The "expression continuation" markers are not as clearly marked in Python as the grouping punctuation "(), [], {}" marks are. "abc" + # Plus needs another operand, so it continues "def" "abc" # String ends an expression, so + "def" # this is a syntax error. - Guido says so. [1] His reasoning is that it may not even be feasible. (See next reason.) - As a technical concern, supporting this would require allowing INDENT or DEDENT tokens anywhere, or at least in a widely expanded (and ill-defined) set of locations. While this is in some sense a concern only for the internal parsing implementation, it would be a major new source of complexity. [1] References [1] PEP 30XZ: Simplified Parsing, van Rossum http://mail.python.org/pipermail/python-3000/2007-April/007063.html Copyright This document has been placed in the public domain. Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 coding: utf-8 End: -------------- next part -------------- PEP: 30xzB Title: Remove Implicit String Concatenation Version: $Revision$ Last-Modified: $Date$ Author: Jim J. Jewett Status: Draft Type: Standards Track Content-Type: text/plain Created: 29-Apr-2007 Post-History: 29-Apr-2007, 30-Apr-2007 Abstract Python initially inherited its parsing from C. While this has been generally useful, there are some remnants which have been less useful for Python, and should be eliminated. This PEP proposes to eliminate implicit string concatenation based on adjacency of literals. 
Instead of "abc" "def" == "abcdef" authors will need to be explicit, and add the strings "abc" + "def" == "abcdef" Rationale for Removing Implicit String Concatenation Implicit string concatenation can lead to confusing, or even silent, errors. def f(arg1, arg2=None): pass f("abc" "def") # forgot the comma, no warning ... # silently becomes f("abcdef", None) or, using the scons build framework, sourceFiles = [ 'foo.c' 'bar.c', #...many lines omitted... 'q1000x.c'] It's a common mistake to leave off a comma, and then scons complains that it can't find 'foo.cbar.c'. This is pretty bewildering behavior even if you *are* a Python programmer, and not everyone here is. [1] Note that in C, the implicit concatenation is more justified; there is no other way to join strings without (at least) a function call. In Python, strings are objects which support the __add__ operator; it is possible to write: "abc" + "def" Because these are literals, this addition can still be optimized away by the compiler. (The CPython compiler already does so. [2]) Guido indicated [2] that this change should be handled by PEP, because there were a few edge cases with other string operators, such as the %. (Assuming that str % stays -- it may be eliminated in favor of PEP 3101 -- Advanced String Formatting. [3] [4]) The resolution is to treat them the same as today. ("abc %s def" + "ghi" % var) # fails like today. # raises TypeError because of # precedence. (% before +) ("abc" + "def %s ghi" % var) # works like today; precedence makes # the optimization more difficult to # recognize, but does not change the # semantics. ("abc %s def" + "ghi") % var # works like today, because of # precedence: () before % # CPython compiler can already # add the literals at compile-time. 
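The compile-time folding claim above can be checked directly; a minimal sketch, assuming CPython (the co_consts inspection relies on CPython's constant folding, which is an implementation detail, not part of the language spec):

```python
# Illustrative check: both the implicit (adjacent-literal) form and the
# explicit "+" form produce a single constant at compile time in CPython.
implicit = lambda: "abc" "def"    # implicit concatenation (the form this PEP removes)
explicit = lambda: "abc" + "def"  # explicit form; CPython folds it anyway

assert implicit() == explicit() == "abcdef"

# The folded constant appears directly in the compiled code object, so the
# explicit "+" between literals costs nothing at run time in CPython.
assert "abcdef" in explicit.__code__.co_consts
```

So the readability cost of requiring an explicit "+" does not come with a performance cost, at least on CPython.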
References [1] Implicit String Concatenation, Jewett, Orendorff http://mail.python.org/pipermail/python-ideas/2007-April/000397.html [2] Reminder: Py3k PEPs due by April, Hettinger, van Rossum http://mail.python.org/pipermail/python-3000/2007-April/006563.html [3] PEP 3101, Advanced String Formatting, Talin http://www.python.org/peps/pep-3101.html [4] ps to question Re: Need help completing ABC pep, van Rossum http://mail.python.org/pipermail/python-3000/2007-April/006737.html Copyright This document has been placed in the public domain. Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 coding: utf-8 End: From guido at python.org Tue May 1 00:47:19 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 30 Apr 2007 15:47:19 -0700 Subject: [Python-3000] octal literals PEP In-Reply-To: References: Message-ID: The PEP editors have admitted to being behind on the job. AFAIK PEPs sent to the PEP editors before the deadline are in, regardless of when the PEP goes online. To save the PEP editors the effort, if you send it to me I will assign it a PEP number and submit it. (Ditto for other PEPs in the same situation.) --Guido On 4/30/07, Patrick Maupin wrote: > I sent an email with an initial PEP to the PEP editors a few weeks > ago. Never got a reply. I noticed some traffic about this recently > but was too busy to follow it really carefully. > > Pat > > On 4/30/07, Jim Jewett wrote: > > On 4/30/07, Guido van Rossum wrote: > > > I think these should be two separate proposals, with more specific > > > names (e.g. "remove implicit string concatenation" and "remove > > > backslash continuation"). There's no need to mention the octal thing > > > if it's already a separate PEP. > > > > Patrick > > > > Guido had set an Apr 30 deadline for Py3000 PEPs that can't be > > implemented in pure python. > > > > Are you still working on the "Integer literal syntax and radices ", > > which included the octal literal? 
I would much prefer to leave octal > > literals with the rest of that PEP, (and to let you do it :D), but I > > will submit a much-simplified "023 raises SyntaxError" if you have > > abandoned the rest. > > > > -jJ > > > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From pje at telecommunity.com Tue May 1 00:54:33 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon, 30 Apr 2007 18:54:33 -0400 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. Message-ID: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> This is just the first draft (also checked into SVN), and doesn't include the details of how the extension API works (so that third-party interfaces and generic functions can interoperate using the same decorators, annotations, etc.). Comments and questions appreciated, as it'll help drive better explanations of both the design and rationales. I'm usually not that good at guessing what other people will want to know (or are likely to misunderstand) until I get actual questions. PEP: 3124 Title: Overloading, Generic Functions, Interfaces, and Adaptation Version: $Revision: 55029 $ Last-Modified: $Date: 2007-04-30 18:48:06 -0400 (Mon, 30 Apr 2007) $ Author: Phillip J. Eby Discussions-To: Python 3000 List Status: Draft Type: Standards Track Requires: 3107, 3115, 3119 Replaces: 245, 246 Content-Type: text/x-rst Created: 28-Apr-2007 Post-History: 30-Apr-2007 Abstract ======== This PEP proposes a new standard library module, ``overloading``, to provide generic programming features including dynamic overloading (aka generic functions), interfaces, adaptation, method combining (ala CLOS and AspectJ), and simple forms of aspect-oriented programming. 
The proposed API is also open to extension; that is, it will be possible for library developers to implement their own specialized interface types, generic function dispatchers, method combination algorithms, etc., and those extensions will be treated as first-class citizens by the proposed API. The API will be implemented in pure Python with no C, but may have some dependency on CPython-specific features such as ``sys._getframe`` and the ``func_code`` attribute of functions. It is expected that e.g. Jython and IronPython will have other ways of implementing similar functionality (perhaps using Java or C#). Rationale and Goals =================== Python has always provided a variety of built-in and standard-library generic functions, such as ``len()``, ``iter()``, ``pprint.pprint()``, and most of the functions in the ``operator`` module. However, it currently: 1. does not have a simple or straightforward way for developers to create new generic functions, 2. does not have a standard way for methods to be added to existing generic functions (i.e., some are added using registration functions, others require defining ``__special__`` methods, possibly by monkeypatching), and 3. does not allow dispatching on multiple argument types (except in a limited form for arithmetic operators, where "right-hand" (``__r*__``) methods can be used to do two-argument dispatch). In addition, it is currently a common anti-pattern for Python code to inspect the types of received arguments, in order to decide what to do with the objects. For example, code may wish to accept either an object of some type, or a sequence of objects of that type. Currently, the "obvious way" to do this is by type inspection, but this is brittle and closed to extension. A developer using an already-written library may be unable to change how their objects are treated by such code, especially if the objects they are using were created by a third party. 
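The type-inspection anti-pattern described above might look like the following (an illustrative sketch; the function and class names are invented, not from the PEP):

```python
# Illustrative sketch of the anti-pattern: the function hardcodes which
# types it understands, so third-party code cannot teach it new types.
def describe(ob):
    if isinstance(ob, str):
        return "one string"
    elif isinstance(ob, (list, tuple)):
        return "a sequence of %d items" % len(ob)
    else:
        raise TypeError("don't know what to do with %r" % (ob,))

assert describe("abc") == "one string"
assert describe([1, 2, 3]) == "a sequence of 3 items"

# A user-defined sequence type is rejected even if it behaves like a list;
# this "closed to extension" failure is what the PEP aims to avoid.
class MySeq:
    def __len__(self):
        return 2

try:
    describe(MySeq())
except TypeError:
    pass  # rejected, and the caller cannot register support from outside
```

With a generic function, a third party could instead add an implementation for MySeq without touching ``describe`` itself.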
Therefore, this PEP proposes a standard library module to address these, and related issues, using decorators and argument annotations (PEP 3107). The primary features to be provided are: * a dynamic overloading facility, similar to the static overloading found in languages such as Java and C++, but including optional method combination features as found in CLOS and AspectJ. * a simple "interfaces and adaptation" library inspired by Haskell's typeclasses (but more dynamic, and without any static type-checking), with an extension API to allow registering user-defined interface types such as those found in PyProtocols and Zope. * a simple "aspect" implementation to make it easy to create stateful adapters and to do other stateful AOP. These features are to be provided in such a way that extended implementations can be created and used. For example, it should be possible for libraries to define new dispatching criteria for generic functions, and new kinds of interfaces, and use them in place of the predefined features. For example, it should be possible to use a ``zope.interface`` interface object to specify the desired type of a function argument, as long as the ``zope.interface`` package registered itself correctly (or a third party did the registration). In this way, the proposed API simply offers a uniform way of accessing the functionality within its scope, rather than prescribing a single implementation to be used for all libraries, frameworks, and applications. User API ======== The overloading API will be implemented as a single module, named ``overloading``, providing the following features: Overloading/Generic Functions ----------------------------- The ``@overload`` decorator allows you to define alternate implementations of a function, specialized by argument type(s). A function with the same name must already exist in the local namespace. 
The existing function is modified in-place by the decorator to add the new implementation, and the modified function is returned by the decorator. Thus, the following code:: from overloading import overload from collections import Iterable def flatten(ob): """Flatten an object to its component iterables""" yield ob @overload def flatten(ob: Iterable): for o in ob: for ob in flatten(o): yield ob @overload def flatten(ob: basestring): yield ob creates a single ``flatten()`` function whose implementation roughly equates to:: def flatten(ob): if isinstance(ob, basestring) or not isinstance(ob, Iterable): yield ob else: for o in ob: for ob in flatten(o): yield ob **except** that the ``flatten()`` function defined by overloading remains open to extension by adding more overloads, while the hardcoded version cannot be extended. For example, if someone wants to use ``flatten()`` with a string-like type that doesn't subclass ``basestring``, they would be out of luck with the second implementation. With the overloaded implementation, however, they can either write this:: @overload def flatten(ob: MyString): yield ob or this (to avoid copying the implementation):: from overloading import RuleSet RuleSet(flatten).copy_rules((basestring,), (MyString,)) (Note also that, although PEP 3119 proposes that it should be possible for abstract base classes like ``Iterable`` to allow classes like ``MyString`` to claim subclass-hood, such a claim is *global*, throughout the application. In contrast, adding a specific overload or copying a rule is specific to an individual function, and therefore less likely to have undesired side effects.) ``@overload`` vs. ``@when`` ~~~~~~~~~~~~~~~~~~~~~~~~~~~ The ``@overload`` decorator is a common-case shorthand for the more general ``@when`` decorator. It allows you to leave out the name of the function you are overloading, at the expense of requiring the target function to be in the local namespace. 
It also doesn't support adding additional criteria besides the ones specified via argument annotations. The following function definitions have identical effects, except for name binding side-effects (which will be described below):: @overload def flatten(ob: basestring): yield ob @when(flatten) def flatten(ob: basestring): yield ob @when(flatten) def flatten_basestring(ob: basestring): yield ob @when(flatten, (basestring,)) def flatten_basestring(ob): yield ob The first definition above will bind ``flatten`` to whatever it was previously bound to. The second will do the same, if it was already bound to the ``when`` decorator's first argument. If ``flatten`` is unbound or bound to something else, it will be rebound to the function definition as given. The last two definitions above will always bind ``flatten_basestring`` to the function definition as given. Using this approach allows you to both give a method a descriptive name (often useful in tracebacks!) and to reuse the method later. Except as otherwise specified, all ``overloading`` decorators have the same signature and binding rules as ``@when``. They accept a function and an optional "predicate" object. The default predicate implementation is a tuple of types with positional matching to the overloaded function's arguments. However, an arbitrary number of other kinds of predicates can be created and registered using the `Extension API`_, and will then be usable with ``@when`` and other decorators created by this module (like ``@before``, ``@after``, and ``@around``). Method Combination and Overriding --------------------------------- When an overloaded function is invoked, the implementation with the signature that *most specifically matches* the calling arguments is the one used. If no implementation matches, a ``NoApplicableMethods`` error is raised. If more than one implementation matches, but none of the signatures are more specific than the others, an ``AmbiguousMethods`` error is raised. 
For example, the following pair of implementations are ambiguous, if the ``foo()`` function is ever called with two integer arguments, because both signatures would apply, but neither signature is more *specific* than the other (i.e., neither implies the other):: def foo(bar:int, baz:object): pass @overload def foo(bar:object, baz:int): pass In contrast, the following pair of implementations can never be ambiguous, because one signature always implies the other; the ``int/int`` signature is more specific than the ``object/object`` signature:: def foo(bar:object, baz:object): pass @overload def foo(bar:int, baz:int): pass A signature S1 implies another signature S2, if whenever S1 would apply, S2 would also. A signature S1 is "more specific" than another signature S2, if S1 implies S2, but S2 does not imply S1. Although the examples above have all used concrete or abstract types as argument annotations, there is no requirement that the annotations be such. They can also be "interface" objects (discussed in the `Interfaces and Adaptation`_ section), including user-defined interface types. (They can also be other objects whose types are appropriately registered via the `Extension API`_.) Proceeding to the "Next" Method ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ If the first parameter of an overloaded function is named ``__proceed__``, it will be passed a callable representing the next most-specific method. For example, this code:: def foo(bar:object, baz:object): print "got objects!" @overload def foo(__proceed__, bar:int, baz:int): print "got integers!" return __proceed__(bar, baz) Will print "got integers!" followed by "got objects!". If there is no next most-specific method, ``__proceed__`` will be bound to a ``NoApplicableMethods`` instance. When called, a new ``NoApplicableMethods`` instance will be raised, with the arguments passed to the first instance. 
Similarly, if the next most-specific methods have ambiguous precedence with respect to each other, ``__proceed__`` will be bound to an ``AmbiguousMethods`` instance, and if called, it will raise a new instance. Thus, a method can either check if ``__proceed__`` is an error instance, or simply invoke it. The ``NoApplicableMethods`` and ``AmbiguousMethods`` error classes have a common ``DispatchError`` base class, so ``isinstance(__proceed__, overloading.DispatchError)`` is sufficient to identify whether ``__proceed__`` can be safely called. (Implementation note: using a magic argument name like ``__proceed__`` could potentially be replaced by a magic function that would be called to obtain the next method. A magic function, however, would degrade performance and might be more difficult to implement on non-CPython platforms. Method chaining via magic argument names, however, can be efficiently implemented on any Python platform that supports creating bound methods from functions -- one simply recursively binds each function to be chained, using the following function or error as the ``im_self`` of the bound method.) "Before" and "After" Methods ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In addition to the simple next-method chaining shown above, it is sometimes useful to have other ways of combining methods. For example, the "observer pattern" can sometimes be implemented by adding extra methods to a function, that execute before or after the normal implementation. To support these use cases, the ``overloading`` module will supply ``@before``, ``@after``, and ``@around`` decorators, that roughly correspond to the same types of methods in the Common Lisp Object System (CLOS), or the corresponding "advice" types in AspectJ. 
Like ``@when``, all of these decorators must be passed the function to be overloaded, and can optionally accept a predicate as well:: def begin_transaction(db): print "Beginning the actual transaction" @before(begin_transaction) def check_single_access(db: SingletonDB): if db.inuse: raise TransactionError("Database already in use") @after(begin_transaction) def start_logging(db: LoggableDB): db.set_log_level(VERBOSE) ``@before`` and ``@after`` methods are invoked either before or after the main function body, and are *never considered ambiguous*. That is, it will not cause any errors to have multiple "before" or "after" methods with identical or overlapping signatures. Ambiguities are resolved using the order in which the methods were added to the target function. "Before" methods are invoked most-specific method first, with ambiguous methods being executed in the order they were added. All "before" methods are called before any of the function's "primary" methods (i.e. normal ``@overload`` methods) are executed. "After" methods are invoked in the *reverse* order, after all of the function's "primary" methods are executed. That is, they are executed least-specific method first, with ambiguous methods being executed in the reverse of the order in which they were added. The return values of both "before" and "after" methods are ignored, and any uncaught exceptions raised by *any* methods (primary or other) immediately end the dispatching process. "Before" and "after" methods cannot have ``__proceed__`` arguments, as they are not responsible for calling any other methods. They are simply called as a notification before or after the primary methods. Thus, "before" and "after" methods can be used to check or establish preconditions (e.g. by raising an error if the conditions aren't met) or to ensure postconditions, without needing to duplicate any existing functionality. 
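The calling order described above can be emulated with a plain wrapper; the following is an illustrative sketch only (not the proposed ``overloading`` API), which ignores signature specificity and uses insertion order alone:

```python
# Minimal sketch of "before"/"after" notification methods around a primary
# function: before methods run first in the order added, then the primary,
# then after methods in *reverse* of the order added, as the PEP describes.
# Return values of the advice functions are ignored.
def with_advice(primary):
    befores, afters = [], []

    def combined(*args, **kw):
        for b in befores:
            b(*args, **kw)                 # all "before" methods first
        result = primary(*args, **kw)      # then the primary method
        for a in reversed(afters):         # "after" methods, reverse add order
            a(*args, **kw)
        return result

    combined.before = befores.append
    combined.after = afters.append
    return combined

calls = []

@with_advice
def begin_transaction(db):
    calls.append("primary")

begin_transaction.before(lambda db: calls.append("check"))
begin_transaction.after(lambda db: calls.append("log1"))
begin_transaction.after(lambda db: calls.append("log2"))

begin_transaction("some-db")
assert calls == ["check", "primary", "log2", "log1"]
```

The real proposal additionally orders advice by signature specificity and supports predicates; this sketch only shows the before/primary/after sequencing.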
"Around" Methods ~~~~~~~~~~~~~~~~ The ``@around`` decorator declares a method as an "around" method. "Around" methods are much like primary methods, except that the least-specific "around" method has higher precedence than the most-specific "before" method. Unlike "before" and "after" methods, however, "around" methods *are* responsible for calling their ``__proceed__`` argument, in order to continue the invocation process. "Around" methods are usually used to transform input arguments or return values, or to wrap specific cases with special error handling or try/finally conditions, e.g.:: @around(commit_transaction) def lock_while_committing(__proceed__, db: SingletonDB): with db.global_lock: return __proceed__(db) They can also be used to replace the normal handling for a specific case, by *not* invoking the ``__proceed__`` function. The ``__proceed__`` given to an "around" method will either be the next applicable "around" method, a ``DispatchError`` instance, or a synthetic method object that will call all the "before" methods, followed by the primary method chain, followed by all the "after" methods, and return the result from the primary method chain. Thus, just as with normal methods, ``__proceed__`` can be checked for ``DispatchError``-ness, or simply invoked. The "around" method should return the value returned by ``__proceed__``, unless of course it wishes to modify or replace it with a different return value for the function as a whole. Custom Combinations ~~~~~~~~~~~~~~~~~~~ The decorators described above (``@overload``, ``@when``, ``@before``, ``@after``, and ``@around``) collectively implement what in CLOS is called the "standard method combination" -- the most common patterns used in combining methods. Sometimes, however, an application or library may have use for a more sophisticated type of method combination. 
For example, if you would like to have "discount" methods that return a percentage off, to be subtracted from the value returned by the primary method(s), you might write something like this:: from overloading import always_overrides, merge_by_default from overloading import Around, Before, After, Method, MethodList class Discount(MethodList): """Apply return values as discounts""" def __call__(self, *args, **kw): retval = self.tail(*args, **kw) for sig, body in self.sorted(): retval -= retval * body(*args, **kw) return retval # merge discounts by priority merge_by_default(Discount) # discounts have precedence over before/after/primary methods always_overrides(Discount, Before) always_overrides(Discount, After) always_overrides(Discount, Method) # but not over "around" methods always_overrides(Around, Discount) # Make a decorator called "discount" that works just like the # standard decorators... discount = Discount.make_decorator('discount') # and now let's use it... def price(product): return product.list_price @discount(price) def ten_percent_off_shoes(product: Shoe): return Decimal('0.1') Similar techniques can be used to implement a wide variety of CLOS-style method qualifiers and combination rules. The process of creating custom method combination objects and their corresponding decorators is described in more detail under the `Extension API`_ section. Note, by the way, that the ``@discount`` decorator shown will work correctly with any new predicates defined by other code. For example, if ``zope.interface`` were to register its interface types to work correctly as argument annotations, you would be able to specify discounts on the basis of its interface types, not just classes or ``overloading``-defined interface types. 
Similarly, if a library like RuleDispatch or PEAK-Rules were to register an appropriate predicate implementation and dispatch engine, one would then be able to use those predicates for discounts as well, e.g.:: from somewhere import Pred # some predicate implementation @discount( price, Pred("isinstance(product,Shoe) and" " product.material.name=='Blue Suede'") ) def forty_off_blue_suede_shoes(product): return Decimal('0.4') The process of defining custom predicate types and dispatching engines is also described in more detail under the `Extension API`_ section. Overloading Inside Classes -------------------------- All of the decorators above have a special additional behavior when they are directly invoked within a class body: the first parameter (other than ``__proceed__``, if present) of the decorated function will be treated as though it had an annotation equal to the class in which it was defined. That is, this code:: class And(object): # ... @when(get_conjuncts) def __conjuncts(self): return self.conjuncts produces the same effect as this (apart from the existence of a private method):: class And(object): # ... @when(get_conjuncts) def get_conjuncts_of_and(ob: And): return ob.conjuncts This behavior is both a convenience enhancement when defining lots of methods, and a requirement for safely distinguishing multi-argument overloads in subclasses. Consider, for example, the following code:: class A(object): def foo(self, ob): print "got an object" @overload def foo(__proceed__, self, ob:Iterable): print "it's iterable!" return __proceed__(self, ob) class B(A): foo = A.foo # foo must be defined in local namespace @overload def foo(__proceed__, self, ob:Iterable): print "B got an iterable!" return __proceed__(self, ob) Due to the implicit class rule, calling ``B().foo([])`` will print "B got an iterable!" followed by "it's iterable!", and finally, "got an object", while ``A().foo([])`` would print only the messages defined in ``A``. 
Conversely, without the implicit class rule, the two "Iterable" methods would have the exact same applicability conditions, so calling either ``A().foo([])`` or ``B().foo([])`` would result in an ``AmbiguousMethods`` error.

It is currently an open issue to determine the best way to implement this rule in Python 3.0. Under Python 2.x, a class' metaclass was not chosen until the end of the class body, which means that decorators could insert a custom metaclass to do processing of this sort. (This is how RuleDispatch, for example, implements the implicit class rule.) PEP 3115, however, requires that a class' metaclass be determined *before* the class body has executed, making it impossible to use this technique for class decoration any more.

At this writing, discussion on this issue is ongoing.


Interfaces and Adaptation
-------------------------

The ``overloading`` module provides a simple implementation of interfaces and adaptation. The following example defines an ``IStack`` interface, and declares that ``list`` objects support it::

    from overloading import abstract, Interface

    class IStack(Interface):
        @abstract
        def push(self, ob):
            """Push 'ob' onto the stack"""

        @abstract
        def pop(self):
            """Pop a value and return it"""

    when(IStack.push, (list, object))(list.append)
    when(IStack.pop, (list,))(list.pop)

    mylist = []
    mystack = IStack(mylist)
    mystack.push(42)
    assert mystack.pop()==42

The ``Interface`` class is a kind of "universal adapter". It accepts a single argument: an object to adapt. It then binds all its methods to the target object, in place of itself. Thus, calling ``mystack.push(42)`` is the same as calling ``IStack.push(mylist, 42)``.

The ``@abstract`` decorator marks a function as being abstract: i.e., having no implementation. If an ``@abstract`` function is called, it raises ``NoApplicableMethods``. To become executable, overloaded methods must be added using the techniques previously described.
(That is, methods can be added using ``@when``, ``@before``, ``@after``, ``@around``, or any custom method combination decorators.)

In the example above, the ``list.append`` method is added as a method for ``IStack.push()`` when its arguments are a list and an arbitrary object. Thus, ``IStack.push(mylist, 42)`` is translated to ``list.append(mylist, 42)``, thereby implementing the desired operation.

(Note: the ``@abstract`` decorator is not limited to use in interface definitions; it can be used anywhere that you wish to create an "empty" generic function that initially has no methods. In particular, it need not be used inside a class.)


Subclassing and Re-assembly
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Interfaces can be subclassed::

    class ISizedStack(IStack):
        @abstract
        def __len__(self):
            """Return the number of items on the stack"""

    # define __len__ support for ISizedStack
    when(ISizedStack.__len__, (list,))(list.__len__)

Or assembled by combining functions from existing interfaces::

    class Sizable(Interface):
        __len__ = ISizedStack.__len__

    # list now implements Sizable as well as ISizedStack, without
    # making any new declarations!

A class can be considered to "adapt to" an interface at a given point in time, if no method defined in the interface is guaranteed to raise a ``NoApplicableMethods`` error if invoked on an instance of that class at that point in time.

In normal usage, however, it is "easier to ask forgiveness than permission". That is, it is easier to simply use an interface on an object by adapting it to the interface (e.g. ``IStack(mylist)``) or invoking interface methods directly (e.g. ``IStack.push(mylist, 42)``), than to try to figure out whether the object is adaptable to (or directly implements) the interface.
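A toy version of the "universal adapter" behavior can be written in a few lines. In the sketch below, ``MiniInterface`` and its type-keyed registry are illustrative stand-ins — the PEP's ``Interface`` binds overloaded generic functions rather than a plain dict — but the rebinding of registered implementations to the adapted object is the same idea:

```python
class MiniInterface:
    """Bind per-type implementations to a single adapted object."""
    registry = {}   # (interface class, method name, adaptee type) -> impl

    def __init__(self, ob):
        self._ob = ob

    @classmethod
    def register(cls, name, typ, impl):
        cls.registry[(cls, name, typ)] = impl

    def __getattr__(self, name):
        impl = self.registry[(type(self), name, type(self._ob))]
        # binding: the adapted object takes the place of 'self'
        return lambda *args: impl(self._ob, *args)

class IStack(MiniInterface):
    pass

IStack.register('push', list, list.append)
IStack.register('pop', list, list.pop)

mylist = []
mystack = IStack(mylist)
mystack.push(42)
assert mystack.pop() == 42
```

Calling ``mystack.push(42)`` is thus literally ``list.append(mylist, 42)``, matching the adapter semantics described above.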
Implementing an Interface in a Class
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It is possible to declare that a class directly implements an interface, using the ``declare_implementation()`` function::

    from overloading import declare_implementation

    class Stack(object):
        def __init__(self):
            self.data = []
        def push(self, ob):
            self.data.append(ob)
        def pop(self):
            return self.data.pop()

    declare_implementation(IStack, Stack)

The ``declare_implementation()`` call above is roughly equivalent to the following steps::

    when(IStack.push, (Stack,object))(lambda self, ob: self.push(ob))
    when(IStack.pop, (Stack,))(lambda self: self.pop())

That is, calling ``IStack.push()`` or ``IStack.pop()`` on an instance of any subclass of ``Stack``, will simply delegate to the actual ``push()`` or ``pop()`` methods thereof.

For the sake of efficiency, calling ``IStack(s)`` where ``s`` is an instance of ``Stack``, **may** return ``s`` rather than an ``IStack`` adapter. (Note that calling ``IStack(x)`` where ``x`` is already an ``IStack`` adapter will always return ``x`` unchanged; this is an additional optimization allowed in cases where the adaptee is known to *directly* implement the interface, without adaptation.)

For convenience, it may be useful to declare implementations in the class header, e.g.::

    class Stack(metaclass=Implementer, implements=IStack):
        ...

instead of calling ``declare_implementation()`` after the end of the suite.


Interfaces as Type Specifiers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``Interface`` subclasses can be used as argument annotations to indicate what type of objects are acceptable to an overload, e.g.::

    @overload
    def traverse(g: IGraph, s: IStack):
        g = IGraph(g)
        s = IStack(s)
        # etc....

Note, however, that the actual arguments are *not* changed or adapted in any way by the mere use of an interface as a type specifier. You must explicitly cast the objects to the appropriate interface, as shown above. Other patterns of interface use are possible, however.
For example, other interface implementations might not support adaptation, or might require that function arguments already be adapted to the specified interface. So the exact semantics of using an interface as a type specifier are dependent on the interface objects you actually use.

For the interface objects defined by this PEP, however, the semantics are as described above. An interface I1 is considered "more specific" than another interface I2, if the set of descriptors in I1's inheritance hierarchy is a proper superset of the descriptors in I2's inheritance hierarchy.

So, for example, ``ISizedStack`` is more specific than both ``Sizable`` and ``IStack``, irrespective of the inheritance relationships between these interfaces. It is purely a question of what operations are included within those interfaces -- and the *names* of the operations are unimportant.

Interfaces (at least the ones provided by ``overloading``) are always considered less-specific than concrete classes. Other interface implementations can decide on their own specificity rules, both between interfaces and other interfaces, and between interfaces and classes.


Non-Method Attributes in Interfaces
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``Interface`` implementation actually treats all attributes and methods (i.e. descriptors) in the same way: their ``__get__`` (and ``__set__`` and ``__delete__``, if present) methods are called with the wrapped (adapted) object as "self". For functions, this has the effect of creating a bound method linking the generic function to the wrapped object.
For non-function attributes, it may be easiest to specify them using the ``property`` built-in, and the corresponding ``fget``, ``fset``, and ``fdel`` attributes::

    class ILength(Interface):
        @property
        @abstract
        def length(self):
            """Read-only length attribute"""

    # ILength(aList).length == list.__len__(aList)
    when(ILength.length.fget, (list,))(list.__len__)

Alternatively, methods such as ``_get_foo()`` and ``_set_foo()`` may be defined as part of the interface, and the property defined in terms of those methods, but this is a bit more difficult for users to implement correctly when creating a class that directly implements the interface, as they would then need to match all the individual method names, not just the name of the property or attribute.


Aspects
-------

The adaptation system provided assumes that adapters are "stateless", which is to say that adapters have no attributes or storage apart from those of the adapted object. This follows the "typeclass/instance" model of Haskell, and the concept of "pure" (i.e., transitively composable) adapters.

However, there are occasionally cases where, to provide a complete implementation of some interface, some sort of additional state is required. One possibility, of course, would be to attach monkeypatched "private" attributes to the adaptee. But this is subject to name collisions, and complicates the process of initialization. It also doesn't work on objects that don't have a ``__dict__`` attribute.

So the ``Aspect`` class is provided to make it easy to attach extra information to objects that either:

1. have a ``__dict__`` attribute (so aspect instances can be stored in it, keyed by aspect class),

2. support weak referencing (so aspect instances can be managed using a global but thread-safe weak-reference dictionary), or

3. implement or can be adapted to the ``overloading.IAspectOwner`` interface (technically, #1 or #2 imply this).

Subclassing ``Aspect`` creates an adapter class whose state is tied to the life of the adapted object. For example, suppose you would like to count all the times a certain method is called on instances of ``Target`` (a classic AOP example). You might do something like::

    from overloading import Aspect

    class Count(Aspect):
        count = 0

    @after(Target.some_method)
    def count_after_call(self, *args, **kw):
        Count(self).count += 1

The above code will keep track of the number of times that ``Target.some_method()`` is successfully called (i.e., it will not count errors). Other code can then access the count using ``Count(someTarget).count``.

``Aspect`` instances can of course have ``__init__`` methods, to initialize any data structures. They can use either ``__slots__`` or dictionary-based attributes for storage.

While this facility is rather primitive compared to a full-featured AOP tool like AspectJ, persons who wish to build pointcut libraries or other AspectJ-like features can certainly use ``Aspect`` objects and method-combination decorators as a base for more expressive AOP tools.

XXX spec out full aspect API, including keys, N-to-1 aspects, manual attach/detach/delete of aspect instances, and the ``IAspectOwner`` interface.


Extension API
=============

TODO: explain how all of these work::

    implies(o1, o2)
    declare_implementation(iface, class)
    predicate_signatures(ob)
    parse_rule(ruleset, body, predicate, actiontype, localdict, globaldict)
    combine_actions(a1, a2)
    rules_for(f)

    Rule objects
    ActionDef objects
    RuleSet objects
    Method objects
    MethodList objects
    IAspectOwner


Implementation Notes
====================

Most of the functionality described in this PEP is already implemented in the in-development version of the PEAK-Rules framework. In particular, the basic overloading and method combination framework (minus the ``@overload`` decorator) already exists there.
The implementation of all of these features in ``peak.rules.core`` is 656 lines of Python at this writing. ``peak.rules.core`` currently relies on the DecoratorTools and BytecodeAssembler modules, but both of these dependencies can be replaced, as DecoratorTools is used mainly for Python 2.3 compatibility and to implement structure types (which can be done with named tuples in later versions of Python). The use of BytecodeAssembler can be replaced using an "exec" or "compile" workaround, given a reasonable effort. (It would be easier to do this if the ``func_closure`` attribute of function objects was writable.) The ``Interface`` class has been previously prototyped, but is not included in PEAK-Rules at the present time. The "implicit class rule" has previously been implemented in the RuleDispatch library. However, it relies on the ``__metaclass__`` hook that is currently eliminated in PEP 3115. I don't currently know how to make ``@overload`` play nicely with ``classmethod`` and ``staticmethod`` in class bodies. It's not really clear if it needs to, however. Copyright ========= This document has been placed in the public domain. From guido at python.org Tue May 1 00:54:04 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 30 Apr 2007 15:54:04 -0700 Subject: [Python-3000] super(), class decorators, and PEP 3115 In-Reply-To: <5.1.1.6.0.20070430152320.02d31868@sparrow.telecommunity.com> References: <5.1.1.6.0.20070430142844.03c96240@sparrow.telecommunity.com> <5.1.1.6.0.20070430152320.02d31868@sparrow.telecommunity.com> Message-ID: On 4/30/07, Phillip J. Eby wrote: > At 12:17 PM 4/30/2007 -0700, Guido van Rossum wrote: > >Assuming class decorators are added, can't you do all of this using a > >custom metaclass? > > The only thing I need for the GF PEP is a way for a method decorator to get > a callback after the class is created, so that overloading will work > correctly in cases where overloaded methods are defined in a subclass. 
I still don't understand why you can't tell the users "for this to work, you must use my special magic super-duper metaclass defined *here*". Surely a sufficiently advanced metaclass can pull off this kind of magic in its __init__ method? If not a metaclass, then a super-duper decorator. Or what am I missing? > In essence, when you define an overloaded method inside a class body, you > would like to be able to treat it as if it were defined with > "self:__class__", where __class__ is the enclosing class. In practice, > this means that the actual overloading has to wait until the class > definition is finished. > > In Python 2.x, RuleDispatch implements this by temporary tinkering with > __metaclass__, but if I understand correctly this would not be possible > with PEP 3115. I didn't make this connection until I was fleshing out my > PEP's explanation of how precedence works when you are overloading instance > methods (as opposed to standalone functions). Correct. As the word tinkering implies, you'll have to come up with a different approach. > If PEP 3115 were changed to restore support for __metaclass__, I could > continue to use that approach. Otherwise, some other sort of hook is required. I'm -1 on augmenting PEP 3115 for this purpose. > The class decorator thing isn't an issue for the GF PEP as such; it doesn't > use them directly, only via the __metaclass__ hack. I just brought it up > because I was looking for the class decorator PEP when I realized that the > old way of doing them wouldn't be possible any more. As long as someone's working on it (which I hear someone is), the class decorator PEP is secure; the actual discussion was closed successfully weeks ago. But I don't understand how a __metaclass__ hack can use a class decorator. > >I'm not sure that your proposal for implementing an improved super has > >anything over the currently most-favored proposal by Timothy Delaney. 
> > It's merely another use for the hook, that would save on having another > special-purpose mechanism strictly for super(); I figured that having other > uses for it (besides mine) would be a plus. I'd leave that up to the folks currently discussing super. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From jimjjewett at gmail.com Tue May 1 00:55:36 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Mon, 30 Apr 2007 18:55:36 -0400 Subject: [Python-3000] super() PEP In-Reply-To: References: <5.1.1.6.0.20070430142844.03c96240@sparrow.telecommunity.com> <011b01c78b6a$72098810$0201a8c0@ryoko> <43aa6ff70704301403q5bf557c2wf43148f7a339353d@mail.gmail.com> <014901c78b6e$d2d66d80$0201a8c0@ryoko> Message-ID: On 4/30/07, Lino Mastrodomenico wrote: > One more thing: what do people think of modifying super so that when > it doesn't find a method instead of raising AttributeError it returns > something like "lambda *args, **kwargs: None"? To me, the most important change is correctness -- super(__this_class__, self) over super(Name, self). Anything else is at least debatable. But of all the shortcuts mentioned, this particular shortcut is easily the most valuable to me. At one point, I had even considered giving the super object a special method to upcall in this manner. For What Its Worth, in my own code, when I don't know whether or not the next method exists, I will always be upcalling to the method of the same name, and passing all my arguments. Even changing the value of one argument would be strange enough to count as a special case worth spelling out. Alas, Guido's recent opinion was "Don't do that". He suggested, at a minimum, inheriting from an ABC that provided the Nothing method. > Optionally this can be a constant (e.g. default_method) defined > somewhere so, if necessary, it's still possible to detect if the value > of super.meth is a real method or the "fake" default_method. 
http://www.python.org/sf/1673203 is a patch for adding an identity method; I suspect a Nothing in the builtins would also make sense. -jJ From guido at python.org Tue May 1 00:55:55 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 30 Apr 2007 15:55:55 -0700 Subject: [Python-3000] super(), class decorators, and PEP 3115 In-Reply-To: <43aa6ff70704301403q5bf557c2wf43148f7a339353d@mail.gmail.com> References: <5.1.1.6.0.20070430142844.03c96240@sparrow.telecommunity.com> <011b01c78b6a$72098810$0201a8c0@ryoko> <43aa6ff70704301403q5bf557c2wf43148f7a339353d@mail.gmail.com> Message-ID: On 4/30/07, Collin Winter wrote: > On 4/30/07, Tim Delaney wrote: > > Would you prefer me to work with Calvin to get his existing PEP to match my > > proposal, or would you prefer a competing PEP? > > Please work together with Calvin. One PEP is enough. And don't worry too much about the exact deadline; at this point super is not a new proposal, we're just hashing out details (however violently :-). -- --Guido van Rossum (home page: http://www.python.org/~guido/) From barry at python.org Tue May 1 00:56:45 2007 From: barry at python.org (Barry Warsaw) Date: Mon, 30 Apr 2007 18:56:45 -0400 Subject: [Python-3000] octal literals PEP In-Reply-To: References: Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Apr 30, 2007, at 6:47 PM, Guido van Rossum wrote: > The PEP editors have admitted to being behind on the job. AFAIK PEPs > sent to the PEP editors before the deadline are in, regardless of when > the PEP goes online. > > To save the PEP editors the effort, if you send it to me I will assign > it a PEP number and submit it. (Ditto for other PEPs in the same > situation.) Thanks Guido. peps at python dot org is now a mailing list and we will soon have three additional editors to help out. Please also see my call for junior editors, just posted. 
Cheers, - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (Darwin) iQCVAwUBRjZ0LnEjvBPtnXfVAQLEkwP9Gl4SJtg+H1w91djZ5Bo1Ef+MMTfpwqwM Rpr6nxgKRCg1Xuzo7Y2aHzrXOvO05r/Lla5djUfHnH7SKsoeP71Kw9+jfGyM4DcL l3dQ2YCc1vD4fEWB5jp1VwjFGxXaes6fVBF7ERN1G2yTxbmWzk4ugNijcUYkbGiM rj4koq2YNds= =pvzf -----END PGP SIGNATURE----- From brett at python.org Tue May 1 00:57:10 2007 From: brett at python.org (Brett Cannon) Date: Mon, 30 Apr 2007 15:57:10 -0700 Subject: [Python-3000] super() PEP In-Reply-To: References: <5.1.1.6.0.20070430142844.03c96240@sparrow.telecommunity.com> <011b01c78b6a$72098810$0201a8c0@ryoko> <43aa6ff70704301403q5bf557c2wf43148f7a339353d@mail.gmail.com> <014901c78b6e$d2d66d80$0201a8c0@ryoko> Message-ID: On 4/30/07, Lino Mastrodomenico wrote: > 2007/4/30, Tim Delaney : > > Fine with me. Calvin - want to send me your latest draft, and I'll do some > > modifications? I think we've got to the point now where we can take this > > off-list. > > One more thing: what do people think of modifying super so that when > it doesn't find a method instead of raising AttributeError it returns > something like "lambda *args, **kwargs: None"? > Yuck. That just smacks of JavaScript and its lax error detection. There is a reason they are adding a strict pragma in JS 2.0. -?. -Brett -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-3000/attachments/20070430/f6d91370/attachment.html From barry at python.org Tue May 1 01:06:11 2007 From: barry at python.org (Barry Warsaw) Date: Mon, 30 Apr 2007 19:06:11 -0400 Subject: [Python-3000] PEP 3119 - Introducing Abstract Base Classes In-Reply-To: References: <04A4F15C-38C3-4727-875D-82803F4FB974@python.org> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Apr 27, 2007, at 1:10 PM, Jim Jewett wrote: > On 4/27/07, Barry Warsaw wrote: > >> - - Attributes. 
Interfaces allow you to make assertions about >> attributes, not just methods, while ABCs necessarily cover only >> methods. > > Why can't they have data attributes as well? They can /have/ data attributes, but that's not really the point. The point (IMHO) is that such attributes can be documented, inspected, and reasoned about. You could annotate interface attributes with type information in order to automatically generate database tables or web forms, etc. Normal Python attributes can't do that, although if they were properties, they could. >> - - With interfaces, you can make assertions about individual objects >> which may be different than what their classes assert. Interface >> proponents seem to care a lot about this and it seems there are valid >> uses cases for it. > > Isn't this something that could be handled by overriding isinstance? It could. >> Another example of separating inheritance and interface comes up when >> you want to derive a subclass to share implementation details, but >> you want to subtly change the semantics, which would invalidate an >> ABC claim by the base class. Something like a GrowOnlyDictionary >> that derived from dict for implementation purposes, but didn't want >> to implement __delitem__ as required by the MutableMapping ABC. > > OK, that makes the isubclass override trickier, so there should be an > example, but I think it can still be done. > >> Finally, I'm concerned with the "weight" of adding ABCs to all the >> built-in types. > > What if the builtin types did not initially derive from any ABC, but > were added (through an issubclass override) when the abc module was > imported? That would allow for some unfortunately global side-effects. Say I happen to import your library that imports abc. Now all the built-in types in my entire application get globally changed. I'm also not sure how you'd implement that. 
Cheers, - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (Darwin) iQCVAwUBRjZ2ZHEjvBPtnXfVAQJMYgP+PiEvTRe+AeQHJSjYfx3kxE3oV+n9kfbL xns+fK6Chub+frAzcHz+an7GXikTxbdYHysunWqhpB0TSOZfF7SzKNgD3pHTKmN/ zyMVTykr5zynmLPi8bygZfTNlm340Qrc+ymE3qjCsbRP9XZtFC5CJYmlIM2kU0MI HMV5KtXjbgc= =77iN -----END PGP SIGNATURE----- From barry at python.org Tue May 1 01:08:08 2007 From: barry at python.org (Barry Warsaw) Date: Mon, 30 Apr 2007 19:08:08 -0400 Subject: [Python-3000] PEP 3119 - Introducing Abstract Base Classes In-Reply-To: <217F4CF1-3CC1-48CE-A635-877C22562C78@PageDNA.com> References: <04A4F15C-38C3-4727-875D-82803F4FB974@python.org> <07Apr27.104005pdt."57996"@synergy1.parc.xerox.com> <217F4CF1-3CC1-48CE-A635-877C22562C78@PageDNA.com> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Apr 27, 2007, at 2:17 PM, Tony Lownds wrote: > +0 on abstract attributes. Methods seem to dominate most APIs that > make > use of interfaces, but there are always a few exceptions. One of the reasons to be able to specify attributes in an ABC or interface is so that you can use something more Pythonic than getters and setters. - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (Darwin) iQCVAwUBRjZ22HEjvBPtnXfVAQLbwAQAlicMWta8mZSQEgiRcc+VvQG1kPVYRy/t 3Dlp5cEHog6VMdTH7iEN+TSAszsXatjbeo9nl/fT/fI3RYrre5+hiclVoyLCnfUF jJda589xj9EzKjJfYPl1dbCjxp5S/nK2RmtOMN3HxLMcuKQ0I3ZSbAlR+BKRO55T qRSmKp6Ebb8= =y7ge -----END PGP SIGNATURE----- From guido at python.org Tue May 1 01:10:45 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 30 Apr 2007 16:10:45 -0700 Subject: [Python-3000] super() PEP In-Reply-To: References: <5.1.1.6.0.20070430142844.03c96240@sparrow.telecommunity.com> <011b01c78b6a$72098810$0201a8c0@ryoko> <43aa6ff70704301403q5bf557c2wf43148f7a339353d@mail.gmail.com> <014901c78b6e$d2d66d80$0201a8c0@ryoko> Message-ID: On 4/30/07, Lino Mastrodomenico wrote: > 2007/4/30, Tim Delaney : > > Fine with me. 
Calvin - want to send me your latest draft, and I'll do some > > modifications? I think we've got to the point now where we can take this > > off-list. > > One more thing: what do people think of modifying super so that when > it doesn't find a method instead of raising AttributeError it returns > something like "lambda *args, **kwargs: None"? > > Optionally this can be a constant (e.g. default_method) defined > somewhere so, if necessary, it's still possible to detect if the value > of super.meth is a real method or the "fake" default_method. > > I think this can be useful when a method *doesn't know* if it's the > last in the MRO because it may depend on the inheritance hierarchy of > its subclasses: you can always simply call super.meth(...) and if the > current method is the last this will be a NOP. Most definitely not. If you don't even know whether you're defining or overriding a method you shouldn't be using super in the first place, because you're *obviously* not engaged in cooperative MI. And don't get me started abut __init__. Constructors don't do cooperative MI, period. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue May 1 01:13:44 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 30 Apr 2007 16:13:44 -0700 Subject: [Python-3000] PEP 3119 - Introducing Abstract Base Classes In-Reply-To: References: <04A4F15C-38C3-4727-875D-82803F4FB974@python.org> <217F4CF1-3CC1-48CE-A635-877C22562C78@PageDNA.com> Message-ID: On 4/30/07, Barry Warsaw wrote: > On Apr 27, 2007, at 2:17 PM, Tony Lownds wrote: > > +0 on abstract attributes. Methods seem to dominate most APIs that > > make use of interfaces, but there are always a few exceptions. > > One of the reasons to be able to specify attributes in an ABC or > interface is so that you can use something more Pythonic than getters > and setters. 
Even if support for abstract attributes is not provided by default in py3k, it shouldn't be hard to add as a pure-python 3rd party add-on, using a custom metaclass or a class decorator. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From barry at python.org Tue May 1 01:16:09 2007 From: barry at python.org (Barry Warsaw) Date: Mon, 30 Apr 2007 19:16:09 -0400 Subject: [Python-3000] Traits/roles instead of ABCs In-Reply-To: <740c3aec0704301501u7df7b5a6uaea854d4716eb87e@mail.gmail.com> References: <43aa6ff70704291840s3384824et44ebfd360c15eda@mail.gmail.com> <014201c78adc$ca70d960$f101a8c0@RaymondLaptop1> <1d36917a0704300816ma3bf9c2o4dd674cfcefa9172@mail.gmail.com> <-3456230403858254882@unknownmsgid> <740c3aec0704301501u7df7b5a6uaea854d4716eb87e@mail.gmail.com> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Apr 30, 2007, at 6:01 PM, BJ?rn Lindqvist wrote: > On 4/30/07, Bill Janssen wrote: >>> On 4/30/07, Raymond Hettinger wrote: >>>> I'm concerned that the current ABC proposal will quickly evolve >>>> from optional >>>> to required and create somewhat somewhat java-esque landscape where >>>> inheritance and full-specification are the order of the day. >>> >>> +1 for preferring simple solutions to complex ones >> >> Me, too. But which is the simple solution? I tend to think ABCs >> are. > > Neither or. They are both an order of a magnitude more complex than > the problem they are designed to solve. Raymond Hettingers small list > of three example problems earlier in the thread, is the most concrete > description of what the problem really is all about. And I would > honestly rather sort them under "minor annoyances" than "really > critical stuff, needs to be fixed asap." Interfaces and ABCs are really all about Programming in the Really Large. Most Python programs don't need this stuff, and in fact, having to deal with them in any way would IMO reduce the elegance of Python for small to medium (and even most large) applications. 
I think the experience of Zope, Twisted, and PEAK have shown though that /something/ is necessary to manage the complexity when applications become frameworks. To me, interfaces and/or generic functions strike the right balance. Such tools are completely invisible for Python programmers who don't care about them (the vast majority). They're also essential for a very small subclass of very important Python applications. If ABCs can walk that same tightrope of utility and invisibility, then maybe they'll successfully fill that niche. - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (Darwin) iQCVAwUBRjZ4u3EjvBPtnXfVAQJGmAP+J2JMrQ985nx+ivFeq0Er9MWTo/zVtVyh MH5X/W7NEYX+NfMEqbdM/pdi2JsvzVEX2bjaOpp28mMKw101DZ05wv5QMimjvzI1 WPR56AU7/an3yQPNQV3moBfAYtf5lIhRGG/uEjWYq9mG6ORQy3VmlxTsQygvFfwd 9t/lfCF7mKg= =gAcm -----END PGP SIGNATURE----- From barry at python.org Tue May 1 01:17:14 2007 From: barry at python.org (Barry Warsaw) Date: Mon, 30 Apr 2007 19:17:14 -0400 Subject: [Python-3000] Traits/roles instead of ABCs In-Reply-To: References: <43aa6ff70704291840s3384824et44ebfd360c15eda@mail.gmail.com> <014201c78adc$ca70d960$f101a8c0@RaymondLaptop1> <1d36917a0704300816ma3bf9c2o4dd674cfcefa9172@mail.gmail.com> <-3456230403858254882@unknownmsgid> <740c3aec0704301501u7df7b5a6uaea854d4716eb87e@mail.gmail.com> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Apr 30, 2007, at 6:31 PM, Brett Cannon wrote: > I think it would be a little difficult in this situation as since a > similar mechanism does not currently exist in the stdlib and so most > code is not written so that ABCs or roles are needed. This is for a reason... they're not! 
:) - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (Darwin) iQCVAwUBRjZ4+3EjvBPtnXfVAQL1JwP/TsBg8bPvyuTExNOgFQJIcjQ5yqaaw58Y co6J0DDrNuZYxBzPtFJVmN4GfPxieqNrJOFGzP48O5zH1rpXFKfvGOKsh1RQCmjQ +IQzz0bj4hz8st7hKTUZitblyDRxiiOAl3pwnLsKTimBrZ+HwPF5qC2g/INg4A4O 8qkHtCSmVG0= =FGDt -----END PGP SIGNATURE----- From brett at python.org Tue May 1 01:19:06 2007 From: brett at python.org (Brett Cannon) Date: Mon, 30 Apr 2007 16:19:06 -0700 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> Message-ID: On 4/30/07, Phillip J. Eby wrote: > > This is just the first draft (also checked into SVN), and doesn't include > the details of how the extension API works (so that third-party interfaces > and generic functions can interoperate using the same decorators, > annotations, etc.). > > Comments and questions appreciated, as it'll help drive better > explanations > of both the design and rationales. I'm usually not that good at guessing > what other people will want to know (or are likely to misunderstand) until > I get actual questions. > > > PEP: 3124 > Title: Overloading, Generic Functions, Interfaces, and Adaptation > Version: $Revision: 55029 $ > Last-Modified: $Date: 2007-04-30 18:48:06 -0400 (Mon, 30 Apr 2007) $ > Author: Phillip J. Eby > Discussions-To: Python 3000 List > Status: Draft > Type: Standards Track > Requires: 3107, 3115, 3119 > Replaces: 245, 246 > Content-Type: text/x-rst > Created: 28-Apr-2007 > Post-History: 30-Apr-2007 [SNIP] > The ``@overload`` decorator allows you to define alternate > implementations of a function, specialized by argument type(s). A > function with the same name must already exist in the local namespace. 
> The existing function is modified in-place by the decorator to add > the new implementation, and the modified function is returned by the > decorator. Thus, the following code:: > > from overloading import overload > from collections import Iterable > > def flatten(ob): > """Flatten an object to its component iterables""" > yield ob > > @overload > def flatten(ob: Iterable): > for o in ob: > for ob in flatten(o): > yield ob > > @overload > def flatten(ob: basestring): > yield ob Doubt there is a ton of use for it, but any way to use this for pattern matching ala Standard ML or Haskell? Would be kind of neat to be able to do recursive function definitions and choose which specific function implementation based on the length of an argument. But I don't see how that would be possible with this directly. I guess if a SingularSequence type was defined that overloaded __isinstance__ properly maybe? I have not followed the __isinstance__ discussion closely so I am not sure. [SNIP] > Proceeding to the "Next" Method > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > If the first parameter of an overloaded function is named > ``__proceed__``, it will be passed a callable representing the next > most-specific method. For example, this code:: > > def foo(bar:object, baz:object): > print "got objects!" > > @overload > def foo(__proceed__, bar:int, baz:int): > print "got integers!" > return __proceed__(bar, baz) > > Will print "got integers!" followed by "got objects!". > > If there is no next most-specific method, ``__proceed__`` will be > bound to a ``NoApplicableMethods`` instance. When called, a new > ``NoApplicableMethods`` instance will be raised, with the arguments > passed to the first instance. > > Similarly, if the next most-specific methods have ambiguous precedence > with respect to each other, ``__proceed__`` will be bound to an > ``AmbiguousMethods`` instance, and if called, it will raise a new > instance. 
> > Thus, a method can either check if ``__proceed__`` is an error > instance, or simply invoke it. The ``NoApplicableMethods`` and > ``AmbiguousMethods`` error classes have a common ``DispatchError`` > base class, so ``isinstance(__proceed__, overloading.DispatchError)`` > is sufficient to identify whether ``__proceed__`` can be safely > called. > > (Implementation note: using a magic argument name like ``__proceed__`` > could potentially be replaced by a magic function that would be called > to obtain the next method. A magic function, however, would degrade > performance and might be more difficult to implement on non-CPython > platforms. Method chaining via magic argument names, however, can be > efficiently implemented on any Python platform that supports creating > bound methods from functions -- one simply recursively binds each > function to be chained, using the following function or error as the > ``im_self`` of the bound method.) Could you change __proceed__ to be a keyword-only argument? That way it would match the precedent of class definitions and the 'metaclass' keyword introduced by PEP 3115. I personally would prefer to control what the default is if __proceed__ is not passed in at the parameter level than have to do a check if it's NoApplicableMethod. -Brett From jimjjewett at gmail.com Tue May 1 01:29:30 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Mon, 30 Apr 2007 19:29:30 -0400 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> Message-ID: On 4/30/07, Phillip J. Eby wrote: > It is currently an open issue to determine the best way to implement > this rule in Python 3.0.
Under Python 2.x, a class' metaclass was > not chosen until the end of the class body, which means that > decorators could insert a custom metaclass to do processing of this > sort. (This is how RuleDispatch, for example, implements the implicit > class rule.) > PEP 3115, however, requires that a class' metaclass be determined > *before* the class body has executed, making it impossible to use this > technique for class decoration any more. It doesn't say what that metaclass has to do, though. Is there any reason the metaclass couldn't delegate differently depending on the value of __my_magic_attribute__ ? -jJ From jimjjewett at gmail.com Tue May 1 01:37:25 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Mon, 30 Apr 2007 19:37:25 -0400 Subject: [Python-3000] PEP 3119 - Introducing Abstract Base Classes In-Reply-To: References: <04A4F15C-38C3-4727-875D-82803F4FB974@python.org> Message-ID: On 4/30/07, Barry Warsaw wrote: > On Apr 27, 2007, at 1:10 PM, Jim Jewett wrote: > > On 4/27/07, Barry Warsaw wrote: > >> Finally, I'm concerned with the "weight" of adding ABCs to all the > >> built-in types. > > What if the builtin types did not initially derive from any ABC, but > > were added (through an issubclass override) when the abc module was > > imported? > That would allow for some unfortunately global side-effects. Say I > happen to import your library that imports abc. Now all the built-in > types in my entire application get globally changed. I'm also not > sure how you'd implement that. I don't see how these side-effects could ever be detected, except to the extent that issubclass overrides are inherently dangerous. I see it something like # module abc.py class Integer()... ... Integer.register(int) Integer.register(long) After that, int (and long) are changed only by the addition of an extra reference count; their __bases__ and __mro__ are utterly unchanged. But isinstance(int, Integer) is now True. 
Yes, this is global -- but the only way to detect it is to have a reference to Integer, which implies having already relied on the ABC framework. -jJ From guido at python.org Tue May 1 01:43:16 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 30 Apr 2007 16:43:16 -0700 Subject: [Python-3000] PEP 3119 - Introducing Abstract Base Classes In-Reply-To: References: <04A4F15C-38C3-4727-875D-82803F4FB974@python.org> Message-ID: > > > On 4/27/07, Barry Warsaw wrote: > > >> Finally, I'm concerned with the "weight" of adding ABCs to all the > > >> built-in types. > > On Apr 27, 2007, at 1:10 PM, Jim Jewett wrote: > > > What if the builtin types did not initially derive from any ABC, but > > > were added (through an issubclass override) when the abc module was > > > imported? > On 4/30/07, Barry Warsaw wrote: > > That would allow for some unfortunately global side-effects. Say I > > happen to import your library that imports abc. Now all the built-in > > types in my entire application get globally changed. I'm also not > > sure how you'd implement that. On 4/30/07, Jim Jewett wrote: > I don't see how these side-effects could ever be detected, except to > the extent that issubclass overrides are inherently dangerous. > > I see it something like > > # module abc.py > > class Integer()... > ... > > Integer.register(int) > Integer.register(long) > > After that, int (and long) are changed only by the addition of an > extra reference count; their __bases__ and __mro__ are utterly > unchanged. But > > isinstance(int, Integer) > > is now True. Yes, this is global -- but the only way to detect it is > to have a reference to Integer, which implies having already relied on > the ABC framework. Right. int (long doesn't exist in py3k!) 
doesn't change -- the only thing that "changes" is that the question issubclass(int, Integer) is answered positively, but since you can't ask that question without first importing Integer (from abc), there is no way that you can detect this as a change. Note that you won't find Integer if you traverse int.__mro__ or int.__bases__. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From greg.ewing at canterbury.ac.nz Tue May 1 01:42:03 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 01 May 2007 11:42:03 +1200 Subject: [Python-3000] [Python-Dev] Pre-pre PEP for 'super' keyword In-Reply-To: <76fd5acf0704291801o47733e29u634ffa317d32a0a7@mail.gmail.com> References: <76fd5acf0704240711p22f8060k25d787c0e85b6fb8@mail.gmail.com> <002401c78778$75fb7eb0$0201a8c0@ryoko> <00b601c78a9f$38ec9390$0201a8c0@ryoko> <76fd5acf0704291801o47733e29u634ffa317d32a0a7@mail.gmail.com> Message-ID: <46367ECB.3060504@canterbury.ac.nz> Calvin Spealman wrote: > I also checked and PyPy does implement a sys._getframe() and > IronPython currently doesn't, but seems to plan on it (there is a > placeholder, at present). I am not sure if notes on this belong in > the PEP or not. If this is to have a chance, you really need to come up with an implementation that doesn't rely on sys._getframe, even in CPython. It's a hack that has no place in something intended for routine use. -- Greg From pje at telecommunity.com Tue May 1 01:50:09 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon, 30 Apr 2007 19:50:09 -0400 Subject: [Python-3000] super(), class decorators, and PEP 3115 In-Reply-To: References: <5.1.1.6.0.20070430152320.02d31868@sparrow.telecommunity.com> <5.1.1.6.0.20070430142844.03c96240@sparrow.telecommunity.com> <5.1.1.6.0.20070430152320.02d31868@sparrow.telecommunity.com> Message-ID: <5.1.1.6.0.20070430192208.02ddaee8@sparrow.telecommunity.com> At 03:54 PM 4/30/2007 -0700, Guido van Rossum wrote: >On 4/30/07, Phillip J.
Eby wrote: >>At 12:17 PM 4/30/2007 -0700, Guido van Rossum wrote: >> >Assuming class decorators are added, can't you do all of this using a >> >custom metaclass? >> >>The only thing I need for the GF PEP is a way for a method decorator to get >>a callback after the class is created, so that overloading will work >>correctly in cases where overloaded methods are defined in a subclass. > >I still don't understand why you can't tell the users "for this to >work, you must use my special magic super-duper metaclass defined >*here*". Surely a sufficiently advanced metaclass can pull off this >kind of magic in its __init__ method? If not a metaclass, then a >super-duper decorator. Or what am I missing? Metaclasses don't mix well. If the user already has a metaclass, they'll have to create a custom subclass, since Python doesn't do auto-combination of metaclasses (per the "Putting Metaclasses to Work" book). This makes things messy, especially if the user doesn't *know* they're using a metaclass already (e.g., they got one by inheritance). For the specific use case I'm concerned about, it's like "super()" in that a function defined inside a class body needs to know what class it's in. (Actually, it's the decorator that needs to know, and it ideally needs to know as soon as the class is defined, rather than waiting until a call occurs later.) As with "super()", this really has nothing to do with the class. It would make about as much sense as having a metaclass or class decorator called ``SuperUser``; i.e., it would work, but it's just overhead for the user. So, if there ends up being a general way to access that "containing class" from a function decorator, or at least to get a callback once the class is defined, that's all I need for this use case that can't reasonably be handled by a normal metaclass.
Note, too, that such a hook would also allow you to make classes into ABCs through the presence of an @abstractmethod, without also having to inherit from Abstract or set an explicit metaclass. (Unless of course you prefer to have the abstractness called out up-front... but then that explicitness goes out the window as soon as you e.g. subclass Sequence from Iterable.) >But I don't understand how a __metaclass__ hack can use a class decorator. The __metaclass__ hack is used in Python 2.x to dynamically *add* class decorators while the class suite is being executed, that will be called *after* the class is created. A function decorator (think of your @abstractmethod, for example) would monkeypatch the metaclass so it gets a crack at the class after it's created, without the user having to explicitly set up the metaclass (or merge any inherited metaclasses). From pje at telecommunity.com Tue May 1 01:52:48 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon, 30 Apr 2007 19:52:48 -0400 Subject: [Python-3000] super() PEP In-Reply-To: References: <5.1.1.6.0.20070430142844.03c96240@sparrow.telecommunity.com> <011b01c78b6a$72098810$0201a8c0@ryoko> <43aa6ff70704301403q5bf557c2wf43148f7a339353d@mail.gmail.com> <014901c78b6e$d2d66d80$0201a8c0@ryoko> Message-ID: <5.1.1.6.0.20070430195130.04b52328@sparrow.telecommunity.com> At 04:10 PM 4/30/2007 -0700, Guido van Rossum wrote: >And don't get me started about __init__. Constructors don't do >cooperative MI, period. Actually, metaclass __init__'s do. In fact, they *have to*. Right now, we get away with it because the type(name, bases, dict) signature is fixed. Once we add keyword args, though, things will get hairier.
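[The keyword-args scenario Phillip alludes to did eventually land in Python 3, where extra keywords in a class header are passed through to the metaclass; a cooperative metaclass has to accept and strip them before delegating upward. A minimal sketch of that pattern as it works today -- the `_options` attribute is only an illustration, not any API proposed in this thread:]

```python
class Meta(type):
    def __new__(mcls, name, bases, ns, **kwds):
        # Strip the extra class-header keywords before delegating, so
        # type.__new__ (or the next metaclass in a cooperative chain)
        # isn't surprised by arguments it doesn't understand.
        cls = super().__new__(mcls, name, bases, ns)
        cls._options = dict(kwds)
        return cls

    def __init__(cls, name, bases, ns, **kwds):
        # Cooperative: swallow the keywords here too, pass the fixed
        # (name, bases, dict) signature along.
        super().__init__(name, bases, ns)

class Base(metaclass=Meta):
    pass

class C(Base, debug=True):  # keyword arguments in the class header
    pass

print(C._options)  # {'debug': True}
```

The point of overriding both `__new__` and `__init__` is exactly the cooperation problem discussed above: each method in the chain must know which arguments to consume and which to forward.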
From barry at python.org Tue May 1 01:53:58 2007 From: barry at python.org (Barry Warsaw) Date: Mon, 30 Apr 2007 19:53:58 -0400 Subject: [Python-3000] PEP 3119 - Introducing Abstract Base Classes In-Reply-To: References: <04A4F15C-38C3-4727-875D-82803F4FB974@python.org> Message-ID: <0840AFD6-909C-43F9-819A-ACF747B313EE@python.org> On Apr 30, 2007, at 7:43 PM, Guido van Rossum wrote: > Right. int (long doesn't exist in py3k!) doesn't change -- the only > thing that "changes" is that the question issubclass(int, Integer) is > answered positively, but since you can't ask that question without > first importing Integer (from abc), there is no way that you can > detect this as a change. Note that you won't find Integer if you > traverse int.__mro__ or int.__bases__. Cool. -Barry From pje at telecommunity.com Tue May 1 02:02:47 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon, 30 Apr 2007 20:02:47 -0400 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> Message-ID: <5.1.1.6.0.20070430195413.047ad8d8@sparrow.telecommunity.com> At 04:19 PM 4/30/2007 -0700, Brett Cannon wrote: >Doubt there is a ton of use for it, but any way to use this for pattern >matching ala Standard ML or Haskell? Yes. You have to provide a different dispatching engine though, as will be described in the currently non-existent "extension API" section. :) Perhaps you saw the part of the PEP with the "Pred('python expression here')" example?
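[The register() mechanism Jim sketches and Guido confirms a few messages up is essentially what later shipped as `abc.ABCMeta`; a minimal sketch of the observable behavior -- registration answers the issubclass/isinstance question positively without touching int itself:]

```python
from abc import ABCMeta

class Integer(metaclass=ABCMeta):
    """An abstract marker class; no methods needed for this demo."""

# Register int as a "virtual subclass": no change to int's bases or MRO.
Integer.register(int)

assert issubclass(int, Integer)      # the question is now answered positively
assert isinstance(42, Integer)
assert Integer not in int.__mro__    # ...yet int itself is left untouched
```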
> Would be kind of neat to be able to do recursive function definitions > and choose which specific function implementation based on the length of > an argument. But I don't see how that would be possible with this > directly. I guess if a SingularSequence type was defined that overloaded > __isinstance__ properly maybe? I have not followed the __isinstance__ > discussion closely so I am not sure. No, the base engine will only support __issubclass__ overrides and other class-based criteria, as it's strictly a type-tuple cache system (ala Guido's generic function prototype, previously discussed here and on his blog). However, engines will be pluggable based on predicate type(s). If you use a predicate that's not supported by the engine currently attached to a function, it will attempt to "upgrade" to a better engine. So, PEAK-Rules for example will register a predicate type for arbitrary Python expressions, and an engine factory for dispatching on them. >Could you change __proceed__ to be a keyword-only argument? That way it >would match the precedence of class definitions and the 'metaclass' >keyword introduced by PEP 3115. I personally would prefer to control what >the default is if __proceed__ is not passed in at the parameter level then >have to do a check if it's NoApplicableMethod. You would still have to check if it's ``AmbiguousMethods``, though. My current GF libraries use bound methods for speed, which means that the special parameter has to be in the first position. ``partial`` and other ways of constructing a method chain for keyword arguments would be a lot slower, just due to the use of keyword arguments. But it's certainly an option to use ``partial`` instead of bound methods, just a slower one. (My existing GF libraries all target Python 2.3, where ``partial`` didn't exist yet, so it wasn't an option I considered.) From pje at telecommunity.com Tue May 1 02:03:58 2007 From: pje at telecommunity.com (Phillip J.
Eby) Date: Mon, 30 Apr 2007 20:03:58 -0400 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> Message-ID: <5.1.1.6.0.20070430200255.04b88e10@sparrow.telecommunity.com> At 07:29 PM 4/30/2007 -0400, Jim Jewett wrote: >On 4/30/07, Phillip J. Eby wrote: > >>It is currently an open issue to determine the best way to implement >>this rule in Python 3.0. Under Python 2.x, a class' metaclass was >>not chosen until the end of the class body, which means that >>decorators could insert a custom metaclass to do processing of this >>sort. (This is how RuleDispatch, for example, implements the implicit >>class rule.) > >>PEP 3115, however, requires that a class' metaclass be determined >>*before* the class body has executed, making it impossible to use this >>technique for class decoration any more. > >It doesn't say what that metaclass has to do, though. > >Is there any reason the metaclass couldn't delegate differently >depending on the value of __my_magic_attribute__ ? Sure -- that's what I suggested in the "super(), class decorators, and PEP 3115" thread, but Guido voted -1 on adding such a magic attribute to PEP 3115. (Actually, I think he -1'd *any* change to 3115 to support this feature.) From guido at python.org Tue May 1 02:19:27 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 30 Apr 2007 17:19:27 -0700 Subject: [Python-3000] Breakthrough in thinking about ABCs (PEPs 3119 and 3141) Message-ID: After a couple of whiteboard discussions with Collin Winter and Jeffrey Jasskin I have a much better grip on where to go next with the ABC PEPs. (a) Roles Collin will continue to develop his Roles PEP. This may or may not end up providing a viable alternative to ABCs; in either case it will be refreshing to compare and contrast the two proposals. 
(b) Overloading isinstance and issubclass The idea of overloading isinstance and issubclass is running into some resistance. I still like it, but if there is overwhelming discomfort, we can change it so that instead of writing isinstance(x, C) or issubclass(D, C) (where C overloads these operations), you'd have to write something like C.hasinstance(x) or C.hassubclass(D), where hasinstance and hassubclass are defined by some ABC metaclass. I'd still like to have the spec for hasinstance and hassubclass in the core language, so that different 3rd party frameworks don't need to invent different ways of spelling this inquiry. Personally, I still think that the most uniform way of spelling this is overloading isinstance and issubclass; that has the highest likelihood of standardizing the spelling for such inquiries. I'd like to avoid disasters such as Java's String.length vs. Vector.length() vs. Collection.size(). One observation is that in most cases isinstance and issubclass are used with a specific, known class as their second argument, so that the likelihood of breaking code by this overloading is minimal: the calls can be trusted as much as you trust the second argument. (I found only 4 uses of isinstance(x, ) amongst the first 50 hits in Google Code Search.) However this turns out, it makes me want to reduce the number of ABCs defined initially in the PEPs, as it will now be easy to define ABCs representing "less-powerful abstractions" and insert them into the right place in the ABC hierarchy by overloading either issubclass or the alternative hassubclass class method. (c) ABCs as classes vs. ABCs as metaclasses The original ABC PEPs naively use subclassing from ABCs only. This ran into trouble when someone observed that if classes C and D both inherit from TotallyOrdered, that doesn't mean that C() < D() is defined. (For a quick counterexample, consider that int and str are both total orders, but 42 < "a" raises TypeError.)
Similar for Ring and other algebraic notions introduced in PEP 3141. The correct approach is for TotallyOrdered to be a metaclass (is this the typeclass thing in Haskell?). I expect that we'll leave them out of the ABC namespace for now and instead just spell out __lt__ and __le__ as operators defined by various classes. If you want TotallyOrdered, you can easily define it yourself, call TotallyOrdered.register(int) etc., and then isinstance(int, TotallyOrdered) (or TotallyOrdered.hasinstance(int)) will return True. OTOH, many of the classes proposed in PEP 3119 (e.g. Set, Sequence, Mapping) do make sense as base classes, and I don't expect to turn these into metaclasses. (d) Comparing containers I am retracting the idea of making all sequences comparable; instead, you can compare only list to list, tuple to tuple, str to str, etc. Ditto for concatenation. This means that __eq__ and __and__ are not part of the Sequence spec. OTOH for sets, I think it makes sense to require all set implementations to be inter-comparable: an efficient default implementation can easily be provided, and since sets are a relatively new addition, there is no prior art of multiple incompatible set implementations in the core; to the contrary, the two built-in set types (set and frozenset) are fully interoperable (unlike sequences, of which there are many, and none of these are interoperable). For mappings I'm on the fence; while it would be easy to formally define m1 == m2 <==> set(m1.items()) == set(m2.items()), and that would be relatively easy to compute using only traversal and __getitem__, I'm not so sure there is any use for this, and it does break with tradition (dict can't currently be compared to the various dbm-style classes). (e) Numeric tower Jeffrey will write up the detailed specs for the numeric tower. MonoidUnderPlus, Ring and other algebraic notions are gone. 
We will have abstract classes Integer <: Rational <: Real <: Complex <: Number (*); and concrete classes complex <: Complex, float <: Real, decimal.Decimal <: Real, int <: Integer. (No concrete implementations of Rational, but there are many 3rd party ones to choose from.) We came up with a really clever way whereby the implementations of binary operations like __add__ and __radd__ in concrete classes should defer to their counterpart in the abstract base class instead of returning NotImplemented; the abstract base class can then (at least in most cases) do the right thing when mixed operations are attempted on two different concrete subclasses that don't know about each other (solving a dilemma about which I blabbered yesterday). This does mean that Integer...Number will be built-in. (*) D <: C means that D is a subclass of C. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From daniel at stutzbachenterprises.com Tue May 1 02:21:49 2007 From: daniel at stutzbachenterprises.com (Daniel Stutzbach) Date: Mon, 30 Apr 2007 19:21:49 -0500 Subject: [Python-3000] Two proposals for a new list-like type: one modest, one radical In-Reply-To: <00c601c786ff$95839700$f101a8c0@RaymondLaptop1> References: <00c601c786ff$95839700$f101a8c0@RaymondLaptop1> Message-ID: On 4/25/07, Raymond Hettinger wrote: > > There are only a few use-cases (that I can think of) where Python's > > list() regularly outperforms the BList. These are: > > > > 1. A large LIFO stack, where there are many .append() and .pop(-1) > > operations. These are O(1) for a Python list, but O(log n) for the > > BList(). > > This is a somewhat important use-case (we devote two methods to it). I've been thinking about this a bit more. For the LIFO use case, I can cache a pointer within the root node to the right-most leaf node, which will make a sequence of n append and pop operations take O(n) amortized time (same as a regular list).
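[To make the LIFO use case above concrete: a built-in list gives amortized O(1) append and pop at the right end, which is exactly the operation mix a tree-structured sequence pays O(log n) for:]

```python
stack = []
for item in range(1000):
    stack.append(item)          # amortized O(1): rare resizes, cheap on average

popped = []
while stack:
    popped.append(stack.pop())  # pop() from the right end is O(1)

# Strict last-in, first-out order: 999 comes out first, 0 last.
assert popped[0] == 999 and popped[-1] == 0
```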
The latest version, 0.9.4, on PyPi fixes most of the issues raised by others: - C++ style comments have been converted to C comments. - Variable declarations are now always at the beginning of a block. - Use Py_ssize_t instead of int in all (I think) the appropriate places. - Cleaned up the debugging code to rely on fewer macros - Removed all (I think) gcc-isms -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises LLC From guido at python.org Tue May 1 02:38:25 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 30 Apr 2007 17:38:25 -0700 Subject: [Python-3000] super(), class decorators, and PEP 3115 In-Reply-To: <5.1.1.6.0.20070430192208.02ddaee8@sparrow.telecommunity.com> References: <5.1.1.6.0.20070430142844.03c96240@sparrow.telecommunity.com> <5.1.1.6.0.20070430152320.02d31868@sparrow.telecommunity.com> <5.1.1.6.0.20070430192208.02ddaee8@sparrow.telecommunity.com> Message-ID: On 4/30/07, Phillip J. Eby wrote: > At 03:54 PM 4/30/2007 -0700, Guido van Rossum wrote: > >On 4/30/07, Phillip J. Eby wrote: > >>At 12:17 PM 4/30/2007 -0700, Guido van Rossum wrote: > >> >Assuming class decorators are added, can't you do all of this using a > >> >custom metaclass? > >> > >>The only thing I need for the GF PEP is a way for a method decorator to get > >>a callback after the class is created, so that overloading will work > >>correctly in cases where overloaded methods are defined in a subclass. > > > >I still don't understand why you can't tell the users "for this to > >work, you must use my special magic super-duper metaclass defined > >*here*". Surely a sufficiently advanced metaclass can pull of this > >kind of magic in its __init__ method? If not a metaclass, then a > >super-duper decorator. Or what am I missing? > > Metaclasses don't mix well. If the user already has a metaclass, they'll > have to create a custom subclass, since Python doesn't do auto-combination > of metaclasses (per the "Putting Metaclasses to Work" book). 
This makes > things messy, especially if the user doesn't *know* they're using a > metaclass already (e.g., they got one by inheritance). > > For the specific use case I'm concerned about, it's like "super()" in that > a function defined inside a class body needs to know what class it's > in. (Actually, it's the decorator that needs to know, and it ideally needs > to know as soon as the class is defined, rather than waiting until a call > occurs later.) > > As with "super()", this really has nothing to do with the class. It would > make about as much sense as having a metaclass or class decorator called > ``SuperUser``; i.e., it would work, but it's just overhead for the user. > > So, if there ends up being a general way to access that "containing class" > from a function decorator, or at least to get a callback once the class is > defined, that's all I need for this use case that can't reasonably be > handled by a normal metaclass. > > Note, too, that the such a hook would also allow you to make classes into > ABCs through the presence of an @abstractmethod, without also having to > inherit from Abstract or set an explicit metaclass. (Unless of course you > prefer to have the abstractness called out up-front... but then that > explicitness goes out the window as soon as you e.g. sublcass Sequence from > Iterable.) > > > >But I don't understand how a __metaclass__ hack can use a class decorator. > > The __metaclass__ hack is used in Python 2.x to dynamically *add* class > decorators while the class suite is being executed, that will be called > *after* the class is created. A function decorator (think of your > @abstractmethod, for example) would monkeypatch the metaclass so it gets a > crack at class after it's created, without the user having to explicitly > set up the metaclass (or merge any inherited metaclasses). It sounds like you were accessing __metaclass__ via sys._getframe() from within the decorator, right? 
That sounds fragile and should not be the basis of anything proposed for inclusion into the standard library in a PEP. Perhaps the GF PEP could propose a standard hook that a class could define to be run after the class is constructed. The hook could be acquired by regular inheritance. I think it's entirely reasonable to require that, in order to use an advanced feature *that is not yet supported by the core language*, users need to enable the feature not just by importing a module and using a decorator but also by something they need to do once per class, like specifying a metaclass, a class decorator, or a magic base class. Of course, once the core language adds built-in support for such a feature, it becomes slightly less advanced, and it is reasonable to expect that the special functionality be provided by object or type or some other aspect of the standard class definition machinery (maybe even a default decorator that's always invoked). -- --Guido van Rossum (home page: http://www.python.org/~guido/) From greg.ewing at canterbury.ac.nz Tue May 1 02:50:45 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 01 May 2007 12:50:45 +1200 Subject: [Python-3000] Addition to PEP 3101 In-Reply-To: References: <8f01efd00704300953t6154d7e1j7ef18cead1acb344@mail.gmail.com> Message-ID: <46368EE5.6050409@canterbury.ac.nz> Patrick Maupin wrote: > Method calls are deliberately disallowed by the PEP, so that the > implementation has some hope of being securable. If attribute access is allowed, arbitrary code can already be triggered, so I don't see how this makes a difference to security. 
-- Greg From greg.ewing at canterbury.ac.nz Tue May 1 02:59:17 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 01 May 2007 12:59:17 +1200 Subject: [Python-3000] Traits/roles instead of ABCs In-Reply-To: <07Apr30.141916pdt.57996@synergy1.parc.xerox.com> References: <43aa6ff70704291840s3384824et44ebfd360c15eda@mail.gmail.com> <014201c78adc$ca70d960$f101a8c0@RaymondLaptop1> <438708814690534630@unknownmsgid> <79990c6b0704301001ga0d2429sdaded9ac75fa15c5@mail.gmail.com> <07Apr30.141916pdt.57996@synergy1.parc.xerox.com> Message-ID: <463690E5.6060603@canterbury.ac.nz> Bill Janssen wrote: >>On 30/04/07, Bill Janssen wrote: >>After 15 years not being able to clearly state what "file-like" or >>"mapping-like" means to different people, perhaps we should accept >>that there is no clear-cut answer...? > > And that's a problem -- people are confused. Instead of throwing up > our hands, I think we should define what "file-like" means. That assumes there exists a single definition of file-like that suits most purposes, and the only reason for confusion is just that this definition hasn't thus far been elucidated. But I don't think there is any such definition, and the confusion arises because people lazily use the vague term "file-like" instead of spelling out what they really mean ("has a read() method", etc.) Hopefully the new I/O system will help by breaking the API into more digestible pieces. -- Greg From pje at telecommunity.com Tue May 1 03:08:04 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon, 30 Apr 2007 21:08:04 -0400 Subject: [Python-3000] Breakthrough in thinking about ABCs (PEPs 3119 and 3141) In-Reply-To: Message-ID: <5.1.1.6.0.20070430205955.04953100@sparrow.telecommunity.com> At 05:19 PM 4/30/2007 -0700, Guido van Rossum wrote: >Collin will continue to develop his Roles PEP. This may or may not end >up providing a viable alternative to ABCs; in either case it will be >refreshing to compare and contrast the two proposals. 
These should also be interesting to compare with the "interfaces" part of PEP 3124, although they need not compete. (The module proposed in 3124 should be able to use ABCs or Roles as easily as it does its own Interfaces.) >Personally, I still think that the most uniform way of spelling this >is overloading isinstance and issubclass; that has the highest >likelihood of standardizing the spelling for such inquiries. A big +1 here. This is no different than e.g. operator.mul() being able to do different things depending on the second argument. >(is this the typeclass thing in Haskell?). Yeah; you'd say that in the typeclass "TotallyOrdered a", that "<" is a 2-argument function taking two "a"'s and returning a boolean. But that's way more parameterized than we can do in Python any time soon. I don't even go that far for PEP 3124, although in principle you could use its extension API to do something like that. Not something I want to even try thinking about in detail right now, though. From pje at telecommunity.com Tue May 1 03:13:56 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon, 30 Apr 2007 21:13:56 -0400 Subject: [Python-3000] super(), class decorators, and PEP 3115 In-Reply-To: References: <5.1.1.6.0.20070430192208.02ddaee8@sparrow.telecommunity.com> <5.1.1.6.0.20070430142844.03c96240@sparrow.telecommunity.com> <5.1.1.6.0.20070430152320.02d31868@sparrow.telecommunity.com> <5.1.1.6.0.20070430192208.02ddaee8@sparrow.telecommunity.com> Message-ID: <5.1.1.6.0.20070430205712.04d6f408@sparrow.telecommunity.com> At 05:38 PM 4/30/2007 -0700, Guido van Rossum wrote: >Of course, once the core language adds built-in support for such a >feature, it becomes slightly less advanced, and it is reasonable to >expect that the special functionality be provided by object or type or >some other aspect of the standard class definition machinery (maybe >even a default decorator that's always invoked). Yep. That's precisely it. 
I'm suggesting that since GFs, enhanced super(), and even potentially @abstractmethod have a use for such a hook, this would be an appropriate hook to provide in object or type or whatever. Or, have the MAKE_CLASS opcode just do something like:

    ...
    cls = mcls(name, bases, prepared_dict)
    for decorator in cls.__decorators__:
        cls = decorator(cls)
    ...

Heck, I'd settle for:

    ...
    cls = mcls(name, bases, prepared_dict)
    for callback in cls.__decorators__:
        callback(cls)
    ...

As this version would still handle all of my use cases; it just wouldn't be as useful for things like @abstractmethod that really do want to change the metaclass or bases rather than simply be notified of what the class is.

From rrr at ronadam.com Tue May 1 03:37:12 2007
From: rrr at ronadam.com (Ron Adam)
Date: Mon, 30 Apr 2007 20:37:12 -0500
Subject: [Python-3000] Traits/roles instead of ABCs
In-Reply-To: <014201c78adc$ca70d960$f101a8c0@RaymondLaptop1>
References: <43aa6ff70704291840s3384824et44ebfd360c15eda@mail.gmail.com> <014201c78adc$ca70d960$f101a8c0@RaymondLaptop1>
Message-ID: <463699C8.40105@ronadam.com>

Raymond Hettinger wrote:
> [Collin Winter]
>> Put another way, a role is an assertion about a set of capabilities.
> . . .
>> If there's interest in this, I could probably whip up a PEP before the deadline.
>
> +100 I'm very interested in seeing a lighter weight alternative to abc.py that:
>
> 1) is dynamic
> 2) doesn't require inheritance to work
> 3) doesn't require mucking with isinstance or other existing mechanisms
> 4) makes a limited, useful set of assertions rather than broadly covering a whole API.
> 5) that leaves the notion of duck-typing as the rule rather than the exception
> 6) that doesn't freeze all of the key APIs in concrete
>
> I'm concerned that the current ABC proposal will quickly evolve from optional
> to required and create a somewhat java-esque landscape where
> inheritance and full-specification are the order of the day.

+100 on Raymond's list here.
I am concerned that the effect of most of the proposals will be to encode data as code to a greater degree. I generally try to do the opposite. That is, I try to make my data and code be independent of each other so my data is complete, and my code is usable for other things.

There are times I want to pipeline (or assembly-line) data and mark it now for later dispatching at a point further downstream. In that case, being able to temporarily and transparently attach a bit of metadata to the object and have it ride along with the data until some later point would be useful. Then to have some nice general-purpose dispatcher to initiate the work at that point.

A particular use case that I'm finding occurs quite often is that of sorting. Not the sorting of putting things in order, but the sorting as in mail sorters or dividing large groups into smaller subgroups. And of course that is a form of dispatching. So far I haven't seen anything that directly addresses these use cases.

> IMHO, the ABC approach is using a cannon to shoot a mosquito. My day-to-day
> problems are much smaller and could be solved by a metadata attribute or a
> role/trait solution:
>
> * knowing whether a __getitem__ method implements a mapping or a sequence
> * knowing whether an object can have more than one iterator (i.e. a file has one
>   but a list can have many)
> * knowing whether a sequence, file, cursor, etc. is writable or just readonly.
>
> Raymond

From guido at python.org Tue May 1 04:43:55 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 30 Apr 2007 19:43:55 -0700
Subject: [Python-3000] super(), class decorators, and PEP 3115
In-Reply-To: <5.1.1.6.0.20070430205712.04d6f408@sparrow.telecommunity.com>
References: <5.1.1.6.0.20070430142844.03c96240@sparrow.telecommunity.com> <5.1.1.6.0.20070430152320.02d31868@sparrow.telecommunity.com> <5.1.1.6.0.20070430192208.02ddaee8@sparrow.telecommunity.com> <5.1.1.6.0.20070430205712.04d6f408@sparrow.telecommunity.com>
Message-ID:

On 4/30/07, Phillip J. Eby wrote:
> At 05:38 PM 4/30/2007 -0700, Guido van Rossum wrote:
> >Of course, once the core language adds built-in support for such a
> >feature, it becomes slightly less advanced, and it is reasonable to
> >expect that the special functionality be provided by object or type or
> >some other aspect of the standard class definition machinery (maybe
> >even a default decorator that's always invoked).
>
> Yep. That's precisely it. I'm suggesting that since GF's, enhanced
> super(), and even potentially @abstractmethod have a use for such a hook,
> that this would be an appropriate hook to provide in object or type or
> whatever. Or, have the MAKE_CLASS opcode just do something like:
>
> ...
> cls = mcls(name, bases, prepared_dict)
> for decorator in cls.__decorators__:
>     cls = decorator(cls)
> ...
>
> Heck, I'd settle for:
>
> ...
> cls = mcls(name, bases, prepared_dict)
> for callback in cls.__decorators__:
>     callback(cls)
> ...
>
> As this version would still handle all of my use cases; it just wouldn't be
> as useful for things like @abstractmethod that really do want to change the
> metaclass or bases rather than simply be notified of what the class is.

OK, put one of those in the PEP (but I still think it's a waste of time to mention super). Though I think you may have to investigate exactly what MAKE_CLASS does.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

From talin at acm.org Tue May 1 05:06:06 2007
From: talin at acm.org (Talin)
Date: Mon, 30 Apr 2007 20:06:06 -0700
Subject: [Python-3000] Addition to PEP 3101
In-Reply-To: <46368EE5.6050409@canterbury.ac.nz>
References: <8f01efd00704300953t6154d7e1j7ef18cead1acb344@mail.gmail.com> <46368EE5.6050409@canterbury.ac.nz>
Message-ID: <4636AE9E.2020905@acm.org>

Greg Ewing wrote:
> Patrick Maupin wrote:
>
>> Method calls are deliberately disallowed by the PEP, so that the
>> implementation has some hope of being securable.
>
> If attribute access is allowed, arbitrary code can already
> be triggered, so I don't see how this makes a difference
> to security.

Not quite. It depends on what you mean by 'arbitrary code'. Let's take a hypothetical example:

Suppose I have a format string which I downloaded from the nefarious "evil.org" web site which I suspect may contain "evil" formatting fields. Now, I'd like to be able to use this format string, but I want to be able to contain the damage that it can do.

For example, if I pass a list of integers as the format parameters, there is little harm that can be done. Even if my evil string contains things like "{0.__class__.__module__}" - in other words, even if it spiders through the base class list and the MRO list and everything else - there's little damage it can do, because it can't call any functions.
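[The attribute-only traversal described here can be seen in the str.format() machinery that PEP 3101 eventually became. This is a present-day sketch, not code from the 2007 proposal: replacement fields may follow attribute chains, but method calls are simply not part of the field-name grammar.]

```python
# A hostile format string can spider through attributes of its arguments,
# but it cannot invoke any methods on them.

template = "{0.__class__.__module__}.{0.__class__.__name__}"

# The field walks an attribute chain on the argument...
print(template.format(42))  # builtins.int

# ...but call syntax is not parsed; "upper()" is treated as a (nonexistent)
# attribute name, so the lookup fails instead of running code.
try:
    "{0.upper()}".format("hi")
except AttributeError as exc:
    print("rejected:", exc)
```

[End of editorial illustration.]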
Now, let's suppose that somewhere in the set of objects that are transitively reachable from those parameter values, there's an object which has an attribute such that accessing that attribute deletes my hard drive or has some other bad effect. Obviously this would be bad. Bad because my hard drive was deleted, sure, but even worse because I'm an idiot for writing such a stupid class in the first place.

I know that's a bit over the top, but what I mean to say is that in the normal course of events, one can assume that attribute accesses are either stateless, or should at least *seem* to be stateless from the outside. It's considered bad form to go around writing classes where the mere access of an attribute has some potentially deleterious effect. Anyone who writes a class like that deserves to have their hard drive deleted IMHO.

So the judgment was made that it's relatively safe to access attributes (even if they can be overloaded), whereas allowing method invocations is much less safe. So yes, theoretically attribute access can indeed run arbitrary code. But not in a world with mostly sane people in it.

-- Talin

From alan.mcintyre at gmail.com Tue May 1 05:31:37 2007
From: alan.mcintyre at gmail.com (Alan McIntyre)
Date: Mon, 30 Apr 2007 23:31:37 -0400
Subject: [Python-3000] Traits/roles instead of ABCs
In-Reply-To: <740c3aec0704301501u7df7b5a6uaea854d4716eb87e@mail.gmail.com>
References: <43aa6ff70704291840s3384824et44ebfd360c15eda@mail.gmail.com> <014201c78adc$ca70d960$f101a8c0@RaymondLaptop1> <1d36917a0704300816ma3bf9c2o4dd674cfcefa9172@mail.gmail.com> <-3456230403858254882@unknownmsgid> <740c3aec0704301501u7df7b5a6uaea854d4716eb87e@mail.gmail.com>
Message-ID: <1d36917a0704302031hd34ffcfu2eee879aef426931@mail.gmail.com>

On 4/30/07, BJörn Lindqvist wrote:
> On 4/30/07, Bill Janssen wrote:
> > > +1 for preferring simple solutions to complex ones
> >
> > Me, too. But which is the simple solution? I tend to think ABCs are.
>
> Neither or.
> They are both an order of magnitude more complex than
> the problem they are designed to solve. Raymond Hettinger's small list
> of three example problems earlier in the thread is the most concrete
> description of what the problem really is all about. And I would
> honestly rather sort them under "minor annoyances" than "really
> critical stuff, needs to be fixed asap."

Disclaimer: I've only tangentially followed this entire ABC discussion, and I'm commenting off the cuff without having read as much as I probably should have. I am not a "power user" of Python (by which I mean, I've never been tasked to solve a problem using Python that made me want to use abstract classes, or to tinker with how the isinstance or issubclass functions do their thing).

That said, the impression I get from some of the discussions here is that big helpings of complexity might get added to the core of the language, and that's just a little unsettling to me. Maybe it's just that I don't have to solve the same sorts of problems as the advocates of these ideas, or I just don't have an adequate background to understand how really useful these additions would be.

On 4/30/07, Barry Warsaw wrote:
> Interfaces and ABCs are really all about Programming in the Really
> Large. Most Python programs don't need this stuff, and in fact,
> having to deal with them in any way would IMO reduce the elegance of
> Python for small to medium (and even most large) applications.

I think this is what's generally bugging me: the impression that there's a push to add features to help out those who program in The Really Large, or in The Really Mathematical (rings and semigroups and monoids, oh my!). I have a nagging concern that these additions will clutter up the core, and--no matter how hard you try--adding them is going to have an impact on "run-of-the-mill" users of the language.
My-2-cents'ly yours,
Alan

From jason.orendorff at gmail.com Tue May 1 06:32:04 2007
From: jason.orendorff at gmail.com (Jason Orendorff)
Date: Tue, 1 May 2007 00:32:04 -0400
Subject: [Python-3000] Breakthrough in thinking about ABCs (PEPs 3119 and 3141)
In-Reply-To:
References:
Message-ID:

On 4/30/07, Guido van Rossum wrote:
> The correct
> approach is for TotallyOrdered to be a metaclass (is this the
> typeclass thing in Haskell?).

Mmmm. Typeclasses don't *feel* like metaclasses. Haskell types aren't objects. A typeclass is like an interface, but more expressive. Only an example has any hope of delivering the "aha!" here:

    -- This defines a typeclass called "Set e".
    -- "e" and "s" here are type variables.
    class Set e s where
        -- here are a few that behave like OO methods...
        size :: s -> Int            -- (size set) returns an Int
        contains :: s -> e -> Bool  -- (contains set element) returns Bool

        -- here are some where the two arguments have to be of the same type
        union :: s -> s -> s
        intersection :: s -> s -> s

        -- here's a constructor!
        fromList :: [e] -> s

        -- and here's a constant... with a default implementation!
        emptySet :: s
        emptySet = fromList []

Suppose someone has written a super-fast data structure for collections of ints. If I wanted to "register" that type as a Set, I would write:

    instance Set Int FastIntSet where
        -- the implementation goes in here
        size self = ...implement this using FastIntSet magic...

More complex relationships among types are surprisingly easy to express. See if you can puzzle these out:

    instance Hashable e => Set e (HashSet e) where ...
    instance Cmp e => Set e (TreeSet e) where ...

    class PartialOrd t where
        (<=*) :: t -> t -> Bool

    instance (Set s, Eq s) => PartialOrd s where
        (a <=* b) = (intersection a b == a)

See? It's nice. But, eh, this is what typeful languages do all day; they'd better be good at it. :)

-j

(Right now on a Haskell mailing list somewhere, a mirror image of me is trying to explain what's so cool about zipfile. Python wins.
;)

From python at rcn.com Tue May 1 08:25:42 2007
From: python at rcn.com (Raymond Hettinger)
Date: Mon, 30 Apr 2007 23:25:42 -0700
Subject: [Python-3000] PEP: Information Attributes
Message-ID: <008a01c78bb9$e4283780$f001a8c0@RaymondLaptop1>

Proto-PEP: Information Attributes (First Draft)

Proposal:

Testing hasattr() is a broadly applicable and flexible technique that works well whenever the presence of a method has an unambiguous interpretation (i.e. __hash__ for hashability, __iter__ for iterability, __len__ for sized containers); however, there are other methods with ambiguous interpretations that could be resolved by adding an information attribute.

Motivation:

Information attributes are proposed as a lightweight alternative to abstract base classes. The essential argument is that duck-typing works fairly well and needs only minimal augmentation to address a small set of recurring challenges. In contrast, the ABC approach imposes an extensive javaesque framework that cements APIs and limits flexibility. Real-world Python programming experience has shown little day-to-day need for broad-sweeping API definitions; instead, there seem to be only a handful of recurring issues that can easily be addressed by a lightweight list of information attributes.

Use Cases with Ambiguous Interpretations:

* The presence of a __getitem__ method is ambiguous in that it can be interpreted as either sequence or mapping behavior. The ambiguity is easily resolved with an attribute claiming either mapping behavior or sequence behavior.

* The presence of a rich comparison operator such as __lt__ is ambiguous in that it can return a vector or a scalar, the scalar may or may not be boolean, and it may be a NotImplemented instance. Even the boolean case is ambiguous because __lt__ may imply a total ordering (as it does for numbers) or it may be a partial ordering (as it is for sets where __lt__ means a strict subset).
That latter ambiguity (sortability) is easily resolved by an attribute indicating a total ordering.

* Some methods such as set.__add__ are too restrictive in that they preclude interaction with non-sets. This makes it impossible to achieve set interoperability without subclassing from set (a choice which introduces other complications, such as the inability to override set-to-set interactions). This situation is easily resolved by an attribute like obj.__fake__ = set, which indicates that the object intends to be a set proxy.

* The __iter__ method doesn't tell you whether the object supports multiple iteration (such as with lists) or single iteration (such as with files). A __singleiterator__ attribute would clear up the ambiguity.

* While you can test for the presence of a write() method, it would be helpful to have a __readonly__ information attribute for file-like objects, cursors, immutables, and whatnot.

Advantages:

The attribute approach is dynamic (it doesn't require inheritance to work). It doesn't require mucking with isinstance() or other existing mechanisms. It restricts itself to making a limited, useful set of assertions rather than broadly covering a whole API. It leaves the proven pythonic notion of duck-typing as the rule rather than the exception. It resists the temptation to freeze all of the key APIs in concrete.

From python at rcn.com Tue May 1 08:56:32 2007
From: python at rcn.com (Raymond Hettinger)
Date: Mon, 30 Apr 2007 23:56:32 -0700
Subject: [Python-3000] PEP: Drop Implicit String Concatenation
Message-ID: <009e01c78bbd$da8b71c0$f001a8c0@RaymondLaptop1>

PEP: Remove Implicit String Concatenation

Motivation:

One goal for Python 3000 should be to simplify the language by removing unnecessary features. Implicit string concatenation should be dropped in favor of existing techniques. This will simplify the grammar and simplify a user's mental picture of Python. The latter is important for letting the language "fit in your head".
A large group of current users do not even know about implicit concatenation. Of those who do know about it, a large portion never use it or habitually avoid it. Of those who both know about it and use it, very few could state with confidence the implicit operator precedence and under what circumstances it is computed when the definition is compiled versus when it is run.

Uses and Substitutes:

* Multi-line strings:

      s = "In the beginning, " \
          "there was a start."

      s = ("In the beginning, " +
           "there was a start")

* Complex regular expressions are sometimes stated in terms of several implicitly concatenated strings, with each regex component on a different line and followed by a comment. The plus operator can be inserted here, but it does make the regex harder to read. One alternative is to use the re.VERBOSE option. Another alternative is to build up the regex with a series of += lines:

      r = ('a{20}'   # Twenty A's
           'b{5}'    # Followed by Five B's
          )

      r = '''a{20}   # Twenty A's
             b{5}    # Followed by Five B's
          '''        # Compiled with the re.VERBOSE flag

      r = 'a{20}'    # Twenty A's
      r += 'b{5}'    # Followed by Five B's

Automatic Substitution:

When transitioning to Py3.0, some care should be taken not to blindly drop in a plus operator and possibly incur a change in semantics due to operator precedence. A pair such as:

      "abc" "def"

should be replaced using parentheses:

      ("abc" + "def")

From daniel at stutzbachenterprises.com Tue May 1 09:00:33 2007
From: daniel at stutzbachenterprises.com (Daniel Stutzbach)
Date: Tue, 1 May 2007 02:00:33 -0500
Subject: [Python-3000] BList PEP
Message-ID:

PEP: 30XX
Title: BList: A faster list-like type
Version: $Revision$
Last-Modified: $Date$
Author: Daniel Stutzbach
Discussions-To: Python 3000 List
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 30-Apr-2007
Python-Version: 2.6 and/or 3.0
Post-History: 30-Apr-2007

Abstract
========

The common case for list operations is on small lists.
The current array-based list implementation excels at small lists due to the strong locality of reference and infrequency of memory allocation operations. However, an array takes O(n) time to insert and delete elements, which can become problematic as the list gets large.

This PEP introduces a new data type, the BList, that has array-like and tree-like aspects. It enjoys the same good performance on small lists as the existing array-based implementation, but offers superior asymptotic performance for most operations. This PEP makes two mutually exclusive proposals for including the BList type in Python:

1. Add it to the collections module, or
2. Replace the existing list type

Motivation
==========

The BList grew out of the frustration of needing to rewrite intuitive algorithms that worked fine for small inputs but took O(n**2) time for large inputs due to the underlying O(n) behavior of array-based lists. The deque type, introduced in Python 2.4, solved the most common problem of needing a fast FIFO queue. However, the deque type doesn't help if we need to repeatedly insert or delete elements from the middle of a long list.

A wide variety of data structures provide good asymptotic performance for insertions and deletions, but they either have O(n) performance for other operations (e.g., linked lists) or have inferior performance for small lists (e.g., binary trees and skip lists).

The BList type proposed in this PEP is based on the principles of B+Trees, which have array-like and tree-like aspects. The BList offers array-like performance on small lists, while offering O(log n) asymptotic performance for all insert and delete operations. Additionally, the BList implements copy-on-write under the hood, so even operations like getslice take O(log n) time.

The table below compares the asymptotic performance of the current array-based list implementation with the asymptotic performance of the BList.
========= ================ ====================
Operation Array-based list BList
========= ================ ====================
Copy      O(n)             **O(1)**
Append    **O(1)**         O(log n)
Insert    O(n)             **O(log n)**
Get Item  **O(1)**         O(log n)
Set Item  **O(1)**         O(log n)
Del Item  O(n)             **O(log n)**
Iteration O(n)             O(n)
Get Slice O(k)             **O(log n)**
Del Slice O(n)             **O(log n)**
Set Slice O(n+k)           **O(log k + log n)**
Extend    O(k)             **O(log k + log n)**
Sort      O(n log n)       O(n log n)
Multiply  O(nk)            **O(log k)**
========= ================ ====================

An extensive empirical comparison of Python's array-based list and the BList is available at [2]_.

Use Case Trade-offs
===================

The BList offers superior performance for many, but not all, operations. Choosing the correct data type for a particular use case depends on which operations are used. Choosing the correct data type as a built-in depends on balancing the importance of different use cases and the magnitude of the performance differences.

For the common use cases of small lists, the array-based list and the BList have similar performance characteristics. For the slightly less common case of large lists, there are two common use cases where the existing array-based list outperforms the existing BList reference implementation. These are:

1. A large LIFO stack, where there are many .append() and .pop(-1) operations. Each operation is O(1) for an array-based list, but O(log n) for the BList.

2. A large list that does not change size. The getitem and setitem calls are O(1) for an array-based list, but O(log n) for the BList.

In performance tests on a 10,000 element list, BLists exhibited a 50% and 5% increase in execution time for these two use cases, respectively. The performance for the LIFO use case could be improved to O(n) time, by caching a pointer to the right-most leaf within the root node.
For lists that do not change size, the common case of sequential access could also be improved to O(n) time via caching in the root node. However, the performance of these approaches has not been empirically tested.

Many operations exhibit a tremendous speed-up (O(n) to O(log n)) when switching from the array-based list to BLists. In performance tests on a 10,000 element list, operations such as getslice, setslice, and FIFO-style inserts and deletes on a BList take only 1% of the time needed on array-based lists.

In light of the large performance speed-ups for many operations, the small performance costs for some operations will be worthwhile for many (but not all) applications.

Implementation
==============

The BList is based on the B+Tree data structure. The BList is a wide, bushy tree where each node contains an array of up to 128 pointers to its children. If the node is a leaf, its children are the user-visible objects that the user has placed in the list. If a node is not a leaf, its children are other BList nodes that are not user-visible.

If the list contains only a few elements, they will all be children of a single node that is both the root and a leaf. Since a node is little more than an array of pointers, small lists operate in effectively the same way as an array-based data type and share the same good performance characteristics.

The BList maintains a few invariants to ensure good (O(log n)) asymptotic performance regardless of the sequence of insert and delete operations. The principal invariants are as follows:

1. Each node has at most 128 children.
2. Each non-root node has at least 64 children.
3. The root node has at least 2 children, unless the list contains fewer than 2 elements.
4. The tree is of uniform depth.

If an insert would cause a node to exceed 128 children, the node spawns a sibling and transfers half of its children to the sibling. The sibling is inserted into the node's parent.
If the node is the root node (and thus has no parent), a new parent is created and the depth of the tree increases by one.

If a deletion would cause a node to have fewer than 64 children, the node moves elements from one of its siblings if possible. If both of its siblings also have only 64 children, then two of the nodes merge and the empty one is removed from its parent. If the root node is reduced to only one child, its single child becomes the new root (i.e., the depth of the tree is reduced by one).

In addition to tree-like asymptotic performance and array-like performance on small lists, BLists support transparent **copy-on-write**. If a non-root node needs to be copied (as part of a getslice, copy, setslice, etc.), the node is shared between multiple parents instead of being copied. If it needs to be modified later, it will be copied at that time. This is completely behind the scenes; from the user's point of view, the BList works just like a regular Python list.

Memory Usage
============

In the worst case, the leaf nodes of a BList have only 64 children each, rather than a full 128, meaning that memory usage is around twice that of a best-case array implementation. Non-leaf nodes use up a negligible amount of additional memory, since there are at least 63 times as many leaf nodes as non-leaf nodes.

The existing array-based list implementation must grow and shrink as items are added and removed. To be efficient, it grows and shrinks only when the list has grown or shrunk exponentially. In the worst case, it, too, uses twice as much memory as the best case.

In summary, the BList's memory footprint is not significantly different from the existing array-based implementation.

Backwards Compatibility
=======================

If the BList is added to the collections module, backwards compatibility is not an issue. This section focuses on the option of replacing the existing array-based list with the BList.
For users of the Python interpreter, a BList has an identical interface to the current list implementation. For virtually all operations, the behavior is identical, aside from execution speed.

For the C API, the BList has a different interface than the existing list implementation. Due to its more complex structure, the BList does not lend itself well to poking and prodding by external sources. Thankfully, the existing list implementation defines an API of functions and macros for accessing data from list objects. Google Code Search suggests that the majority of third-party modules use the well-defined API rather than relying on the list's structure directly. The table below summarizes the search queries and results:

======================== =================
Search String            Number of Results
======================== =================
PyList_GetItem           2,000
PySequence_GetItem       800
PySequence_Fast_GET_ITEM 100
PyList_GET_ITEM          400
\[^a\-zA\-Z\_\]ob_item   100
======================== =================

Compatibility for C extension modules can be achieved in one of two ways:

1. Redefine the various accessor functions and macros in listobject.h to access a BList instead. The interface would be unchanged. The functions can easily be redefined. The macros need a bit more care and would have to resort to function calls for large lists. The macros would need to evaluate their arguments more than once, which could be a problem if the arguments have side effects. A Google Code Search for "PyList_GET_ITEM\(\[^)\]+\(" found only a handful of cases where this occurs, so the impact appears to be low. The few extension modules that use the list's undocumented structure directly, instead of using the API, would break. The core code itself uses the accessor macros fairly consistently and should be easy to port.

2. Deprecate the existing list type, but continue to include it. Extension modules wishing to use the new BList type must do so explicitly.
The BList C interface can be changed to match the existing PyList interface, so that a simple search-and-replace will be sufficient for 99% of module writers. Existing modules would continue to compile and work without change, but they would need to make a deliberate (but small) effort to migrate to the BList. The downside of this approach is that mixing modules that use BLists and array-based lists might lead to slowdowns if conversions are frequently necessary.

Reference Implementation
========================

A reference implementation of the BList is available for CPython at [1]_. The source package also includes a pure Python implementation, originally developed as a prototype for the CPython version. Naturally, the pure Python version is rather slow and the asymptotic improvements don't win out until the list is quite large.

When compiled with Py_DEBUG, the C implementation checks the BList invariants when entering and exiting most functions.

An extensive set of test cases is also included in the source package. The test cases include the existing Python sequence and list test cases as a subset. When the interpreter is built with Py_DEBUG, the test cases also check for reference leaks.

Porting to Other Python Variants
--------------------------------

If the BList is added to the collections module, other Python variants can support it in one of three ways:

1. Make blist an alias for list. The asymptotic performance won't be as good, but it'll work.
2. Use the pure Python reference implementation. The performance for small lists won't be as good, but it'll work.
3. Port the reference implementation.

Discussion
==========

This proposal has been discussed briefly on the Python-3000 mailing list [3]_. Although a number of people favored the proposal, there were also some objections. Below is a summary of the pros and cons as observed by posters to the thread.
General comments:

- Pro: Will outperform the array-based list in most cases
- Pro: "I've implemented variants of this ... a few different times"
- Con: Desirability and performance in actual applications is unproven

Comments on adding BList to the collections module:

- Pro: Matching the list API reduces the learning curve to near-zero
- Pro: Useful for intermediate-level users; won't get in the way of beginners
- Con: Proliferation of data types makes the choices for developers harder

Comments on replacing the array-based list with the BList:

- Con: Impact on extension modules (addressed in `Backwards Compatibility`_)
- Con: The use cases where BLists are slower are important
  (see `Use Case Trade-Offs`_ for how these might be addressed)
- Con: The array-based list code is simple and easy to maintain

To assess the desirability and performance in actual applications,
Raymond Hettinger suggested releasing the BList as an extension module
(now available at [1]_).  If it proves useful, he felt it would be a
strong candidate for inclusion in 2.6 as part of the collections module.
If widely popular, it could then be considered for replacing the
array-based list, but not otherwise.

Guido van Rossum commented that he opposed the proliferation of data
types, but favored replacing the array-based list if backwards
compatibility could be addressed and the BList's performance was
uniformly better.

On-going Tasks
==============

- Reduce the memory footprint of small lists
- Implement TimSort for BLists, so that best-case sorting is O(n)
  instead of O(log n).
- Implement __reversed__
- Cache a pointer in the root to the rightmost leaf, to make LIFO
  operation O(n) time.

References
==========

.. [1] Reference implementations for C and Python:
   http://www.python.org/pypi/blist/

.. [2] Empirical performance comparison between Python's array-based
   list and the blist:
   http://stutzbachenterprises.com/blist/

..
[3] Discussion on python-3000 starting at post:
   http://mail.python.org/pipermail/python-3000/2007-April/006757.html

Copyright
=========

This document has been placed in the public domain.

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:

From jcarlson at uci.edu Tue May 1 09:24:30 2007
From: jcarlson at uci.edu (Josiah Carlson)
Date: Tue, 01 May 2007 00:24:30 -0700
Subject: [Python-3000] PEP: Information Attributes
In-Reply-To: <008a01c78bb9$e4283780$f001a8c0@RaymondLaptop1>
References: <008a01c78bb9$e4283780$f001a8c0@RaymondLaptop1>
Message-ID: <20070501000436.6450.JCARLSON@uci.edu>

"Raymond Hettinger" wrote:
> Proto-PEP: Information Attributes (First Draft)
>
> Proposal:
>
> Testing hasattr() is a broadly applicable and flexible technique that works well
> whenever the presence of a method has an unambiguous interpretation
> (i.e. __hash__ for hashability, __iter__ for iterability, __len__ for sized
> containers); however, there are other methods with ambiguous interpretations
> that could be resolved by adding an information attribute.

To me, this seems more like traits/roles than ABCs.  Though I haven't
weighed in on either of them, generally I'm with Raymond and others in
the whole "ABCs seem like overkill" perspective.  As such, I'm -1 on
ABCs, but +1 on the general idea of traits/roles - of which I would
consider this PEP to be one.

My concern with Information Attributes is similar to my concern about
ABCs; in order to state the information available from these information
attributes, they need to be part of the class or instance.  On built-in
types, users would not be able to add things to classes or instances,
as is the case with the numpy folks wanting to add 'ring' to integers.
While I've not seen a PEP for offering live traits/roles addition or
removal, I suspect that it would involve weak key dictionaries adding
traits to classes, and only allow hashable instances for single-object
trait additions (depending on the kinds of traits/roles, it could
probably be implemented as a dictionary of weak key sets).  I would be
+1 in this case, as it would offer most of the benefits of ABCs*, with
none of the pre-implementation drawbacks.

- Josiah

* Related to ABCs is the __issubclass__ and __isinstance__ stuff that
allows for proxy objects directly.  Traits/roles could be massaged to
do similar things, but using __is...__ directly seems like it would
perform this operation better.  I'm not a real big

From python at rcn.com Tue May 1 09:31:21 2007
From: python at rcn.com (Raymond Hettinger)
Date: Tue, 1 May 2007 00:31:21 -0700
Subject: [Python-3000] PEP: Eliminate __del__
Message-ID: <001d01c78bc2$b9367a60$f001a8c0@RaymondLaptop1>

PEP: Eliminating __del__

Motivation

Historically, __del__ has been one of the more error-laden dark corners
of the language.  From an implementation point of view, it has proven
to be a thorny maintenance problem that grew almost beyond the range of
human comprehension once garbage collection was introduced.

From a user point of view, the method is deceptively simple and tends
to lead to fragile designs.  The fragility arises in part because it is
difficult to know when or if an object is going to be deleted, whether
by reference counts going to zero or by garbage collection.  Even if
all the relationships are known at the time a script is written, a
subsequent maintainer may innocently introduce (directly or indirectly)
a reference that prevents the desired finalization code from running.
From a design perspective, it is almost always better to provide for
explicit finalization (for example, experienced Python programmers have
learned to call file.close() and connection.close() rather than rely on
automatic closing when the file or SQL connection goes out of scope).
For finalization that needs to occur only at the end of all operations,
users have learned to use atexit() as the preferred technique.

That leaves a handful of cases where some action does need to be taken
when an object is being collected.  In those cases, users have turned
to __del__ "because it was there".  Through the school of hard knocks,
they eventually learn to avoid the hazards of accidentally bringing the
object back to life during finalization and possibly leaving the object
in an invalid, half-finalized state.  The risks occur because the
object is still alive at the time the arbitrary Python code in __del__
is called.

The alternative is to code the automatic finalization steps using
weakref callbacks.  For those used to using __del__, it takes a little
while to learn the idiom, but essentially the technique is to hold a
proxy or ref with a callback to a bound method for finalization:

    self.resource = resource = CreateResource()
    self.callbacks.append(proxy(resource, resource.closedown))

In this manner, all of the object's resources can be freed
automatically when the object is collected.  Note that the callbacks
bind only the resource object and not the client object, so the client
object can already have been collected and the teardown code can be run
without risk of resurrecting the client (with a possibly invalid state).

Proposal

The proposal is to eliminate __del__ and thereby eliminate a strong
temptation to code implicit rather than explicit finalization.  The
remaining approaches to teardown procedures, such as atexit() and
weakref callbacks, are much less problematic and should become the one
way to do it.
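[Editorial note: the weakref idiom sketched above can be expanded into a
small runnable example.  The Resource and Client names here are invented
for illustration; the PEP's CreateResource and self.callbacks are
placeholders for whatever the real code uses.]

```python
import weakref

class Resource:
    """Stand-in for an external resource that must be closed."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

class Client:
    def __init__(self):
        self.resource = resource = Resource()
        # Keep a weakref to *self* whose callback captures only the
        # resource.  When the client dies, the callback closes the
        # resource without ever touching (or resurrecting) the client.
        self._finalizer = weakref.ref(self, lambda ref, r=resource: r.close())

client = Client()
resource = client.resource
del client               # on CPython the refcount hits zero immediately,
print(resource.closed)   # so the callback has already run: prints True
```

(Python 2.6+ spells the callback target weakref.ref exactly as above;
much later, Python 3.4 packaged this pattern as weakref.finalize.)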
From jcarlson at uci.edu Tue May 1 09:53:51 2007
From: jcarlson at uci.edu (Josiah Carlson)
Date: Tue, 01 May 2007 00:53:51 -0700
Subject: [Python-3000] BList PEP
In-Reply-To:
References:
Message-ID: <20070501002446.6453.JCARLSON@uci.edu>

"Daniel Stutzbach" wrote:
> Title: BList: A faster list-like type

> 1. Add it to the collections module, or

+1

> 2. Replace the existing list type

-1

For types that are used every day, I can't help but prefer a simple
implementation.  Among the features of the current Python list
implementation is that small lists (0, 4, 8, 16 elements) use very
little space.  Your current BList implementation uses a fixed size for
the smallest sequence, 128, which would offer worse memory performance
for applications where many small lists are common.

> ========= ================ ====================
> Operation Array-based list BList
> ========= ================ ====================
> Copy      O(n)             **O(1)**
> Append    **O(1)**         O(log n)
> Insert    O(n)             **O(log n)**
> Get Item  **O(1)**         O(log n)
> Set Item  **O(1)**         **O(log n)**

What's going on with this pair?  Both entries are marked as the winner.

> Del Item  O(n)             **O(log n)**
> Iteration O(n)             O(n)
> Get Slice O(k)             **O(log n)**
> Del Slice O(n)             **O(log n)**
> Set Slice O(n+k)           **O(log k + log n)**
> Extend    O(k)             **O(log k + log n)**
> Sort      O(n log n)       O(n log n)
> Multiply  O(nk)            **O(log k)**
> ========= ================ ====================

> The performance for the LIFO use case could be improved to O(n) time,

You probably want to mention "over n appends/pop(-1)s".  You also may
want to update the above chart to take into consideration that you plan
on doing that modification.  Generally, the BList is as fast or faster
asymptotically than a list for everything except random getitem/setitem,
at which point it is O(log n) rather than O(1).  You may want to
explicitly state this in some later version.

> Implementation
> ==============
>
> The BList is based on the B+Tree data structure.
> The BList is a wide,
> bushy tree where each node contains an array of up to 128 pointers to
> its children.  If the node is a leaf, its children are the
> user-visible objects that the user has placed in the list.  If the
> node is not a leaf, its children are other BList nodes that are not
> user-visible.  If the list contains only a few elements, they will all
> be children of a single node that is both the root and a leaf.  Since
> a node is little more than an array of pointers, small lists operate
> in effectively the same way as an array-based data type and share the
> same good performance characteristics.

In the collections module, there exists a deque type.  This deque type
more or less uses a sequence of 64 pointers, the first two of which are
linked-list pointers to the previous and next block of pointers.  I
don't know how much tuning was done to choose this value of 64, but you
may consider reducing the number of pointers to 64 for the same
cache/allocation behavior.

- Josiah

From martin at v.loewis.de Tue May 1 11:17:02 2007
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Tue, 01 May 2007 11:17:02 +0200
Subject: [Python-3000] PEP: Eliminate __del__
In-Reply-To: <001d01c78bc2$b9367a60$f001a8c0@RaymondLaptop1>
References: <001d01c78bc2$b9367a60$f001a8c0@RaymondLaptop1>
Message-ID: <4637058E.2070604@v.loewis.de>

> Historically, __del__ has been one of the more error-laden dark corners
> of the language.  From an implementation point of view, it has
> proven to be a thorny maintenance problem that grew almost beyond
> the range of human comprehension once garbage collection was introduced.
+1

Martin

From martin at v.loewis.de Tue May 1 12:52:02 2007
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Tue, 01 May 2007 12:52:02 +0200
Subject: [Python-3000] PEP: Supporting Non-ASCII Identifiers
Message-ID: <46371BD2.7050303@v.loewis.de>

PEP: 31xx
Title: Supporting Non-ASCII Identifiers
Version: $Revision$
Last-Modified: $Date$
Author: Martin v. Löwis
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 1-May-2007
Python-Version: 3.0
Post-History:

Abstract
========

This PEP proposes support for non-ASCII letters (such as accented
characters, Cyrillic, Greek, Kanji, etc.) in Python identifiers.

Rationale
=========

Python code is written by many people in the world who are not familiar
with the English language, or even well-acquainted with the Latin
writing system.  Such developers often desire to define classes and
functions with names in their native languages, rather than having to
come up with an (often incorrect) English translation of the concept
they want to name.  For some languages, common transliteration systems
exist (in particular, for the Latin-based writing systems).  For other
languages, users have greater difficulty using Latin to write their
native words.

Common Objections
=================

Some objections are often raised against proposals similar to this one.

People claim that they will not be able to use a library if, to do so,
they have to use characters they cannot type on their keyboards.
However, it is the choice of the designer of the library to decide on
various constraints for using the library: people may not be able to
use the library because they cannot get physical access to the source
code (because it is not published), or because licensing prohibits
usage, or because the documentation is in a language they cannot
understand.
A developer wishing to make a library widely available needs to make a
number of explicit choices (such as publication, licensing, language of
documentation, and language of identifiers).  It should always be the
choice of the author to make these decisions - not the choice of the
language designers.

In particular, projects wishing to have wide usage might want to
establish a policy that all identifiers, comments, and documentation
are written in English (see the GNU coding style guide for an example
of such a policy).  Restricting the language to ASCII-only identifiers
does not force comments and documentation to be English, or the
identifiers actually to be English words, so an additional policy is
necessary, anyway.

Specification of Language Changes
=================================

The syntax of identifiers in Python will be based on the Unicode
standard annex UAX-31 [1]_, with elaboration and changes as defined
below.

Within the ASCII range (U+0001..U+007F), the valid characters for
identifiers are the same as in Python 2.5.  This specification only
introduces additional characters from outside the ASCII range.  For
other characters, the classification uses the version of the Unicode
Character Database as included in the unicodedata module.

The identifier syntax is ``ID_Start ID_Continue*``.  ID_Start is
defined as all characters having one of the general categories
uppercase letters (Lu), lowercase letters (Ll), titlecase letters (Lt),
modifier letters (Lm), other letters (Lo), letter numbers (Nl), plus
the underscore (XXX what are the "stability extensions" listed in
UAX 31?).  ID_Continue is defined as all characters in ID_Start, plus
nonspacing marks (Mn), spacing combining marks (Mc), decimal numbers
(Nd), and connector punctuations (Pc).

All identifiers are converted into the normal form NFC while parsing;
comparison of identifiers is based on NFC.
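[Editorial note: the classification above can be sketched with the
unicodedata module the draft refers to.  This is only an approximation
of the draft rules - it special-cases the underscore and ignores the
open question about the UAX 31 stability extensions.]

```python
import unicodedata

# General categories from the draft specification.
ID_START = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}        # plus "_"
ID_CONTINUE = ID_START | {"Mn", "Mc", "Nd", "Pc"}      # plus "_"

def is_identifier(s):
    """Check a candidate identifier against the draft rules."""
    s = unicodedata.normalize("NFC", s)   # identifiers are compared in NFC
    if not s:
        return False
    if s[0] != "_" and unicodedata.category(s[0]) not in ID_START:
        return False
    return all(c == "_" or unicodedata.category(c) in ID_CONTINUE
               for c in s[1:])

print(is_identifier("straße"))    # True: all Ll letters
print(is_identifier("a\u0301"))   # True: NFC folds a + combining acute into á
print(is_identifier("1abc"))      # False: Nd is not in ID_Start
```

The second example shows why the NFC step matters: without
normalization, the combining acute accent (category Mn) would be
rejected in the first position.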
Policy Specification
====================

As an addition to the Python Coding style, the following policy is
prescribed: All identifiers in the Python standard library MUST use
ASCII-only identifiers, and SHOULD use English words wherever feasible.

As an option, this specification can be applied to Python 2.x.  In that
case, ASCII-only identifiers would continue to be represented as byte
string objects in namespace dictionaries; identifiers with non-ASCII
characters would be represented as Unicode strings.

Implementation
==============

The following changes will need to be made to the parser:

1. If a non-ASCII character is found in the UTF-8 representation of the
   source code, a forward scan is made to find the first ASCII
   non-identifier character (e.g. a space or punctuation character).

2. The entire UTF-8 string is passed to a function to normalize the
   string to NFC, and then verify that it follows the identifier
   syntax.  No such callout is made for pure-ASCII identifiers, which
   continue to be parsed the way they are today.

3. If this specification is implemented for 2.x, reflective libraries
   (such as pydoc) must be verified to continue to work when Unicode
   strings appear in __dict__ slots as keys.

References
==========

.. [1] http://www.unicode.org/reports/tr31/

Copyright
=========

This document has been placed in the public domain.

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:

From rasky at develer.com Tue May 1 13:32:52 2007
From: rasky at develer.com (Giovanni Bajo)
Date: Tue, 01 May 2007 13:32:52 +0200
Subject: [Python-3000] PEP: Eliminate __del__
In-Reply-To: <001d01c78bc2$b9367a60$f001a8c0@RaymondLaptop1>
References: <001d01c78bc2$b9367a60$f001a8c0@RaymondLaptop1>
Message-ID:

On 01/05/2007 9.31, Raymond Hettinger wrote:
> PEP: Eliminating __del__

*sigh* I'm still -1, but I won't revive the discussion of course.
I would still like the PEP to list the alternative that I and others
were proposing, that is, changing the semantics of __del__ (or dropping
__del__ in favor of a new __close__ method with the new semantics) so
that:

1) It is guaranteed to be called only once per object.

2) In case of circular references, __del__ methods are called in random
   order on the objects of the cycle, and then the cycle is broken.
   (This is because step #1 fixes the main problem with calling __del__
   in random order.)

In fact, your PEP concentrates on the problem of implicit finalization,
which I don't think is generally perceived as *the* problem with
__del__.  I'm still a *strong* proponent of implicit finalization (aka
RAII).  It has always worked well for me.  The problem is that __del__
currently *breaks* implicit finalization, causing garbage if it's used
by objects in a cycle.  With the fixes above, I'd use it more, not less.
-- 
Giovanni Bajo

From rasky at develer.com Tue May 1 13:34:47 2007
From: rasky at develer.com (Giovanni Bajo)
Date: Tue, 01 May 2007 13:34:47 +0200
Subject: [Python-3000] PEP: Supporting Non-ASCII Identifiers
In-Reply-To: <46371BD2.7050303@v.loewis.de>
References: <46371BD2.7050303@v.loewis.de>
Message-ID:

On 01/05/2007 12.52, Martin v. Löwis wrote:
> PEP: 31xx
> Title: Supporting Non-ASCII Identifiers

Isn't this already blacklisted in PEP 3099?
-- 
Giovanni Bajo

From jimjjewett at gmail.com Tue May 1 16:11:26 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Tue, 1 May 2007 10:11:26 -0400
Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc.
In-Reply-To: <5.1.1.6.0.20070430200255.04b88e10@sparrow.telecommunity.com>
References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com>
 <5.1.1.6.0.20070430200255.04b88e10@sparrow.telecommunity.com>
Message-ID:

On 4/30/07, Phillip J.
Eby wrote:
> At 07:29 PM 4/30/2007 -0400, Jim Jewett wrote:
> >On 4/30/07, Phillip J. Eby wrote:
> >>PEP 3115, however, requires that a class' metaclass be determined
> >>*before* the class body has executed, making it impossible to use this
> >>technique for class decoration any more.

> >It doesn't say what that metaclass has to do, though.
> >Is there any reason the metaclass couldn't delegate differently
> >depending on the value of __my_magic_attribute__ ?

> Sure -- that's what I suggested in the "super(), class decorators, and PEP
> 3115" thread, but Guido voted -1 on adding such a magic attribute to PEP
> 3115.

I don't think we're understanding each other.  Why couldn't you use a
suitably fleshed-out version of:

    class _ConditionalMetaclass(type):
        def __init__(cls, name, bases, dct):
            super(_ConditionalMetaclass, cls).__init__(name, bases, dct)
            hooks = [(k, v) for (k, v) in dct.items()
                     if k.startswith("_afterhook_")]
            for k, v in hooks:
                cls = AfterHooksRegistry[k](cls, v)

-jJ

From jimjjewett at gmail.com Tue May 1 16:31:08 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Tue, 1 May 2007 10:31:08 -0400
Subject: [Python-3000] Addition to PEP 3101
In-Reply-To: <4636AE9E.2020905@acm.org>
References: <8f01efd00704300953t6154d7e1j7ef18cead1acb344@mail.gmail.com>
 <46368EE5.6050409@canterbury.ac.nz> <4636AE9E.2020905@acm.org>
Message-ID:

On 4/30/07, Talin wrote:
> Greg Ewing wrote:
> > Patrick Maupin wrote:
> >> Method calls are deliberately disallowed by the PEP, so that the
> >> implementation has some hope of being securable.

> > If attribute access is allowed, arbitrary code can already
> > be triggered, so I don't see how this makes a difference
> > to security.

> Not quite. It depends on what you mean by 'arbitrary code'. ...

If I understood that correctly, then

(1) The format string cannot run arbitrary code, but
(2) The formatted objects themselves can.
This is probably a feature, since you can pass proxy objects, but it
should definitely be called out explicitly in the security section
(currently just some text in the Simple and Compound Names section).

Example Text:

    Note that while (literal strings used as) format strings are
    effectively sandboxed, the formatted objects themselves are not.

        "My name is {0[name]}".format(evil_map)

    would still allow evil_map to run arbitrary code.

-jJ

From jimjjewett at gmail.com Tue May 1 16:52:18 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Tue, 1 May 2007 10:52:18 -0400
Subject: [Python-3000] Breakthrough in thinking about ABCs (PEPs 3119 and 3141)
In-Reply-To:
References:
Message-ID:

On 4/30/07, Guido van Rossum wrote:
> The idea of overloading isinstance and issubclass is running into some
> resistance. I still like it, but if there is overwhelming discomfort,
> we can change it so that instead of writing isinstance(x, C) or
> issubclass(D, C) (where C overloads these operations), you'd have to
> write something like C.hasinstance(x) or C.hassubclass(D), where
> hasinstance and hassubclass are defined by some ABC metaclass. I'd
> still like to have the spec for hasinstance and hassubclass in the
> core language, so that different 3rd party frameworks don't need to
> invent different ways of spelling this inquiry.

Would it help to get away from class/instance entirely, and call them
something like isexample?  (Though class vs instance gets harder then.
areexamples?)

(And yes, I think it would, but no, I don't yet have the code written
out to explain.)
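[Editorial note: the overloading discussed here is roughly what later
shipped in Python 2.6/3.0 as the metaclass hooks __instancecheck__ and
__subclasscheck__.  A minimal sketch follows - DuckMeta and its
``required`` attribute are invented for illustration, and the Python 3
class syntax postdates this thread.]

```python
class DuckMeta(type):
    """Metaclass whose classes claim any object with the required methods."""
    def __instancecheck__(cls, obj):
        # isinstance(obj, SomeClass) routes here when SomeClass's
        # metaclass defines __instancecheck__.
        return all(hasattr(obj, name) for name in cls.required)

class Iterable(metaclass=DuckMeta):
    required = ("__iter__",)

print(isinstance([1, 2, 3], Iterable))  # True: lists define __iter__
print(isinstance(42, Iterable))         # False: ints do not
```

In effect, C.hasinstance(x) was spelled type(C).__instancecheck__(C, x)
and hooked into the existing isinstance() builtin, so third-party
frameworks share one spelling for the inquiry.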
-jJ

From jimjjewett at gmail.com Tue May 1 17:18:52 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Tue, 1 May 2007 11:18:52 -0400
Subject: [Python-3000] PEP: Information Attributes
In-Reply-To: <008a01c78bb9$e4283780$f001a8c0@RaymondLaptop1>
References: <008a01c78bb9$e4283780$f001a8c0@RaymondLaptop1>
Message-ID:

On 5/1/07, Raymond Hettinger wrote:
> Use Cases with Ambiguous Interpretations

> * The presence of a __getitem__ method is ambiguous in that it can be
> interpreted as either having sequence or mapping behavior.  The ambiguity is
> easily resolved with an attribute claiming either mapping behavior or
> sequence behavior.

If you're really duck-typing, it doesn't matter; just try the key and
see if it works.  At this level, Sequences *are* mappings which happen
to have (exactly the) integers from 0 to size-1 as keys.  Knowing that
the keys are integers won't tell you whether you can push and pop.

The advantage of the ABC variant is that you do know you can push and
pop, because if the object itself didn't provide an implementation,
then Python will fall back to the (abstract class's concrete) default
implementation for you.

> * The presence of a rich comparison operator such as __lt__ is ambiguous in that
> it can return a vector or a scalar, the scalar may or may not be boolean,
> and it may be a NotImplemented instance.  Even the boolean case is ambiguous
> because __lt__ may imply a total ordering (as it does for numbers) or it may
> be a partial ordering (as it is for sets, where __lt__ means a strict
> subset).  That latter ambiguity (sortability) is easily resolved by an
> attribute indicating a total ordering.

erm... sortability with respect to what?  Only instances of its own
class?  With other string-like things?

> * Some methods such as set.__add__ are too restrictive in that they preclude
> interaction with non-sets.  This makes it impossible to achieve set
> interoperability without subclassing from set (a choice which introduces
> other complications such as the inability to override set-to-set
> interactions).  This situation is easily resolved by an attribute like
> obj.__fake__=set which indicates that the object intends to be a set proxy.

How does this improve on registering the object with the abstract Set
class?  If anything, it seems worse, because you need to be able to
modify obj.  (Josiah suggests a lookaside dictionary -- but that might
as well *be* the ABC.)

> * The __iter__ method doesn't tell you whether the object supports
> multiple iteration (such as with files) or single iteration (such as with lists).
> A __singleiterator__ attribute would clear-up the ambiguity.

This seems backwards.  I hope that was just a typo, but *I* can't be as
sure from a single name as I could from a docstringed class.

> * While you can test for the presence of a write() method, it would be
> helpful to have a __readonly__ information attribute for file-like objects,
> cursors, immutables, and whatnot.

readonly meaning that I can't modify it, or readonly meaning that no
one else will?

> The attribute approach is dynamic (doesn't require inheritance to work).  It
> doesn't require mucking with isinstance() or other existing mechanisms.

I think a Traits version of ABCs could do that as well, and will try to
get an example coded in the next week or so.

> It restricts itself to making a limited, useful set of assertions rather than
> broadly covering a whole API.  It leaves the proven pythonic notion of
> duck-typing as the rule rather than the exception.  It resists the temptation
> to freeze all of the key APIs in concrete.

I feel almost the opposite.
Because the attribute is right there on the object (instead of in a
registry I have to import), it is more tempting to use it; I expect
this will cause many more people to code defensively by adding extra
asserts, so that it becomes more important to support.  Because the
object itself is only a single namespace, it effectively freezes the
API that goes out first.

Josiah wrote:
> ... suspect ... weak key dictionaries adding
> traits to classes, and only allow hashable instances for single-object

(I assumed the key would be id(obj), though it would still need a
weakref for data integrity.)

jJ

From collinw at gmail.com Tue May 1 17:22:45 2007
From: collinw at gmail.com (Collin Winter)
Date: Tue, 1 May 2007 08:22:45 -0700
Subject: [Python-3000] PEP: Drop Implicit String Concatenation
In-Reply-To: <009e01c78bbd$da8b71c0$f001a8c0@RaymondLaptop1>
References: <009e01c78bbd$da8b71c0$f001a8c0@RaymondLaptop1>
Message-ID: <43aa6ff70705010822q1066ab51sa7c58547dd3d18f1@mail.gmail.com>

On 4/30/07, Raymond Hettinger wrote:
> PEP: Remove Implicit String Concatenation

Jim Jewett has already submitted a PEP that does this, PEP 3126.  It's
in SVN but not showing up on PEP 0 for some reason:

http://svn.python.org/view/peps/trunk/pep-3126.txt?rev=55030&view=markup

Collin Winter

From p.f.moore at gmail.com Tue May 1 17:42:04 2007
From: p.f.moore at gmail.com (Paul Moore)
Date: Tue, 1 May 2007 16:42:04 +0100
Subject: [Python-3000] BList PEP
In-Reply-To:
References:
Message-ID: <79990c6b0705010842m20f0cfa1o4dd14574fbc8769@mail.gmail.com>

> - Implement TimSort for BLists, so that best-case sorting is O(n)
> instead of O(log n).

Is that a typo?  Why would you want to make best-case sorting worse?

Paul.
From jimjjewett at gmail.com Tue May 1 17:43:33 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Tue, 1 May 2007 11:43:33 -0400
Subject: [Python-3000] PEP: Eliminate __del__
In-Reply-To: <001d01c78bc2$b9367a60$f001a8c0@RaymondLaptop1>
References: <001d01c78bc2$b9367a60$f001a8c0@RaymondLaptop1>
Message-ID:

On 5/1/07, Raymond Hettinger wrote:
> The alternative is to code the automatic finalization steps using
> weakref callbacks.  For those used to using __del__, it takes a little
> while to learn the idiom but essentially the technique is hold a proxy
> or ref with a callback to a boundmethod for finalization:
>     self.resource = resource = CreateResource()
>     self.callbacks.append(proxy(resource, resource.closedown))
> In this manner, all of the object's resources can be freed automatically
> when the object is collected.  Note, that the callbacks only bind
> the resource object and not client object, so the client object
> can already have been collected and the teardown code can be run
> without risk of resurrecting the client (with a possibly invalid state).

That alternative is pretty ugly, and I think we found some cases where
it required major rewriting.  (I don't have them handy, but may end up
searching for them again, if need be.)

A smaller change would be to add __close__ (which covers most use
cases), or even to give __del__ the __close__ semantics.  The key
distinction is that __close__ says to go ahead and break the cycle in
an arbitrary location, rather than immortalizing it.

-jJ

From collinw at gmail.com Tue May 1 17:44:15 2007
From: collinw at gmail.com (Collin Winter)
Date: Tue, 1 May 2007 08:44:15 -0700
Subject: [Python-3000] PEP: Supporting Non-ASCII Identifiers
In-Reply-To: <46371BD2.7050303@v.loewis.de>
References: <46371BD2.7050303@v.loewis.de>
Message-ID: <43aa6ff70705010844u4a6333f5hf1d4d3a807361ffe@mail.gmail.com>

On 5/1/07, "Martin v.
Löwis" wrote:
> Rationale
> =========
>
> Python code is written by many people in the world who are not
> familiar with the English language, or even well-acquainted with the
> Latin writing system.
[snip]

That makes absolutely no sense.  You mean to tell me that people write
Python without being able to understand any of the language's keywords,
builtin functions, standard library or documentation?

Collin Winter

From daniel at stutzbachenterprises.com Tue May 1 17:46:36 2007
From: daniel at stutzbachenterprises.com (Daniel Stutzbach)
Date: Tue, 1 May 2007 10:46:36 -0500
Subject: [Python-3000] BList PEP
In-Reply-To: <79990c6b0705010842m20f0cfa1o4dd14574fbc8769@mail.gmail.com>
References: <79990c6b0705010842m20f0cfa1o4dd14574fbc8769@mail.gmail.com>
Message-ID:

On 5/1/07, Paul Moore wrote:
> > - Implement TimSort for BLists, so that best-case sorting is O(n)
> > instead of O(log n).
>
> Is that a typo?  Why would you want to make best-case sorting worse?

Yes, it should read O(n log n), not O(log n).

-- 
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises LLC

From collinw at gmail.com Tue May 1 17:52:13 2007
From: collinw at gmail.com (Collin Winter)
Date: Tue, 1 May 2007 08:52:13 -0700
Subject: [Python-3000] Adding class decorators to PEP 318
Message-ID: <43aa6ff70705010852g112924a2hbf13f31d83631a85@mail.gmail.com>

In talking to Neal Norwitz about this, I don't see a need for a
separate PEP for class decorators; we already have a decorators PEP,
#318.  The following is a proposed patch to PEP 318 that adds in class
decorators.

Collin Winter

Index: pep-0318.txt
===================================================================
--- pep-0318.txt (revision 55034)
+++ pep-0318.txt (working copy)
@@ -1,5 +1,5 @@
 PEP: 318
-Title: Decorators for Functions and Methods
+Title: Decorators for Functions, Methods and Classes
 Version: $Revision$
 Last-Modified: $Date$
 Author: Kevin D.
Smith, Jim Jewett, Skip Montanaro, Anthony Baxter
@@ -9,7 +9,7 @@
 Created: 05-Jun-2003
 Python-Version: 2.4
 Post-History: 09-Jun-2003, 10-Jun-2003, 27-Feb-2004, 23-Mar-2004, 30-Aug-2004,
-              2-Sep-2004
+              2-Sep-2004, 30-Apr-2007

 WarningWarningWarning
@@ -22,24 +22,40 @@
 negatives of each form.

+UpdateUpdateUpdate
+==================
+
+In April 2007, this PEP was updated to reflect the evolution of the Python
+community's attitude toward class decorators.  Though they had previously
+been rejected as too obscure and with limited use-cases, by mid-2006,
+class decorators had come to be seen as the logical next step, with some
+wondering why they hadn't been included originally.  As a result, class
+decorators will ship in Python 2.6.
+
+This PEP has been modified accordingly, with references to class decorators
+injected into the narrative.  While some references to the lack of class
+decorators have been left in place to preserve the historical record, others
+have been removed for the sake of coherence.
+
+
 Abstract
 ========

-The current method for transforming functions and methods (for instance,
-declaring them as a class or static method) is awkward and can lead to
-code that is difficult to understand.  Ideally, these transformations
-should be made at the same point in the code where the declaration
-itself is made.  This PEP introduces new syntax for transformations of a
-function or method declaration.
+The current method for transforming functions, methods and classes (for
+instance, declaring a method as a class or static method) is awkward and
+can lead to code that is difficult to understand.  Ideally, these
+transformations should be made at the same point in the code where the
+declaration itself is made.  This PEP introduces new syntax for
+transformations of a function, method or class declaration.


 Motivation
 ==========

-The current method of applying a transformation to a function or method
-places the actual transformation after the function body.
For large -functions this separates a key component of the function's behavior from -the definition of the rest of the function's external interface. For +The current method of applying a transformation to a function, method or class +places the actual transformation after the body. For large +code blocks this separates a key component of the object's behavior from +the definition of the rest of the object's external interface. For example:: def foo(self): @@ -69,14 +85,22 @@ are not as immediately apparent. Almost certainly, anything which could be done with class decorators could be done using metaclasses, but using metaclasses is sufficiently obscure that there is some attraction -to having an easier way to make simple modifications to classes. For -Python 2.4, only function/method decorators are being added. +to having an easier way to make simple modifications to classes. The +following is much clearer than the metaclass-based alternative:: + @singleton + class Foo(object): + pass +Because of the greater ease-of-use of class decorators and the symmetry +with function and method decorators, class decorators will be included in +Python 2.6. + + Why Is This So Hard? -------------------- -Two decorators (``classmethod()`` and ``staticmethod()``) have been +Two method decorators (``classmethod()`` and ``staticmethod()``) have been available in Python since version 2.2. It's been assumed since approximately that time that some syntactic support for them would eventually be added to the language. Given this assumption, one might @@ -135,11 +159,16 @@ .. _gareth mccaughan: http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&selm=slrna40k88.2h9o.Gareth.McCaughan%40g.local -Class decorations seem like an obvious next step because class +Class decorations seemed like an obvious next step because class definition and function definition are syntactically similar, -however Guido remains unconvinced, and class decorators will almost -certainly not be in Python 2.4. 
+however Guido was not convinced of their usefulness, and class +decorators were not in Python 2.4. `The issue was revisited`_ in March 2006 +and sufficient use-cases were found to justify the inclusion of class +decorators in Python 2.6. +.. _The issue was revisited: + http://mail.python.org/pipermail/python-dev/2006-March/062942.html + The discussion continued on and off on python-dev from February 2002 through July 2004. Hundreds and hundreds of posts were made, with people proposing many possible syntax variations. Guido took @@ -147,8 +176,8 @@ place. Subsequent to this, he decided that we'd have the `Java-style`_ @decorator syntax, and this appeared for the first time in 2.4a2. Barry Warsaw named this the 'pie-decorator' syntax, in honor of the -Pie-thon Parrot shootout which was occured around the same time as -the decorator syntax, and because the @ looks a little like a pie. +Pie-thon Parrot shootout which was occuring around the same time as +the decorator syntax debate, and because the @ looks a little like a pie. Guido `outlined his case`_ on Python-dev, including `this piece`_ on some of the (many) rejected forms. @@ -250,6 +279,19 @@ decorators are near the function declaration. The @ sign makes it clear that something new is going on here. +Python 2.6's class decorators work similarly:: + + @dec2 + @dec1 + class Foo: + pass + +This is equivalent to:: + + class Foo: + pass + Foo = dec2(dec1(Foo)) + The rationale for the `order of application`_ (bottom to top) is that it matches the usual order for function-application. In mathematics, composition of functions (g o f)(x) translates to g(f(x)). In Python, @@ -321,7 +363,7 @@ There have been a number of objections raised to this location -- the primary one is that it's the first real Python case where a line of code has an effect on a following line. 
The syntax available in 2.4a3 -requires one decorator per line (in a2, multiple decorators could be +requires one decorator per line (in 2.4a2, multiple decorators could be specified on the same line). People also complained that the syntax quickly got unwieldy when @@ -330,52 +372,61 @@ were small and thus this was not a large worry. Some of the advantages of this form are that the decorators live outside -the method body -- they are obviously executed at the time the function +the function/class body -- they are obviously executed at the time the object is defined. -Another advantage is that a prefix to the function definition fits +Another advantage is that a prefix to the definition fits the idea of knowing about a change to the semantics of the code before -the code itself, thus you know how to interpret the code's semantics +the code itself. This way, you know how to interpret the code's semantics properly without having to go back and change your initial perceptions if the syntax did not come before the function definition. Guido decided `he preferred`_ having the decorators on the line before -the 'def', because it was felt that a long argument list would mean that -the decorators would be 'hidden' +the 'def' or 'class', because it was felt that a long argument list would mean +that the decorators would be 'hidden' .. 
_he preferred: http://mail.python.org/pipermail/python-dev/2004-March/043756.html -The second form is the decorators between the def and the function name, -or the function name and the argument list:: +The second form is the decorators between the 'def' or 'class' and the object's +name, or between the name and the argument list:: def @classmethod foo(arg1,arg2): pass + + class @singleton Foo(arg1, arg2): + pass def @accepts(int,int),@returns(float) bar(low,high): pass def foo @classmethod (arg1,arg2): pass + + class Foo @singleton (arg1, arg2): + pass def bar @accepts(int,int),@returns(float) (low,high): pass There are a couple of objections to this form. The first is that it -breaks easily 'greppability' of the source -- you can no longer search +breaks easy 'greppability' of the source -- you can no longer search for 'def foo(' and find the definition of the function. The second, more serious, objection is that in the case of multiple decorators, the syntax would be extremely unwieldy. The next form, which has had a number of strong proponents, is to have the decorators between the argument list and the trailing ``:`` in the -'def' line:: +'def' or 'class' line:: def foo(arg1,arg2) @classmethod: pass def bar(low,high) @accepts(int,int),@returns(float): pass + + class Foo(object) @singleton: + pass Guido `summarized the arguments`_ against this form (many of which also apply to the previous form) as: @@ -403,15 +454,19 @@ @accepts(int,int) @returns(float) pass + + class Foo(object): + @singleton + pass The primary objection to this form is that it requires "peeking inside" -the method body to determine the decorators. In addition, even though -the code is inside the method body, it is not executed when the method +the suite body to determine the decorators. In addition, even though +the code is inside the suite body, it is not executed when the code is run.
Guido felt that docstrings were not a good counter-example, and that it was quite possible that a 'docstring' decorator could help move the docstring to outside the function body. -The final form is a new block that encloses the method's code. For this +The final form is a new block that encloses the function or class. For this example, we'll use a 'decorate' keyword, as it makes no sense with the @syntax. :: @@ -425,9 +480,14 @@ returns(float) def bar(low,high): pass + + decorate: + singleton + class Foo(object): + pass This form would result in inconsistent indentation for decorated and -undecorated methods. In addition, a decorated method's body would start +undecorated code. In addition, a decorated object's body would start three indent levels in. @@ -444,6 +504,10 @@ @returns(float) def bar(low,high): pass + + @singleton + class Foo(object): + pass The major objections against this syntax are that the @ symbol is not currently used in Python (and is used in both IPython and Leo), @@ -461,6 +525,10 @@ |returns(float) def bar(low,high): pass + + |singleton + class Foo(object): + pass This is a variant on the @decorator syntax -- it has the advantage that it does not break IPython and Leo. Its major disadvantage @@ -476,6 +544,10 @@ [accepts(int,int), returns(float)] def bar(low,high): pass + + [singleton] + class Foo(object): + pass The major objection to the list syntax is that it's currently meaningful (when used in the form before the method). It's also @@ -490,6 +562,10 @@ def bar(low,high): pass + + + class Foo(object): + pass None of these alternatives gained much traction. The alternatives which involve square brackets only serve to make it obvious that the @@ -659,7 +735,10 @@ .. _subsequently rejected: http://mail.python.org/pipermail/python-dev/2004-September/048518.html +For Python 2.6, the Python grammar and compiler were modified to allow +class decorators in addition to function and method decorators.
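The bottom-to-top application order and the `Foo = dec2(dec1(Foo))` equivalence stated in the patch can be checked with a small sketch. The `dec1`/`dec2` decorators below are hypothetical, written only to record application order; this runs on Python 2.6+ (where class decorators first shipped) and Python 3:

```python
def dec1(cls):
    # Record that dec1 ran, preserving any earlier entries.
    cls.applied = getattr(cls, 'applied', ()) + ('dec1',)
    return cls

def dec2(cls):
    cls.applied = getattr(cls, 'applied', ()) + ('dec2',)
    return cls

@dec2
@dec1
class Foo(object):
    pass

# The pre-2.6 spelling of the same thing:
class Bar(object):
    pass
Bar = dec2(dec1(Bar))

# Bottom-to-top: dec1 is applied first, then dec2, matching g(f(x)).
assert Foo.applied == Bar.applied == ('dec1', 'dec2')
```

The assertion confirms that stacked class decorators compose exactly like the explicit function-application spelling.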
+ Community Consensus ------------------- From jimjjewett at gmail.com Tue May 1 17:54:24 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Tue, 1 May 2007 11:54:24 -0400 Subject: [Python-3000] PEP: Supporting Non-ASCII Identifiers In-Reply-To: <43aa6ff70705010844u4a6333f5hf1d4d3a807361ffe@mail.gmail.com> References: <46371BD2.7050303@v.loewis.de> <43aa6ff70705010844u4a6333f5hf1d4d3a807361ffe@mail.gmail.com> Message-ID: On 5/1/07, Collin Winter wrote: > On 5/1/07, "Martin v. Löwis" wrote: > > Rationale > > ========= > That makes absolutely no sense. You mean to tell me that people write > Python without being able to understand any of the language's > keywords, builtin functions, standard library or documentation? If they have translations of the important documentation and the small number of keywords -- yes, they probably do; the alternative programming languages aren't really all that much easier for non-English speakers. FWIW, I've used undocumented variants of Assembler, based only on examples. (So no doc, didn't have a complete set of keywords/functions/libraries, misunderstood some of what I did have.) I won't say it was a great environment, but it did work. -jJ From martin at v.loewis.de Tue May 1 17:56:10 2007 From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 01 May 2007 17:56:10 +0200 Subject: [Python-3000] PEP: Supporting Non-ASCII Identifiers In-Reply-To: References: <46371BD2.7050303@v.loewis.de> Message-ID: <4637631A.6030702@v.loewis.de> >> Title: Supporting Non-ASCII Identifiers > > Isn't this already blacklisted in PEP 3099? It's not clear to me. That was in response to a suggestion that non-ASCII symbols will be used in the syntax of Python, i.e. in a way making it mandatory to be able to type these symbols. This is not the intent of this PEP.
There is also http://mail.python.org/pipermail/python-3000/2006-April/001526.html where Guido states that he trusts me that it can be made to work, and that "eventually" it needs to be supported. Rather than asking for trust, I put out a specification of how precisely the change would be implemented. Then, in http://mail.python.org/pipermail/python-3000/2006-April/001551.html he indicates that this doesn't have to be synchronized with Py3k. So if it is rejected for Py3k because of PEP 3099, I will need to suggest it for addition to Python 2.6. However, if I had proposed it for Python 2.6, people would have objected that it should rather be included in Py3k. If it is rejected for 2.6 on the grounds of being premature, I will resubmit it for 3.1, and so on, until "eventually" is "now". If it gets rejected "for good", I shall feel sorry. Regards, Martin From martin at v.loewis.de Tue May 1 17:59:39 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 01 May 2007 17:59:39 +0200 Subject: [Python-3000] PEP: Drop Implicit String Concatentation In-Reply-To: <43aa6ff70705010822q1066ab51sa7c58547dd3d18f1@mail.gmail.com> References: <009e01c78bbd$da8b71c0$f001a8c0@RaymondLaptop1> <43aa6ff70705010822q1066ab51sa7c58547dd3d18f1@mail.gmail.com> Message-ID: <463763EB.3070400@v.loewis.de> Collin Winter schrieb: > On 4/30/07, Raymond Hettinger wrote: >> PEP: Remove Implicit String Concatenation > > Jim Jewett has already submitted a PEP that does this, PEP 3126. It's > in SVN but not showing up on PEP 0 for some reason: It does show on PEP 0: http://www.python.org/dev/peps/pep-0000/ however it does not show on the PEP index, for some reason. Andrew said he fixed that, it apparently still doesn't work. 
Martin From collinw at gmail.com Tue May 1 18:05:52 2007 From: collinw at gmail.com (Collin Winter) Date: Tue, 1 May 2007 09:05:52 -0700 Subject: [Python-3000] PEP: Supporting Non-ASCII Identifiers In-Reply-To: <4637631A.6030702@v.loewis.de> References: <46371BD2.7050303@v.loewis.de> <4637631A.6030702@v.loewis.de> Message-ID: <43aa6ff70705010905l3f87d57ck5a8f5597a6de9dab@mail.gmail.com> On 5/1/07, "Martin v. L?wis" wrote: > >> Title: Supporting Non-ASCII Identifiers > > > > Isn't this already blacklisted in PEP 3099? > > It's not clear to me. That was in response to a suggestion > that non-ASCII symbols will be used in the syntax of Python, > i.e. in a way making it mandatory to be able to type these > symbols. Reading from http://mail.python.org/pipermail/python-3000/2006-April/001474.html, the message that prompted this particular addition to PEP 3099, "I want good Unicode support for string literals and comments. Everything else in the language ought to be ASCII." Identifiers aren't string literals or comments. Collin Winter From pje at telecommunity.com Tue May 1 18:09:30 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue, 01 May 2007 12:09:30 -0400 Subject: [Python-3000] PEP: Eliminate __del__ In-Reply-To: <001d01c78bc2$b9367a60$f001a8c0@RaymondLaptop1> Message-ID: <5.1.1.6.0.20070501120218.02d31250@sparrow.telecommunity.com> At 12:31 AM 5/1/2007 -0700, Raymond Hettinger wrote: >The alternative is to code the automatic finalization steps using >weakref callbacks. For those used to using __del__, it takes a little >while to learn the idiom but essentially the technique is hold a proxy >or ref with a callback to a boundmethod for finalization: > self.resource = resource = CreateResource() > self.callbacks.append(proxy(resource, resource.closedown)) >In this manner, all of the object's resources can be freed automatically >when the object is collected. 
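[The weakref-callback finalization idiom quoted above can be made concrete with a minimal, self-contained sketch. The names (`Resource`, `Client`, `register_closedown`) are illustrative, not Raymond's actual code; note that the weakref objects themselves must be kept alive somewhere — here a module-level registry — or their callbacks will never fire:]

```python
import weakref

class Resource(object):
    def __init__(self):
        self.closed = False
    def closedown(self):
        self.closed = True

# The weakrefs must stay alive for their callbacks to run when the
# referent is collected; a module-level registry is one way to do that.
_live_refs = []

def register_closedown(owner, resource):
    # The callback closes over `resource` (keeping it alive as long as the
    # owner lives) but holds no reference to `owner`, so the owner can be
    # collected freely and is never resurrected by teardown code.
    def on_collect(ref):
        _live_refs.remove(ref)
        resource.closedown()
    _live_refs.append(weakref.ref(owner, on_collect))

class Client(object):
    def __init__(self):
        self.resource = Resource()
        register_closedown(self, self.resource)
```

Usage: after `c = Client()` is dropped and collected, `c.resource` is closed automatically without the client ever defining `__del__`.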
Note, that the callbacks only bind >the resource object and not client object, so the client object >can already have been collected and the teardown code can be run >without risk of resurrecting the client (with a possibly invalid state). I'm a bit confused about the above. My understanding is that in order for a weakref's callback to be invoked, the weakref itself *must still be live*. That means that if 'self' in your example above is collected, then the weakref no longer exists, so the closedown won't be called. Yet, at the same time, it appears that in your example, deleting self.resource would *not* cause resource to be GC'd either, because the weakref still holds a reference to 'resource.closedown', which in turn must hold a reference to 'resource'. So, at first glance, your example looks like it can't possibly do the right thing, ever, unless I'm missing something rather big. In which case, the explanation for *how* this is supposed to work should go in the PEP. In principle I'm in favor of ditching __del__, as long as there's actually a viable technique for doing so. My own experience has been that setting up a simple mechanism to replace it (and that actually works) is really difficult, because you have to find some place for the weakref itself to live, which usually means a global dictionary or something of that sort. It would be nice if the gc or weakref modules grew a facility to make it easier to register finalization callbacks, and could optionally check whether you were registering a callback that referenced the thing you were tying the callback's life to. From pje at telecommunity.com Tue May 1 18:14:19 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue, 01 May 2007 12:14:19 -0400 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. 
In-Reply-To: References: <5.1.1.6.0.20070430200255.04b88e10@sparrow.telecommunity.com> <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <5.1.1.6.0.20070430200255.04b88e10@sparrow.telecommunity.com> Message-ID: <5.1.1.6.0.20070501121143.02d31398@sparrow.telecommunity.com> At 10:11 AM 5/1/2007 -0400, Jim Jewett wrote: >On 4/30/07, Phillip J. Eby wrote: >>At 07:29 PM 4/30/2007 -0400, Jim Jewett wrote: >> >On 4/30/07, Phillip J. Eby wrote: > >> >>PEP 3115, however, requires that a class' metaclass be determined >> >>*before* the class body has executed, making it impossible to use this >> >>technique for class decoration any more. > >> >It doesn't say what that metaclass has to do, though. > >> >Is there any reason the metaclass couldn't delegate differently >> >depending on the value of __my_magic_attribute__ ? > >>Sure -- that's what I suggested in the "super(), class decorators, and PEP >>3115" thread, but Guido voted -1 on adding such a magic attribute to PEP >>3115. > >I don't think we're understanding each other. Yup, and we're still not now. :) Or at least, I don't understand what the code below does, or more precisely, why it's different from just having a __decorators__ list containing direct callbacks. The extra indirection of having an "after hooks" registry and separate attributes doesn't appear to add anything, although if it turned out you really needed it, you could just add a callback to __decorators__ that did it. 
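[The `__decorators__`-style delegation under discussion can be sketched as a metaclass that pulls a list of callbacks out of the class body and applies them after class creation. Names here are hypothetical, and Python 3 class-statement syntax is used for brevity:]

```python
class DecoratingMeta(type):
    def __new__(meta, name, bases, dct):
        # Pull the callback list out of the namespace before creating the class.
        callbacks = dct.pop('__decorators__', [])
        cls = super(DecoratingMeta, meta).__new__(meta, name, bases, dct)
        # Apply in reverse so the list reads like a stack of @-decorators
        # (the last-listed callback runs first).
        for callback in reversed(callbacks):
            cls = callback(cls)
        return cls

def add_tag(cls):
    cls.tag = 'decorated'
    return cls

class Foo(metaclass=DecoratingMeta):
    __decorators__ = [add_tag]

assert Foo.tag == 'decorated'
assert not hasattr(Foo, '__decorators__')  # consumed by the metaclass
```

This shows the mechanics only; it says nothing about how such a hook would interact with PEP 3115's `__prepare__` machinery.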
> Why couldn't you use a suitably fleshed-out version of:
>
> class _ConditionalMetaclass(type):
>
>     def __init__(cls, name, bases, dct):
>         super(_ConditionalMetaclass, cls).__init__(name, bases, dct)
>         hooks = [(k, v) for (k, v) in dct.items()
>                  if k.startswith("_afterhook_")]
>         for k, v in hooks:
>             cls = AfterHooksRegistry[k](cls, v)
>
> -jJ

From martin at v.loewis.de Tue May 1 18:14:10 2007 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Tue, 01 May 2007 18:14:10 +0200 Subject: [Python-3000] PEP: Supporting Non-ASCII Identifiers In-Reply-To: <43aa6ff70705010844u4a6333f5hf1d4d3a807361ffe@mail.gmail.com> References: <46371BD2.7050303@v.loewis.de> <43aa6ff70705010844u4a6333f5hf1d4d3a807361ffe@mail.gmail.com> Message-ID: <46376752.2070007@v.loewis.de> >> Python code is written by many people in the world who are not >> familiar with the English language, or even well-acquainted with the >> Latin writing system. > [snip] > > That makes absolutely no sense. You mean to tell me that people write > Python without being able to understand any of the language's > keywords, builtin functions, standard library or documentation? Exactly so. They have natural-language documentation, by means of books and literal translations of the Python documentation, and they don't try to grasp the meaning of the identifiers (e.g. I only yesterday learned what a "hub" is, as in "hub-and-spoke". I accepted it to mean "networking device that forwards packets" before. Many people around here think that ASCII is pronounced A-S-C-two, i.e. II stands for a Roman numeral - and these people did have some English training.) I still don't understand why the "no operation" statement is called "pass" - it's not the opposite of "fail", and seems to have no relationship to "can you pass me the butter, please?". The point is that even though many people get some passive knowledge of English over time, they have a hard time with active usage of the language.
So when they need to come up with identifiers and put comments into the code, they use their first language. See the comments for PEP 328 in

http://python.com.ua/ru/news/2006/09/20/nakonets-to-vyishel-python-25/

(I'm sure I can also find code with transliterated identifiers in the net, but finding that is a bit more tedious, so I would prefer if you trust me on that). Regards, Martin From jjb5 at cornell.edu Tue May 1 18:16:08 2007 From: jjb5 at cornell.edu (Joel Bender) Date: Tue, 01 May 2007 12:16:08 -0400 Subject: [Python-3000] Breakthrough in thinking about ABCs (PEPs 3119 and 3141) In-Reply-To: <5.1.1.6.0.20070430205955.04953100@sparrow.telecommunity.com> References: <5.1.1.6.0.20070430205955.04953100@sparrow.telecommunity.com> Message-ID: <463767C8.5070608@cornell.edu> Phillip J. Eby wrote: >> Personally, I still think that the most uniform way of spelling this >> is overloading isinstance and issubclass; that has the highest >> likelihood of standardizing the spelling for such inquiries. > > A big +1 here. This is no different than e.g. operator.mul() being able to > do different things depending on the second argument. n00b here, trying to follow this...

    class X:
        def __mul__(self, y): print self, "mul", y
        def __rmul__(self, y): print self, "rmul", y

Treating isinstance like operator.mul, I could do this (and I would expect that you want to make it a class method)...

    class Y:
        @classmethod
        def __risinstance__(cls, obj): print obj, "is instance of", cls

So issubclass(D, C) would call D.__issubclass__(C) or C.__rissubclass__(D) and leave it up to the programmer. The former is "somebody is checking to see if I inherit some functionality" and the latter is "somebody is checking to see if something is a proper derived class of me".
    class A(object):
        @classmethod
        def __rissubclass__(cls, subcls):
            if not object.__rissubclass__(cls, subcls):
                return False
            return subcls.f is not A.f

        def f(self):
            raise RuntimeError, "f must be overridden"

    class B(A):
        def g(self): print "B.g"

    class C(A):
        def f(self): print "C.f"

Now my testing can check issubclass(B, A) and it will fail because B.f hasn't been provided, but issubclass(C, A) passes. I don't have to call B().f() and have it fail, it might be expensive to create a B(). Joel From martin at v.loewis.de Tue May 1 18:19:02 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 01 May 2007 18:19:02 +0200 Subject: [Python-3000] PEP: Supporting Non-ASCII Identifiers In-Reply-To: <43aa6ff70705010905l3f87d57ck5a8f5597a6de9dab@mail.gmail.com> References: <46371BD2.7050303@v.loewis.de> <4637631A.6030702@v.loewis.de> <43aa6ff70705010905l3f87d57ck5a8f5597a6de9dab@mail.gmail.com> Message-ID: <46376876.1010803@v.loewis.de>
> ID_Start is defined as all characters having one of the general > categories uppercase letters (Lu), lowercase letters (Ll), titlecase > letters (Lt), modifier letters (Lm), other letters (Lo), letter > numbers (Nl), plus the underscore (XXX what are "stability extensions > listed in UAX 31). Are you sure that modifier letters should be included? The standard says so, but as nearly as I can tell, these are really more like diacritics -- and some of them look an awful lot like punctuation. http://unicode.org/charts/PDF/U02B0.pdf -jJ From talin at acm.org Tue May 1 18:13:29 2007 From: talin at acm.org (Talin) Date: Tue, 01 May 2007 09:13:29 -0700 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> Message-ID: <46376729.9000008@acm.org> Phillip J. Eby wrote: > Proceeding to the "Next" Method > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > If the first parameter of an overloaded function is named > ``__proceed__``, it will be passed a callable representing the next > most-specific method. For example, this code:: > > def foo(bar:object, baz:object): > print "got objects!" > > @overload > def foo(__proceed__, bar:int, baz:int): > print "got integers!" > return __proceed__(bar, baz) I don't care for the idea of testing against a specially named argument. Why couldn't you just have a different decorator, such as "overload_chained" which triggers this behavior? -- Talin From martin at v.loewis.de Tue May 1 18:24:49 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 01 May 2007 18:24:49 +0200 Subject: [Python-3000] PEP: Supporting Non-ASCII Identifiers In-Reply-To: References: <46371BD2.7050303@v.loewis.de> Message-ID: <463769D1.5020505@v.loewis.de> Jim Jewett schrieb: > On 5/1/07, "Martin v. L?wis" wrote: > >> The identifier syntax is \*. 
> >> ID_Start is defined as all characters having one of the general >> categories uppercase letters (Lu), lowercase letters (Ll), titlecase >> letters (Lt), modifier letters (Lm), other letters (Lo), letter >> numbers (Nl), plus the underscore (XXX what are "stability extensions >> listed in UAX 31). > > Are you sure that modifier letters should be included? The standard > says so, but as nearly as I can tell, these are really more like > diacritics -- and some of them look an awful lot like punctuation. Interesting question. I included them because the standard says so, but I don't see an inherent need. I'll see whether I can find some rationale as to why they were included in UAX 31, and then check whether that rationale applies to Python as well. Regards, Martin From pje at telecommunity.com Tue May 1 18:27:50 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue, 01 May 2007 12:27:50 -0400 Subject: [Python-3000] Why isinstance() and issubclass() don't need to be unforgeable Message-ID: <5.1.1.6.0.20070501121527.02f47f90@sparrow.telecommunity.com> I just wanted to throw in a note for those who are upset with the idea that classes should be able to decide how isinstance() and issubclass() work. If you want "true, unforgeable" isinstance and subclass, you can still use these formulas:

    def true_issubclass(C1, C2):
        return C2 in type.__mro__.__get__(C1)

    def isinstance_no_proxy(o, C):
        return true_issubclass(type(o), C)

    def isinstance_with_proxy(o, C):
        cls = getattr(o, '__class__', None)
        return true_issubclass(cls, C) or isinstance_no_proxy(o, C)

Their complexity reflects the fact that they rely on implementation details which the vast majority of code should not care about. So, if you really have a need to find out whether something is truly an instance of something for *structural* reasons, you will still be able to do that. Yes, it will be a pain.
But deliberately inducing structural dependencies *should* be painful, because you're making it painful for the *users* of your code, whenever you impose isinstance/issubclass checks beyond necessity. The fact that it's currently *not* painful, is precisely what makes it such a good idea to add the new hooks to make these operations forgeable. The default, in other words, should not be to care about what objects *are*, only what they *claim* to be. From l.mastrodomenico at gmail.com Tue May 1 18:27:30 2007 From: l.mastrodomenico at gmail.com (Lino Mastrodomenico) Date: Tue, 1 May 2007 18:27:30 +0200 Subject: [Python-3000] PEP: Supporting Non-ASCII Identifiers In-Reply-To: <46376752.2070007@v.loewis.de> References: <46371BD2.7050303@v.loewis.de> <43aa6ff70705010844u4a6333f5hf1d4d3a807361ffe@mail.gmail.com> <46376752.2070007@v.loewis.de> Message-ID: 2007/5/1, "Martin v. Löwis" : > The point is that even though many people get some passive knowledge > of English over time, they have a hard time with active usage of the > language. So when they need to come up with identifiers and put comments > into the code, they use their first language. See the comments for PEP > 328 in > > http://python.com.ua/ru/news/2006/09/20/nakonets-to-vyishel-python-25/ > > (I'm sure I can also find code with transliterated identifiers in the > net, but finding that is bit more tedious, so I would prefer if > you trust me on that). If this can help the discussion, the first example of Python code in the Italian translation of the tutorial is:

    >>> il_mondo_è_piatto = 1
    >>> if il_mondo_è_piatto:
    ...     print "Occhio a non caderne fuori!"
    ...
    Occhio a non caderne fuori!

http://python.it/doc/Python-Docs/html/tut/node4.html Please note the "è" character in the variable name. And yes, this code used to work out of the box (AFAIK at least until Python 2.2).
-- Lino Mastrodomenico E-mail: l.mastrodomenico at gmail.com From pje at telecommunity.com Tue May 1 18:36:17 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue, 01 May 2007 12:36:17 -0400 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: <46376729.9000008@acm.org> References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> Message-ID: <5.1.1.6.0.20070501123026.0524eac8@sparrow.telecommunity.com> At 09:13 AM 5/1/2007 -0700, Talin wrote: >Phillip J. Eby wrote: >>Proceeding to the "Next" Method >>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >>If the first parameter of an overloaded function is named >>``__proceed__``, it will be passed a callable representing the next >>most-specific method. For example, this code:: >> def foo(bar:object, baz:object): >> print "got objects!" >> @overload >> def foo(__proceed__, bar:int, baz:int): >> print "got integers!" >> return __proceed__(bar, baz) > >I don't care for the idea of testing against a specially named argument. >Why couldn't you just have a different decorator, such as >"overload_chained" which triggers this behavior? The PEP lists *five* built-in decorators, all of which support this behavior:: @overload, @when, @before, @after, @around And in addition, it demonstrates how to create *new* method combination decorators, that *also* support this behavior (e.g. '@discount'). All in all, there are an unbounded number of possible decorators that would require chained and non-chained variations. The other alternative would be to have a "magic" function like "get_next_method()" that you could call, but the setup for such an animal is more complex and would likely involve either sys._getframe() or some kind of special thread variable(s). Performance would also be reduced for *all* generic function invocations, because the setup would have to occur whether or not chaining was happening. 
The argument list technique allows the overhead to happen only once, and only when it's needed. One new possibility, however... suppose we did it like this: from overloading import next_method @overload def foo(blah:next_method, ...): That is, if we used an argument *annotation* to designate the argument that would receive the next method? For efficiency's sake, it would still need to be the first argument, but at least the special name would go away, and you could call it whatever you like. From janssen at parc.com Tue May 1 18:42:43 2007 From: janssen at parc.com (Bill Janssen) Date: Tue, 1 May 2007 09:42:43 PDT Subject: [Python-3000] Traits/roles instead of ABCs In-Reply-To: References: <43aa6ff70704291840s3384824et44ebfd360c15eda@mail.gmail.com> <014201c78adc$ca70d960$f101a8c0@RaymondLaptop1> <1d36917a0704300816ma3bf9c2o4dd674cfcefa9172@mail.gmail.com> <-3456230403858254882@unknownmsgid> <740c3aec0704301501u7df7b5a6uaea854d4716eb87e@mail.gmail.com> Message-ID: <07May1.094247pdt."57996"@synergy1.parc.xerox.com> > To me, interfaces and/or generic functions strike the right balance. I agree. As I've said before, if this was 1994 I think I'd be in the PJE camp and prefer generic functions. As it is, I think interfaces better fit the current state of Python. And I think the existing type system has everything that's needed to indicate interfaces. All we need are some base definitions to stand on (dict, sequence, file, etc.). > Such tools are completely invisible for Python programmers who don't =20 > care about them (the vast majority). They're also essential for a =20 > very small subclass of very important Python applications. Yep, those of us who write very large Python applications. > If ABCs can walk that same tightrope of utility and invisibility, =20 > then maybe they'll successfully fill that niche. I'm sure they will. Bill From pje at telecommunity.com Tue May 1 18:44:49 2007 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Tue, 01 May 2007 12:44:49 -0400 Subject: [Python-3000] Breakthrough in thinking about ABCs (PEPs 3119 and 3141) In-Reply-To: <463767C8.5070608@cornell.edu> References: <5.1.1.6.0.20070430205955.04953100@sparrow.telecommunity.com> <5.1.1.6.0.20070430205955.04953100@sparrow.telecommunity.com> Message-ID: <5.1.1.6.0.20070501124243.068aa3a0@sparrow.telecommunity.com> At 12:16 PM 5/1/2007 -0400, Joel Bender wrote: >So issubclass(D, C) would call D.__issubclass__(C) or >C.__rissubclass__(D) and leave it up to the programmer. Yes, except there's only the '__r__' versions and they're not called that. > The former is >"somebody is checking to see if I inherit some functionality" and the >latter is "somebody is checking to see if something is a proper derived >class of me". > > class A(object): > @classmethod > def __rissubclass__(cls, subcls): > if not object.__rissubclass__(cls, subcls): > return False > return subcls.f is not A.f > > def f(self): > raise RuntimeError, "f must be overridden" > > class B(A): > def g(self): print "B.g" > > class C(A): > def f(self): print "C.f" > >Now my testing can check issubclass(B, A) and it will fail because B.f >hasn't been provided, but issubclass(C, A) passes. I don't have to call >B().f() and have it fail, it might be expensive to create a B(). Right; you've just pointed out something important that hasn't been stated until now. The objections to this extension have focused on the idea that this makes type checking less strict, but you've just demonstrated that it can actually be used to make it *more* strict. I hadn't thought of that. From pje at telecommunity.com Tue May 1 18:45:43 2007 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Tue, 01 May 2007 12:45:43 -0400 Subject: [Python-3000] Derivation of "pass" in Python (was Re: PEP: Supporting Non-ASCII Identifiers) In-Reply-To: <46376752.2070007@v.loewis.de> References: <43aa6ff70705010844u4a6333f5hf1d4d3a807361ffe@mail.gmail.com> <46371BD2.7050303@v.loewis.de> <43aa6ff70705010844u4a6333f5hf1d4d3a807361ffe@mail.gmail.com> Message-ID: <5.1.1.6.0.20070501123738.05263610@sparrow.telecommunity.com> At 06:14 PM 5/1/2007 +0200, Martin v. Löwis wrote: >I still don't understand why the "no operation" statement is called >"pass" - it's not the opposite of "fail", and seems to have no >relationship to "can you pass me the butter, please?". Actually, it does, in the sense that to "pass" on something means to give up the chance to take it. So, if butter is being passed around the dinner table, one who chooses not to take it, but passes it on to the next person, is said to be "passing on" (i.e. conceding the opportunity). Thus, when someone is offered something, they may say, "I'll pass", meaning they are declining to act. Ergo, to "pass" in Python is to decline the opportunity to act. From guido at python.org Tue May 1 18:48:43 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 1 May 2007 09:48:43 -0700 Subject: [Python-3000] Addition to PEP 3101 In-Reply-To: References: <8f01efd00704300953t6154d7e1j7ef18cead1acb344@mail.gmail.com> <46368EE5.6050409@canterbury.ac.nz> <4636AE9E.2020905@acm.org> Message-ID: On 5/1/07, Jim Jewett wrote: > On 4/30/07, Talin wrote: > > Greg Ewing wrote: > > > Patrick Maupin wrote: > > > >> Method calls are deliberately disallowed by the PEP, so that the > > >> implementation has some hope of being securable. > > > > If attribute access is allowed, arbitrary code can already > > > be triggered, so I don't see how this makes a difference > > > to security. > > > Not quite. It depends on what you mean by 'arbitrary code'. ...
> > If I understood that correctly, then > > (1) The format string cannot run arbitrary code, but > (2) The formatted objects themselves can. > > This is probably a feature, since you can pass proxy objects, but it > should definitely be called out explicitly in the security section > (currently just some text in Simple and Compound Names section). > Example Text: > > > Note that while (literal strings used as) format strings are > effectively sandboxed, the formatted objects themselves are not. > > "My name is {0[name]}".format(evil_map) > > would still allow evil_map to run arbitrary code. And how on earth would that be a security threat? -- --Guido van Rossum (home page: http://www.python.org/~guido/) From janssen at parc.com Tue May 1 18:48:47 2007 From: janssen at parc.com (Bill Janssen) Date: Tue, 1 May 2007 09:48:47 PDT Subject: [Python-3000] Traits/roles instead of ABCs In-Reply-To: <463690E5.6060603@canterbury.ac.nz> References: <43aa6ff70704291840s3384824et44ebfd360c15eda@mail.gmail.com> <014201c78adc$ca70d960$f101a8c0@RaymondLaptop1> <438708814690534630@unknownmsgid> <79990c6b0704301001ga0d2429sdaded9ac75fa15c5@mail.gmail.com> <07Apr30.141916pdt.57996@synergy1.parc.xerox.com> <463690E5.6060603@canterbury.ac.nz> Message-ID: <07May1.094853pdt."57996"@synergy1.parc.xerox.com> Greg Ewing writes: > But I don't think there is any such definition, and > the confusion arises because people lazily use the > vague term "file-like" instead of spelling out what > they really mean ("has a read() method", etc.) Yes, I agree with this. That's why http://wiki.python.org/moin/AbstractBaseClasses?highlight=%28AbstractBaseClasses%29 splits the "file-like" interface into a number of composable pieces.
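The "composable pieces" idea can be sketched with the machinery that later shipped in Python's abc module. The class names here (Readable, Seekable, ReadOnlyStream) are illustrative, not the wiki page's actual names; the structural `__subclasshook__` check is one possible way to make "has a read() method" testable:

```python
from abc import ABCMeta, abstractmethod

class Readable(metaclass=ABCMeta):
    """One small piece of the 'file-like' interface."""
    @abstractmethod
    def read(self, size=-1):
        ...

    @classmethod
    def __subclasshook__(cls, C):
        # Structural check: any class providing a callable read() qualifies.
        if cls is Readable:
            return callable(getattr(C, 'read', None)) or NotImplemented
        return NotImplemented

class Seekable(metaclass=ABCMeta):
    """Another independent piece; a type may satisfy any subset of pieces."""
    @abstractmethod
    def seek(self, pos, whence=0):
        ...

    @classmethod
    def __subclasshook__(cls, C):
        if cls is Seekable:
            return callable(getattr(C, 'seek', None)) or NotImplemented
        return NotImplemented

class ReadOnlyStream:
    """Implements only the Readable piece of the interface."""
    def read(self, size=-1):
        return ''
```

With these definitions, `io.StringIO` satisfies both pieces while `ReadOnlyStream` satisfies only `Readable` — which is exactly the point of splitting "file-like" into composable parts.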
Bill From martin at v.loewis.de Tue May 1 18:54:49 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 01 May 2007 18:54:49 +0200 Subject: [Python-3000] Derivation of "pass" in Python (was Re: PEP: Supporting Non-ASCII Identifiers) In-Reply-To: <5.1.1.6.0.20070501123738.05263610@sparrow.telecommunity.com> References: <43aa6ff70705010844u4a6333f5hf1d4d3a807361ffe@mail.gmail.com> <46371BD2.7050303@v.loewis.de> <43aa6ff70705010844u4a6333f5hf1d4d3a807361ffe@mail.gmail.com> <5.1.1.6.0.20070501123738.05263610@sparrow.telecommunity.com> Message-ID: <463770D9.3050405@v.loewis.de> > Thus, when someone is offered something, they may say, "I'll pass", > meaning they are declining to act. Ergo, to "pass" in Python is to > decline to give up the opportunity to act. Ah, ok. It would then be similar to "Passe!" in German, which is used in card games, if you don't play a card, but instead hand over to the next player. Even though this is clearly the same ancestry, it never occurred to me that the same meaning is also present in English (also, "passen" is somewhat oldish now, so I don't use it actively myself). 
Regards, Martin From janssen at parc.com Tue May 1 18:54:38 2007 From: janssen at parc.com (Bill Janssen) Date: Tue, 1 May 2007 09:54:38 PDT Subject: [Python-3000] Traits/roles instead of ABCs In-Reply-To: <1d36917a0704302031hd34ffcfu2eee879aef426931@mail.gmail.com> References: <43aa6ff70704291840s3384824et44ebfd360c15eda@mail.gmail.com> <014201c78adc$ca70d960$f101a8c0@RaymondLaptop1> <1d36917a0704300816ma3bf9c2o4dd674cfcefa9172@mail.gmail.com> <-3456230403858254882@unknownmsgid> <740c3aec0704301501u7df7b5a6uaea854d4716eb87e@mail.gmail.com> <1d36917a0704302031hd34ffcfu2eee879aef426931@mail.gmail.com> Message-ID: <07May1.095448pdt."57996"@synergy1.parc.xerox.com> Alan McIntyre writes: > I have a nagging concern that these additions will > clutter up the core, and--no matter how hard you try--adding them is > going to have an impact on "run-of-the-mill" users of the language. I don't think that will be the case, if we just use ABCs. There will be a definition somewhere of the basic type APIs, but the normal user will still just say "f=open(FILENAME)" or "d={}" without really caring what APIs the value returned by "open" or "{}" support -- the current informal and fuzzy understanding of the type will suffice, just as it does now. Those who do care, however, will be able to find out, and use that information. Bill From guido at python.org Tue May 1 18:58:03 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 1 May 2007 09:58:03 -0700 Subject: [Python-3000] BList PEP In-Reply-To: References: Message-ID: On 5/1/07, Daniel Stutzbach wrote: > PEP: 30XX > Title: BList: A faster list-like type Checked in as PEP 3128. 
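Bill Janssen's point above — that the normal user just writes "d={}" while "those who do care will be able to find out" — is how the collections ABCs eventually worked out in practice. A hypothetical minimal example (the `Env` class is invented for illustration):

```python
from collections.abc import Mapping

class Env(Mapping):
    """A minimal read-only mapping: supply three methods, and the
    Mapping ABC's mixins provide get(), items(), __contains__, ==, etc."""

    def __init__(self, **bindings):
        self._data = dict(bindings)

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

env = Env(path='/usr/bin', lang='C')
```

Note that plain `{}` also answers `isinstance({}, Mapping)` with True, because dict is registered with the ABC — so the definition exists for those who ask, and stays invisible to everyone else.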
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From fumanchu at amor.org Tue May 1 18:58:32 2007 From: fumanchu at amor.org (Robert Brewer) Date: Tue, 1 May 2007 09:58:32 -0700 Subject: [Python-3000] PEP: Supporting Non-ASCII Identifiers In-Reply-To: <46371BD2.7050303@v.loewis.de> Message-ID: <435DF58A933BA74397B42CDEB8145A860B745DEF@ex9.hostedexchange.local> Martin v. L?wis wrote: > Subject: [Python-3000] PEP: Supporting Non-ASCII Identifiers > > Common Objections > ================= > > People claim that they will not be able to use a library if to do so > they have to use characters they cannot type on their > keyboards. However, it is the choice of the designer of the library to > decide on various constraints for using the library: people may not be > able to use the library because they cannot get physical access to the > source code (because it is not published), or because licensing > prohibits usage, or because the documentation is in a language they > cannot understand. A developer wishing to make a library widely > available needs to make a number of explicit choices (such as > publication, licensing, language of documentation, and language of > identifiers). It should always be the choice of the author to make > these decisions - not the choice of the language designers. That seems true when each such decision is considered in isolation. But the language designers are responsible to make sure the number of such explicit decisions/choices does not grow beyond a reasonable limit. 
Robert Brewer System Architect Amor Ministries fumanchu at amor.org From guido at python.org Tue May 1 19:07:31 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 1 May 2007 10:07:31 -0700 Subject: [Python-3000] Why isinstance() and issubclass() don't need to be unforgeable In-Reply-To: <5.1.1.6.0.20070501121527.02f47f90@sparrow.telecommunity.com> References: <5.1.1.6.0.20070501121527.02f47f90@sparrow.telecommunity.com> Message-ID: On 5/1/07, Phillip J. Eby wrote: > I just wanted to throw in a note for those who are upset with the idea that > classes should be able to decide how isinstance() and issubclass() > work. If you want "true, unforgeable" isinstance and subclass, you can > still use these formulas: > > > def true_issubclass(C1, C2): > return C2 in type.__mro__.__get__(C1) > > def isinstance_no_proxy(o, C): > return true_issubclass(type(o), C) > > def isinstance_with_proxy(o, C): > cls = getattr(o, '__class__', None) > return true_issubclass(cls, C) or isinstance_no_proxy(o, C) > > > Their complexity reflects the fact that they rely on implementation details > which the vast majority of code should not care about. > > So, if you really have a need to find out whether something is truly an > instance of something for *structural* reasons, you will still be able to > do that. Yes, it will be a pain. But deliberately inducing structural > dependencies *should* be painful, because you're making it painful for the > *users* of your code, whenever you impose isinstance/issubclass checks > beyond necessity. > > The fact that it's currently *not* painful, is precisely what makes it such > a good idea to add the new hooks to make these operations forgeable. > > The default, in other words, should not be to care about what objects > *are*, only what they *claim* to be. (Or what is claimed about them!) Thanks for writing this note! 
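As transcribed, the recipes in Phillip's note have a small bug: `type.__mro__.__get__(C1)` fails, because accessing `__mro__` on the class `type` already triggers the descriptor and yields a tuple. The descriptor itself must be fetched from `type.__dict__`. A corrected, runnable version, keeping Phillip's names (the `Proxy` class and the `cls is not None` guard are additions for illustration):

```python
def true_issubclass(C1, C2):
    # Fetch the __mro__ descriptor from type's own namespace and bind it
    # to C1, bypassing any __subclasscheck__ hook the class might define.
    return C2 in type.__dict__['__mro__'].__get__(C1)

def isinstance_no_proxy(o, C):
    # type(o) cannot be forged, unlike o.__class__.
    return true_issubclass(type(o), C)

def isinstance_with_proxy(o, C):
    cls = getattr(o, '__class__', None)
    return (cls is not None and true_issubclass(cls, C)) or isinstance_no_proxy(o, C)

class Proxy:
    # A proxy object that *claims* to be an int via __class__.
    __class__ = int
```

The complexity, as the note says, reflects reliance on implementation details: the structural check sees through `Proxy`'s claim, while the proxy-aware variant honors it.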
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue May 1 19:17:25 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 1 May 2007 10:17:25 -0700 Subject: [Python-3000] PEP index out of date, and work-around Message-ID: There seems to be an issue with the PEP index: http://python.org/dev/peps/ lists PEP 3122 as the last PEP (not counting PEP 3141 which is deliberately out of sequence). As a work-around, an up to date index is here: http://python.org/dev/peps/pep-0000/ PEPs 3123-3128 are alive and well and reachable via this index. One of the webmasters will look into this tonight. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue May 1 19:19:21 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 1 May 2007 10:19:21 -0700 Subject: [Python-3000] Adding class decorators to PEP 318 In-Reply-To: <43aa6ff70705010852g112924a2hbf13f31d83631a85@mail.gmail.com> References: <43aa6ff70705010852g112924a2hbf13f31d83631a85@mail.gmail.com> Message-ID: I don't like this -- it seems like rewriting history to me. I'd rather leave PEP 318 alone and create a new PEP. Of course the new PEP can be short because it can refer to PEP 318. --Guido On 5/1/07, Collin Winter wrote: > In talking to Neal Norwitz about this, I don't see a need for a > separate PEP for class decorators; we already have a decorators PEP, > #318. The following is a proposed patch to PEP 318 that adds in class > decorators. > > Collin Winter > > > Index: pep-0318.txt > =================================================================== > --- pep-0318.txt (revision 55034) > +++ pep-0318.txt (working copy) > @@ -1,5 +1,5 @@ > PEP: 318 > -Title: Decorators for Functions and Methods > +Title: Decorators for Functions, Methods and Classes > Version: $Revision$ > Last-Modified: $Date$ > Author: Kevin D. 
Smith, Jim Jewett, Skip Montanaro, Anthony Baxter > @@ -9,7 +9,7 @@ > Created: 05-Jun-2003 > Python-Version: 2.4 > Post-History: 09-Jun-2003, 10-Jun-2003, 27-Feb-2004, 23-Mar-2004, 30-Aug-2004, > - 2-Sep-2004 > + 2-Sep-2004, 30-Apr-2007 > > > WarningWarningWarning > @@ -22,24 +22,40 @@ > negatives of each form. > > > +UpdateUpdateUpdate > +================== > + > +In April 2007, this PEP was updated to reflect the evolution of the Python > +community's attitude toward class decorators. Though they had previously > +been rejected as too obscure and with limited use-cases, by mid-2006, > +class decorators had come to be seen as the logical next step, with some > +wondering why they hadn't been included originally. As a result, class > +decorators will ship in Python 2.6. > + > +This PEP has been modified accordingly, with references to class decorators > +injected into the narrative. While some references to the lack of class > +decorators have been left in place to preserve the historical record, others > +have been removed for the sake of coherence. > + > + > Abstract > ======== > > -The current method for transforming functions and methods (for instance, > -declaring them as a class or static method) is awkward and can lead to > -code that is difficult to understand. Ideally, these transformations > -should be made at the same point in the code where the declaration > -itself is made. This PEP introduces new syntax for transformations of a > -function or method declaration. > +The current method for transforming functions, methods and classes (for > +instance, declaring a method as a class or static method) is awkward and > +can lead to code that is difficult to understand. Ideally, these > +transformations should be made at the same point in the code where the > +declaration itself is made. This PEP introduces new syntax for > +transformations of a function, method or class declaration. 
> > > Motivation > ========== > > -The current method of applying a transformation to a function or method > -places the actual transformation after the function body. For large > -functions this separates a key component of the function's behavior from > -the definition of the rest of the function's external interface. For > +The current method of applying a transformation to a function, method or class > +places the actual transformation after the body. For large > +code blocks this separates a key component of the object's behavior from > +the definition of the rest of the object's external interface. For > example:: > > def foo(self): > @@ -69,14 +85,22 @@ > are not as immediately apparent. Almost certainly, anything which could > be done with class decorators could be done using metaclasses, but > using metaclasses is sufficiently obscure that there is some attraction > -to having an easier way to make simple modifications to classes. For > -Python 2.4, only function/method decorators are being added. > +to having an easier way to make simple modifications to classes. The > +following is much clearer than the metaclass-based alternative:: > > + @singleton > + class Foo(object): > + pass > > +Because of the greater ease-of-use of class decorators and the symmetry > +with function and method decorators, class decorators will be included in > +Python 2.6. > + > + > Why Is This So Hard? > -------------------- > > -Two decorators (``classmethod()`` and ``staticmethod()``) have been > +Two method decorators (``classmethod()`` and ``staticmethod()``) have been > available in Python since version 2.2. It's been assumed since > approximately that time that some syntactic support for them would > eventually be added to the language. Given this assumption, one might > @@ -135,11 +159,16 @@ > .. 
_gareth mccaughan: > http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&selm=slrna40k88.2h9o.Gareth.McCaughan%40g.local > > -Class decorations seem like an obvious next step because class > +Class decorations seemed like an obvious next step because class > definition and function definition are syntactically similar, > -however Guido remains unconvinced, and class decorators will almost > -certainly not be in Python 2.4. > +however Guido was not convinced of their usefulness, and class > +decorators were not in Python 2.4. `The issue was revisited`_ in March 2006 > +and sufficient use-cases were found to justify the inclusion of class > +decorators in Python 2.6. > > +.. _The issue was revisited: > + http://mail.python.org/pipermail/python-dev/2006-March/062942.html > + > The discussion continued on and off on python-dev from February > 2002 through July 2004. Hundreds and hundreds of posts were made, > with people proposing many possible syntax variations. Guido took > @@ -147,8 +176,8 @@ > place. Subsequent to this, he decided that we'd have the `Java-style`_ > @decorator syntax, and this appeared for the first time in 2.4a2. > Barry Warsaw named this the 'pie-decorator' syntax, in honor of the > -Pie-thon Parrot shootout which was occured around the same time as > -the decorator syntax, and because the @ looks a little like a pie. > +Pie-thon Parrot shootout which was occuring around the same time as > +the decorator syntax debate, and because the @ looks a little like a pie. > Guido `outlined his case`_ on Python-dev, including `this piece`_ > on some of the (many) rejected forms. > > @@ -250,6 +279,19 @@ > decorators are near the function declaration. The @ sign makes it clear > that something new is going on here. 
> > +Python 2.6's class decorators work similarly:: > + > + @dec2 > + @dec1 > + class Foo: > + pass > + > +This is equivalent to:: > + > + class Foo: > + pass > + Foo = dec2(dec1(Foo)) > + > The rationale for the `order of application`_ (bottom to top) is that it > matches the usual order for function-application. In mathematics, > composition of functions (g o f)(x) translates to g(f(x)). In Python, > @@ -321,7 +363,7 @@ > There have been a number of objections raised to this location -- the > primary one is that it's the first real Python case where a line of code > has an effect on a following line. The syntax available in 2.4a3 > -requires one decorator per line (in a2, multiple decorators could be > +requires one decorator per line (in 2.4a2, multiple decorators could be > specified on the same line). > > People also complained that the syntax quickly got unwieldy when > @@ -330,52 +372,61 @@ > were small and thus this was not a large worry. > > Some of the advantages of this form are that the decorators live outside > -the method body -- they are obviously executed at the time the function > +the function/class body -- they are obviously executed at the time the object > is defined. > > -Another advantage is that a prefix to the function definition fits > +Another advantage is that a prefix to the definition fits > the idea of knowing about a change to the semantics of the code before > -the code itself, thus you know how to interpret the code's semantics > +the code itself. This way, you know how to interpret the code's semantics > properly without having to go back and change your initial perceptions > if the syntax did not come before the function definition. 
> > Guido decided `he preferred`_ having the decorators on the line before > -the 'def', because it was felt that a long argument list would mean that > -the decorators would be 'hidden' > +the 'def' or 'class', because it was felt that a long argument list would mean > +that the decorators would be 'hidden' > > .. _he preferred: > http://mail.python.org/pipermail/python-dev/2004-March/043756.html > > -The second form is the decorators between the def and the function name, > -or the function name and the argument list:: > +The second form is the decorators between the 'def' or 'class' and the object's > +name, or between the name and the argument list:: > > def @classmethod foo(arg1,arg2): > pass > + > + class @singleton Foo(arg1, arg2): > + pass > > def @accepts(int,int), at returns(float) bar(low,high): > pass > > def foo @classmethod (arg1,arg2): > pass > + > + class Foo @singleton (arg1, arg2): > + pass > > def bar @accepts(int,int), at returns(float) (low,high): > pass > > There are a couple of objections to this form. The first is that it > -breaks easily 'greppability' of the source -- you can no longer search > +breaks easy 'greppability' of the source -- you can no longer search > for 'def foo(' and find the definition of the function. The second, > more serious, objection is that in the case of multiple decorators, the > syntax would be extremely unwieldy. 
> > The next form, which has had a number of strong proponents, is to have > the decorators between the argument list and the trailing ``:`` in the > -'def' line:: > +'def' or 'class' line:: > > def foo(arg1,arg2) @classmethod: > pass > > def bar(low,high) @accepts(int,int), at returns(float): > pass > + > + class Foo(object) @singleton: > + pass > > Guido `summarized the arguments`_ against this form (many of which also > apply to the previous form) as: > @@ -403,15 +454,19 @@ > @accepts(int,int) > @returns(float) > pass > + > + class Foo(object): > + @singleton > + pass > > The primary objection to this form is that it requires "peeking inside" > -the method body to determine the decorators. In addition, even though > -the code is inside the method body, it is not executed when the method > +the suite body to determine the decorators. In addition, even though > +the code is inside the suite body, it is not executed when the code > is run. Guido felt that docstrings were not a good counter-example, and > that it was quite possible that a 'docstring' decorator could help move > the docstring to outside the function body. > > -The final form is a new block that encloses the method's code. For this > +The final form is a new block that encloses the function or clas. For this > example, we'll use a 'decorate' keyword, as it makes no sense with the > @syntax. :: > > @@ -425,9 +480,14 @@ > returns(float) > def bar(low,high): > pass > + > + decorate: > + singleton > + class Foo(object): > + pass > > This form would result in inconsistent indentation for decorated and > -undecorated methods. In addition, a decorated method's body would start > +undecorated code. In addition, a decorated object's body would start > three indent levels in. 
> > > @@ -444,6 +504,10 @@ > @returns(float) > def bar(low,high): > pass > + > + @singleton > + class Foo(object): > + pass > > The major objections against this syntax are that the @ symbol is > not currently used in Python (and is used in both IPython and Leo), > @@ -461,6 +525,10 @@ > |returns(float) > def bar(low,high): > pass > + > + |singleton > + class Foo(object): > + pass > > This is a variant on the @decorator syntax -- it has the advantage > that it does not break IPython and Leo. Its major disadvantage > @@ -476,6 +544,10 @@ > [accepts(int,int), returns(float)] > def bar(low,high): > pass > + > + [singleton] > + class Foo(object): > + pass > > The major objection to the list syntax is that it's currently > meaningful (when used in the form before the method). It's also > @@ -490,6 +562,10 @@ > > def bar(low,high): > pass > + > + > + class Foo(object): > + pass > > None of these alternatives gained much traction. The alternatives > which involve square brackets only serve to make it obvious that the > @@ -659,7 +735,10 @@ > .. _subsequently rejected: > http://mail.python.org/pipermail/python-dev/2004-September/048518.html > > +For Python 2.6, the Python grammar and compiler were modified to allow > +class decorators in addition to function and method decorators. > > + > Community Consensus > ------------------- > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From foom at fuhm.net Tue May 1 18:58:19 2007 From: foom at fuhm.net (James Y Knight) Date: Tue, 1 May 2007 12:58:19 -0400 Subject: [Python-3000] PEP: Supporting Non-ASCII Identifiers In-Reply-To: References: <46371BD2.7050303@v.loewis.de> Message-ID: On May 1, 2007, at 12:19 PM, Jim Jewett wrote: > On 5/1/07, "Martin v. 
Löwis" wrote: > >> The identifier syntax is \*. > >> ID_Start is defined as all characters having one of the general >> categories uppercase letters (Lu), lowercase letters (Ll), titlecase >> letters (Lt), modifier letters (Lm), other letters (Lo), letter >> numbers (Nl), plus the underscore (XXX what are "stability extensions >> listed in UAX 31). > > Are you sure that modifier letters should be included? The standard > says so, but as nearly as I can tell, these are really more like > diacritics -- and some of them look an awful lot like punctuation. > > http://unicode.org/charts/PDF/U02B0.pdf The entire point of these characters is that they are to be treated as letters (that is, can make up part of a word). If they were punctuation or diacritics, the other very-similar-looking characters in other parts of the codespace could be used. These letters seem to be mainly intended for spelling out phonetic pronunciations. It's unlikely that anyone would want to write a Python identifier in IPA, but that's not a good reason to go against the standard. James From martin at v.loewis.de Tue May 1 19:39:44 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 01 May 2007 19:39:44 +0200 Subject: [Python-3000] PEP: Supporting Non-ASCII Identifiers In-Reply-To: <435DF58A933BA74397B42CDEB8145A860B745DEF@ex9.hostedexchange.local> References: <435DF58A933BA74397B42CDEB8145A860B745DEF@ex9.hostedexchange.local> Message-ID: <46377B60.1030501@v.loewis.de> Robert Brewer schrieb: > Martin v. Löwis wrote: >> Subject: [Python-3000] PEP: Supporting Non-ASCII Identifiers >> >> Common Objections ================= >> >> People claim that they will not be able to use a library if to do >> so they have to use characters they cannot type on their keyboards.
>> However, it is the choice of the designer of the library to decide >> on various constraints for using the library: people may not be >> able to use the library because they cannot get physical access to >> the source code (because it is not published), or because licensing >> prohibits usage, or because the documentation is in a language >> they cannot understand. A developer wishing to make a library >> widely available needs to make a number of explicit choices (such >> as publication, licensing, language of documentation, and language >> of identifiers). It should always be the choice of the author to >> make these decisions - not the choice of the language designers. > > That seems true when each such decision is considered in isolation. > But the language designers are responsible to make sure the number of > such explicit decisions/choices does not grow beyond a reasonable > limit. Right. However, it is today already the developer's choice to use English-based identifiers, or from a different language using transliteration. So offering support to use the correct script if they have chosen to use native-language identifiers does not really change the number of explicit decisions. Regards, Martin From jimjjewett at gmail.com Tue May 1 20:13:40 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Tue, 1 May 2007 14:13:40 -0400 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: <5.1.1.6.0.20070501121143.02d31398@sparrow.telecommunity.com> References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <5.1.1.6.0.20070430200255.04b88e10@sparrow.telecommunity.com> <5.1.1.6.0.20070501121143.02d31398@sparrow.telecommunity.com> Message-ID: On 5/1/07, Phillip J. Eby wrote: > At 10:11 AM 5/1/2007 -0400, Jim Jewett wrote: > >On 4/30/07, Phillip J. Eby wrote: > >> >On 4/30/07, Phillip J. 
Eby wrote: > >> >>PEP 3115, however, requires that a class' metaclass be determined > >> >>*before* the class body has executed, making it impossible to use this > >> >>technique for class decoration any more. ... > >>Sure -- that's what I suggested in the "super(), class decorators, and PEP > >>3115" thread, but Guido voted -1 on adding such a magic attribute to PEP > >>3115. > >I don't think we're understanding each other. > Yup, and we're still not now. :) Or at least, I don't understand what the > code below does, or more precisely, why it's different from just having a > __decorators__ list containing direct callbacks. That would be fine too... but I thought you were saying that you couldn't do this at all any more, because the metaclass had to be determined before the class, instead of inside it. Note that it doesn't have to be any particular magic name -- just one agreed upon by the metaclass and the class author. Today, some such names are semi-standardized already; you don't need language support. Why would you suddenly start needing language support after 3115? -jJ From jimjjewett at gmail.com Tue May 1 20:20:01 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Tue, 1 May 2007 14:20:01 -0400 Subject: [Python-3000] Addition to PEP 3101 In-Reply-To: References: <8f01efd00704300953t6154d7e1j7ef18cead1acb344@mail.gmail.com> <46368EE5.6050409@canterbury.ac.nz> <4636AE9E.2020905@acm.org> Message-ID: On 5/1/07, Guido van Rossum wrote: > On 5/1/07, Jim Jewett wrote: > > Note that while (literal strings used as) format strings are > > effectively sandboxed, the formatted objects themselves are not. > > "My name is {0[name]}".format(evil_map) > > would still allow evil_map to run arbitrary code. > And how on earth would that be a security threat? There are some things you can safely do with even arbitrary objects -- such as appending them to a list. By mentioning security as a reason to restrict the format, it suggests that this is another safe context. 
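The distinction Jim is drawing can be shown concretely. A format string is inert on its own — PEP 3101's replacement-field syntax permits only attribute and item access, not method calls — but the item access still executes the formatted object's own code. The `EvilMap` class is a hypothetical stand-in:

```python
class EvilMap:
    """A mapping whose __getitem__ runs code of its author's choosing."""
    def __init__(self):
        self.calls = []

    def __getitem__(self, key):
        self.calls.append(key)   # stand-in for an arbitrary side effect
        return 'Mallory'

m = EvilMap()
result = "My name is {0[name]}".format(m)
```

The literal format string could not have injected any behavior of its own, yet substituting `{0[name]}` did invoke `m.__getitem__('name')` — which is Jim's point: sandboxing the format string does not sandbox the formatted objects.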
It isn't. -jJ From jason.orendorff at gmail.com Tue May 1 20:22:27 2007 From: jason.orendorff at gmail.com (Jason Orendorff) Date: Tue, 1 May 2007 14:22:27 -0400 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: <5.1.1.6.0.20070501123026.0524eac8@sparrow.telecommunity.com> References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <46376729.9000008@acm.org> <5.1.1.6.0.20070501123026.0524eac8@sparrow.telecommunity.com> Message-ID: On 5/1/07, Phillip J. Eby wrote: > At 09:13 AM 5/1/2007 -0700, Talin wrote: > >I don't care for the idea of testing against a specially named argument. > >Why couldn't you just have a different decorator, such as > >"overload_chained" which triggers this behavior? > > The PEP lists *five* built-in decorators, all of which support this behavior:: > > @overload, @when, @before, @after, @around Actually @before and @after don't support __proceed__, according to the first draft anyway. I think I would prefer to *always* pass the next method to @around methods, which always need it, and *never* pass it to any of the others. What use case am I missing? The one in the PEP involves foo(bar, baz), not a very convincing example. -j From guido at python.org Tue May 1 20:25:52 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 1 May 2007 11:25:52 -0700 Subject: [Python-3000] Addition to PEP 3101 In-Reply-To: References: <8f01efd00704300953t6154d7e1j7ef18cead1acb344@mail.gmail.com> <46368EE5.6050409@canterbury.ac.nz> <4636AE9E.2020905@acm.org> Message-ID: On 5/1/07, Jim Jewett wrote: > On 5/1/07, Guido van Rossum wrote: > > On 5/1/07, Jim Jewett wrote: > > > > Note that while (literal strings used as) format strings are > > > effectively sandboxed, the formatted objects themselves are not. > > > > "My name is {0[name]}".format(evil_map) > > > > would still allow evil_map to run arbitrary code. > > > And how on earth would that be a security threat?
> > There are some things you can safely do with even arbitrary objects -- > such as appending them to a list. > > By mentioning security as a reason to restrict the format, it suggests > that this is another safe context. It isn't. But your presumption that the map is already evil makes it irrelevant whether the format is safe or not. Having the evil map is the problem, not passing it to the format operation. -- --Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org Tue May 1 20:31:00 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 1 May 2007 11:31:00 -0700 Subject: [Python-3000] PEP Parade Message-ID:

So the PEP submissions are in, and a few late ones will be submitted ASAP. Let me write up a capsule review of what we've got. Please let me know if I missed anything (e.g. a PEP that someone has committed to write but hasn't submitted yet). First the PEPs that have numbers as of this writing (I'm pasting the section heads right out of PEP 0, so apologies for the formatting):

S 3101 Advanced String Formatting Talin

While we're still tweaking details, I expect this will be ready for acceptance soon. We also have an implementation in the sandbox!

S 3108 Standard Library Reorganization Cannon

I expect this to happen after 3.0a1 is released.

S 3116 New I/O Stutzbach, Verdone, GvR

A prototype is in the py3k branch. There are details to work through (like how to seek on text files with non-trivial encodings) but I feel that the basis is solid. I could use help coding!

S 3117 Postfix Type Declarations Brandl

I forgot to reject this -- it was my favorite April Fool's post of the year though. :-)

S 3118 Revising the buffer protocol Oliphant, Banks

Where's this standing? I'm assuming that it's pretty much ready to be implemented. I haven't had the time to participate in the discussion.

S 3119 Introducing Abstract Base Classes GvR, Talin

This is clearly still controversial. It is also awaiting a rewrite.
I am still in favor of something like this (or I wouldn't bother with the rewrite).

S 3120 Using UTF-8 as the default source encoding von Löwis

The basic idea seems very reasonable. I expect that the changes to the parser may be quite significant though. Also, the parser ought to be weaned off C stdio in favor of Python's own I/O library. I wonder if it's really possible to let the parser read the raw bytes though -- this would seem to rule out supporting encodings like UTF-16. Somehow I wonder if it wouldn't be easier if the parser operated on Unicode input? That way parsing unicode strings (which we must support as all strings will become unicode) will be simpler.

S 3121 Module Initialization and finalization von Löwis

I like it. I wish the title were changed to "Extension Module ..." though.

S 3123 Making PyObject_HEAD conform to standard C von Löwis

I like it, but who's going to make the changes? Once those changes have been made, will it still be reasonable to expect to merge C code from the (2.6) trunk into the 3.0 branch?

S 3124 Overloading, Generic Functions, Interfaces Eby

I haven't had the time to read this in detail, but in general I'm feeling favorable about this idea. I'd rather see it decoupled from sys._getframe() and modifying func_code (actually __code__ nowadays, see PEP 3100).

S 3125 Remove Backslash Continuation Jewett

Sounds reasonable. I think we should still support \ inside string literals though; the PEP isn't clear on this. I hope this falls within the scope of the refactoring tool (sandbox/2to3).

S 3126 Remove Implicit String Concatenation Jewett

Sounds reasonable as well. A fixer for this would be trivial to add to the refactoring tool.

S 3127 Integer Literal Support and Syntax Maupin

Fully in favor.

S 3128 BList: A Faster List-like Type Stutzbach

I still have misgivings about having too many options for developers.
While wizards will have no problem deciding between regular lists and BLists, I worry that a meme might spread among junior coders that the built-in list type is slow, causing overuse of BLists for no good reason. But I am deferring to Raymond Hettinger in this matter.

S 3141 A Type Hierarchy for Numbers Yasskin

Jeffrey has promised to rewrite this, removing most of the references to algebra. I expect I'll like his rewrite, once it happens.

Now on to the PEPs that don't have numbers yet.

PEP: Supporting Non-ASCII identifiers (Martin von Loewis)

I'm on record as not liking this; my worry is that it will become a barrier to the free exchange of code. It's not just languages I can't read (Russian transliterated to the latin alphabet would be just as bad and we don't stop that now); many text editors have no or limited support for other scripts (not to mention mixing right-to-left script with Python's left-to-right identifiers). But if this receives a lot of popular support I'm willing to give it a try. The One Laptop Per Child project for example would like to enable students to code in their own language (of course they'd rather see the language keywords and standard library translated too...).

PEP: Adding class decorators (???)

I'm in favor of this. I'm just waiting for someone to write it up.

PEP: Eliminate __del__ (Raymond Hettinger)

I would be in favor of this or one of the alternative ideas for fixing the can't-GC-a-cycle-with-__del__ issue if there was a clear recipe and (if necessary) stdlib support for what to do instead. There are real use cases for automatic finalization for which the atexit module isn't the right solution and try/finally or with statements don't cut it either.

PEP: Information Attributes (Raymond Hettinger)

This would be better served by a continued discussion about the merits and flaws of ABCs (PEP 3119 and 3141).

PEP: Traits/roles instead of ABCs (Collin Winter)

This could serve as an interesting alternative to PEP 3119.
However, I believe that it doesn't really solve the distinction between abstractions that can be implemented as "classic" ABCs and abstractions that require a metaclass (like TotalOrder or Ring). -- --Guido van Rossum (home page: http://www.python.org/~guido/)

From collinw at gmail.com Tue May 1 20:34:18 2007 From: collinw at gmail.com (Collin Winter) Date: Tue, 1 May 2007 11:34:18 -0700 Subject: [Python-3000] PEP Parade In-Reply-To: References: Message-ID: <43aa6ff70705011134p46c8269dv39059c242ba8e12b@mail.gmail.com>

On 5/1/07, Guido van Rossum wrote: > PEP: Adding class decorators (???) > > I'm in favor of this. I'm just waiting for someone to write it up.

I just checked in PEP 3129, "Class Decorators".

Collin Winter

From jimjjewett at gmail.com Tue May 1 20:39:59 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Tue, 1 May 2007 14:39:59 -0400 Subject: [Python-3000] Addition to PEP 3101 In-Reply-To: References: <8f01efd00704300953t6154d7e1j7ef18cead1acb344@mail.gmail.com> <46368EE5.6050409@canterbury.ac.nz> <4636AE9E.2020905@acm.org> Message-ID:

On 5/1/07, Guido van Rossum wrote: > On 5/1/07, Jim Jewett wrote: > > There are some things you can safely do with even arbitrary objects -- > > such as appending them to a list. > > By mentioning security as a reason to restrict the format, it suggests > > that this is another safe context. It isn't. > But your presumption that the map is already evil makes it irrelevant > whether the format is safe or not. Having the evil map is the problem, > not passing it to the format operation.

Using a map was probably misleading. Let me rephrase:

While the literal string itself is safe, the format function is only as safe as the objects being formatted. The example below gets person.name; if the person object itself is malicious, then even this attribute access could run arbitrary code.

"My name is {0.name}".format(person)

-jJ

From pje at telecommunity.com Tue May 1 20:48:54 2007 From: pje at telecommunity.com (Phillip J.
Eby) Date: Tue, 01 May 2007 14:48:54 -0400 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: References: <5.1.1.6.0.20070501123026.0524eac8@sparrow.telecommunity.com> <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <46376729.9000008@acm.org> <5.1.1.6.0.20070501123026.0524eac8@sparrow.telecommunity.com> Message-ID: <5.1.1.6.0.20070501143840.02ca7760@sparrow.telecommunity.com> At 02:22 PM 5/1/2007 -0400, Jason Orendorff wrote: >On 5/1/07, Phillip J. Eby wrote: >>At 09:13 AM 5/1/2007 -0700, Talin wrote: >> >I don't care for the idea of testing against a specially named argument. >> >Why couldn't you just have a different decorator, such as >> >"overload_chained" which triggers this behavior? >> >>The PEP lists *five* built-in decorators, all of which support this >>behavior:: >> >> @overload, @when, @before, @after, @around > >Actually @before and @after don't support __proceeds__, >according to the first draft anyway. True; anything that derives from MethodList isn't going to need it, so that means that @discount won't use it, either. Still, that's three decorators left: @overload, @when, and @around, plus any custom decorators based on Method in place of MethodList. (@when and @around are implemented as the 'make_decorator' of Method and Around, respectively.) >I think I would prefer to *always* pass the next method >to @around methods, which always need it, and *never* >pass it to any of the others. What use case am I missing? Calling the next method in a generic function is equivalent to calling super() in a normal method. Anytime you want to add more specific behavior for a type, while reusing the more general behavior, you're going to need it. Therefore, "primary" methods are always potential users of it. 
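Phillip's analogy to super() can be illustrated with a toy dispatcher -- this is only a sketch of next-method chaining, not PEP 3124's implementation, and chain/specific/general are invented names:

```python
def chain(methods):
    """Link methods most- to least-specific; each gets the next as arg 1."""
    def fallback(*args):
        raise NotImplementedError("no applicable method")
    nxt = fallback
    for m in reversed(methods):
        # Default arguments freeze the current m and nxt for this link.
        def bound(*args, _m=m, _next=nxt):
            return _m(_next, *args)
        nxt = bound
    return nxt

def general(next_method, x):
    # Least-specific behavior; ignores next_method.
    return "general:%s" % x

def specific(next_method, x):
    # More specific behavior that chains to the next method, like super().
    return "specific(%s)" % next_method(x)

f = chain([specific, general])
```

Here f(3) reaches specific first, which reuses the more general behavior through its next_method argument, mirroring how a "primary" method would call the one it overrides.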
Syntactically speaking, I would certainly agree that the ideal solution is something that looks like a super() call; it's just that supporting that requires *more* of the sort of hackery that Guido wants *less* of here. Signature inspection isn't as much of a black art as magical functions that need to know how the current function was invoked. The other possibility would be to clone the functions using copied func_globals (__globals__?) so that 'next_method' in those namespaces would point to the right next method. But then, if the function *writes* any globals, it'll be updating the wrong namespace. Do you have any other ideas? From pje at telecommunity.com Tue May 1 20:51:58 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue, 01 May 2007 14:51:58 -0400 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: References: <5.1.1.6.0.20070501121143.02d31398@sparrow.telecommunity.com> <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <5.1.1.6.0.20070430200255.04b88e10@sparrow.telecommunity.com> <5.1.1.6.0.20070501121143.02d31398@sparrow.telecommunity.com> Message-ID: <5.1.1.6.0.20070501144907.04c2efe8@sparrow.telecommunity.com> At 02:13 PM 5/1/2007 -0400, Jim Jewett wrote: >On 5/1/07, Phillip J. Eby wrote: > > Yup, and we're still not now. :) Or at least, I don't understand what the > > code below does, or more precisely, why it's different from just having a > > __decorators__ list containing direct callbacks. > >That would be fine too... but I thought you were saying that you >couldn't do this at all any more, because the metaclass had to be >determined before the class, instead of inside it. Correct. >Note that it doesn't have to be any particular magic name -- just one >agreed upon by the metaclass and the class author. Today, some such >names are semi-standardized already; you don't need language support. > >Why would you suddenly start needing language support after 3115? 
Because it eliminated an existing magic name: __metaclass__. Under the old regime, you could simply replace __metaclass__ with a function that called the old __metaclass__, then applied any desired decoration to the result. A __decorators__ hook would replace this hack with something less convoluted, and allow method decorators and attribute descriptors a chance to modify the class, if needed. (For example, the @abstractmethod could ensure the class was abstract, or raise an error if the class wasn't explicitly declared abstract.) From pmaupin at gmail.com Tue May 1 20:52:20 2007 From: pmaupin at gmail.com (Patrick Maupin) Date: Tue, 1 May 2007 13:52:20 -0500 Subject: [Python-3000] Addition to PEP 3101 In-Reply-To: References: <8f01efd00704300953t6154d7e1j7ef18cead1acb344@mail.gmail.com> <46368EE5.6050409@canterbury.ac.nz> <4636AE9E.2020905@acm.org> Message-ID: On 5/1/07, Jim Jewett wrote: > On 5/1/07, Guido van Rossum wrote: > > But your presumption that the map is already evil makes it irrelevant > > whether the format is safe or not. Having the evil map is the problem, > > not passing it to the format operation. > > Using a map was probably misleading. Let me rephrase: > > While the literal string itself is safe, the format function is only > as safe as the objects being formatted. The example below gets > person.name; if the person object itself is malicious, then even this > attribute access could run arbitrary code. > > "My name is {0.name}".format(person) > > -jJ There is a (perhaps misguided) consensus that the format() operation ought to have the property that a programmer can write a program which will not have an issue with potentially hostile strings. (Personally, I view security as an open-ended problem, and don't deal with hostile strings without a LOT of massaging.) 
It is, and will continue to be the case, that the programmer can EASILY write code that would do something bad with a given format string, and yet not do something bad with another format string. This is true even with the percent operator and a dictionary (which might be subclassed to do something evil on a lookup operation).

All the format() operation can do to help in this instance is impose a few minor restrictions: don't allow calls, and don't allow lookups of attributes with leading underscores. This makes it relatively easy to write "format-safe" objects. Does it make it impossible to write a "format-unsafe" object? No, and that was never the intention.

Regards, Pat

From eric+python-dev at trueblade.com Tue May 1 20:54:53 2007 From: eric+python-dev at trueblade.com (Eric V. Smith) Date: Tue, 01 May 2007 14:54:53 -0400 Subject: [Python-3000] Addition to PEP 3101 In-Reply-To: References: <8f01efd00704300953t6154d7e1j7ef18cead1acb344@mail.gmail.com> <46368EE5.6050409@canterbury.ac.nz> <4636AE9E.2020905@acm.org> Message-ID: <46378CFD.2000004@trueblade.com>

Jim Jewett wrote: > On 5/1/07, Guido van Rossum wrote: >> On 5/1/07, Jim Jewett wrote: > >>> There are some things you can safely do with even arbitrary objects -- >>> such as appending them to a list. > >>> By mentioning security as a reason to restrict the format, it suggests >>> that this is another safe context. It isn't. > >> But your presumption that the map is already evil makes it irrelevant >> whether the format is safe or not. Having the evil map is the problem, >> not passing it to the format operation. > > Using a map was probably misleading. Let me rephrase: > > While the literal string itself is safe, the format function is only > as safe as the objects being formatted. The example below gets > person.name; if the person object itself is malicious, then even this > attribute access could run arbitrary code.
> > "My name is {0.name}".format(person) >

I think the concern is this: Suppose we have:

    class Person:
        def destroy_children(self):
            # do something destructive
            pass
        name = 'me'

    person = Person()

    "My name is {0.name}".format(person)                # ok
    "My name is {0.destroy_children()}".format(person)  # ouch

One intent of the PEP is that the strings come from a translation, or are otherwise out of the direct control of the original programmer. So the thought is that attributes of objects being formatted are probably always "safe" to access, while methods might be "unsafe", for some definitions of "safe" and "unsafe".

Whether this justifies the exclusion of calling methods (or callables themselves), I can't say. I can say that calling methods that have parameters would significantly complicate our implementation of PEP 3101. The original message in this thread only has examples of calling methods without parameters; it's not clear to me if that's the only intended use.

From jimjjewett at gmail.com Tue May 1 20:57:41 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Tue, 1 May 2007 14:57:41 -0400 Subject: [Python-3000] PEP Parade In-Reply-To: References: Message-ID:

On 5/1/07, Guido van Rossum wrote: > So the PEP submissions are in, and a few late ones will be submitted > ASAP. Let me write up a capsule review of what we've got. Please let > me know if I missed anything (e.g. a PEP that someone has committed to > write but hasn't submitted yet).

(1) The __this_*__ PEP was written and posted; I'll revise it slightly tonight. One benefit would be a minimal-change version of super.

(2) Calvin's and Tim's more complete reworking of super.

(3) final/once/name annotations -- I *think* this was dropped when case statements were rejected, but I'm not sure.
> PEP: Eliminate __del__ (Raymond Hettinger) > I would be in favor of this or one of the alternative ideas for fixing > the can't-GC-a-cycle-with-__del__ issue if there was a clear recipe > and (if necessary) stdlib support for what to do instead. There are > real use cases for automatic finalization for which the atexit module > isn't the right solution and try/finally or with statements don't cut > it either. Does the alternative need to cover 100% of use cases? If it covers 99%, should the other 1% become impossible, or should we keep __del__ as fallback? -jJ From guido at python.org Tue May 1 21:01:53 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 1 May 2007 12:01:53 -0700 Subject: [Python-3000] PEP Parade In-Reply-To: References: Message-ID: On 5/1/07, Jim Jewett wrote: > On 5/1/07, Guido van Rossum wrote: > > > So the PEP submissions are in, and a few late ones will be submitted > > ASAP. Let me write up a capsule review of what we've got. Please let > > me know if I missed anything (e.g. a PEP that someone has committed to > > write but hasn't submitted yet). > > (1) The __this_*__ PEP was written and posted; I'll revise it slightly tonight. __this__? What's that? I must've missed the posting of the pep, sorry. You can mail me the PEP (best as an attachment) and I will assign it a number and check it in. > One benefit would be a minimal-change version of super. > > (2) Calvin's and Tim's more complete reworking of super. Oooh, I missed that too. > (3) final/once/name annotations -- I *think* this was dropped when > case statements were rejected, but I'm not sure. Unless there's a PEP that was posted before the deadline I don't want to hear about it. > > PEP: Eliminate __del__ (Raymond Hettinger) > > > I would be in favor of this or one of the alternative ideas for fixing > > the can't-GC-a-cycle-with-__del__ issue if there was a clear recipe > > and (if necessary) stdlib support for what to do instead. 
There are > > real use cases for automatic finalization for which the atexit module > > isn't the right solution and try/finally or with statements don't cut > > it either. > > Does the alternative need to cover 100% of use cases? > > If it covers 99%, should the other 1% become impossible, or should we > keep __del__ as fallback? What 1% use case are you thinking of? -- --Guido van Rossum (home page: http://www.python.org/~guido/) From pje at telecommunity.com Tue May 1 21:07:36 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue, 01 May 2007 15:07:36 -0400 Subject: [Python-3000] PEP Parade In-Reply-To: Message-ID: <5.1.1.6.0.20070501145217.04d293e8@sparrow.telecommunity.com> At 11:31 AM 5/1/2007 -0700, Guido van Rossum wrote: >I haven't had the time to read this in detail, but in general I'm >feeling favorable about this idea. I'd rather see it decoupled from >sys._getframe() and modifying func_code (actually __code__ nowadays, >see PEP 3100). I've figured out how to drop *some* (but not all) of the _getframe() hackery from the current proposal, btw. (Specifically, I believe I can make the decorators decide which function to return using __name__ comparisons instead of by checking frame contents.) Regarding __code__, however, it's either that or allow functions to be subclassed and have their type changed at runtime. In other words, if you could meaningfully assign to a function's __class__, then mucking with its __code__ would be unnecessary; we'd just override __call__ in a subclass, and change the __class__ when overloading an existing function. Unfortunately, I believe that CPython 2.3 and up don't let you change the type of instances of built-in classes, and it's never been possible to subclass the function type, AFAIK. 
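The workaround Phillip describes does, in fact, work in CPython -- a function's class is fixed, but its code object is a writable attribute (a minimal sketch, using the Py3k-era __code__ name):

```python
def f():
    return 1

def g():
    return 2

# CPython will not let you change f's type, but swapping its code
# object redirects every existing reference to f:
f.__code__ = g.__code__
```

After the assignment, calling f runs g's body, without touching any of the places f was already imported or stored.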
OTOH, these restrictions may not exist in Jython, IronPython, or PyPy; if they allow you to subclass the function type and change a function's __class__, then that approach becomes a reasonable implementation choice on those platforms. Thus, assignment to __code__ might reasonably be considered a workaround for the limitations of CPython in this respect, rather than a CPython-dependent hack. :) From jimjjewett at gmail.com Tue May 1 21:08:02 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Tue, 1 May 2007 15:08:02 -0400 Subject: [Python-3000] Addition to PEP 3101 In-Reply-To: References: <8f01efd00704300953t6154d7e1j7ef18cead1acb344@mail.gmail.com> <46368EE5.6050409@canterbury.ac.nz> <4636AE9E.2020905@acm.org> Message-ID: On 5/1/07, Patrick Maupin wrote: > attributes with leading underscores. This makes it relatively easy to > write "format-safe" objects. Does it make it impossible to write a > "format-unsafe" object? No, and that was never the intention. Agreed; I just think this restriction should be explicit, given that security is mentioned. -jJ From guido at python.org Tue May 1 21:11:31 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 1 May 2007 12:11:31 -0700 Subject: [Python-3000] Addition to PEP 3101 In-Reply-To: References: <8f01efd00704300953t6154d7e1j7ef18cead1acb344@mail.gmail.com> <46368EE5.6050409@canterbury.ac.nz> <4636AE9E.2020905@acm.org> Message-ID: On 5/1/07, Jim Jewett wrote: > On 5/1/07, Guido van Rossum wrote: > > On 5/1/07, Jim Jewett wrote: > > > > There are some things you can safely do with even arbitrary objects -- > > > such as appending them to a list. > > > > By mentioning security as a reason to restrict the format, it suggests > > > that this is another safe context. It isn't. > > > But your presumption that the map is already evil makes it irrelevant > > whether the format is safe or not. Having the evil map is the problem, > > not passing it to the format operation. > > Using a map was probably misleading. 
Let me rephrase: > > While the literal string itself is safe, the format function is only > as safe as the objects being formatted. The example below gets > person.name; if the person object itself is malicious, then even this > attribute access could run arbitrary code. > > "My name is {0.name}".format(person)

And my point is that the security concerns here are not about malicious arguments to the format() method; that's not part of the threat model. If you have a person object in your program you can't trust, you have a problem whether or not you use the format method.

The threat we're concerned with here (as Patrick explained in his response) is format strings provided by translators or non-root webmasters or (less likely) end users. Translation is probably the main use case; another use case is exemplified by mailman, which gives list owners the means to edit list-specific html templates which are used as format strings. We want to prevent those folks from (accidentally or intentionally) crashing the web server or elevating their privileges. -- --Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org Tue May 1 21:14:47 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 1 May 2007 12:14:47 -0700 Subject: [Python-3000] PEP Parade In-Reply-To: <5.1.1.6.0.20070501145217.04d293e8@sparrow.telecommunity.com> References: <5.1.1.6.0.20070501145217.04d293e8@sparrow.telecommunity.com> Message-ID:

Suppose you couldn't assign to __class__ of a function (that's too messy to deal with in CPython) and you couldn't assign to its __code__ either. What proposed functionality would you lose? How would you ideally implement that functionality if you had the ability to modify CPython in other ways? (I'm guessing you'd want to add some functionality to function objects; what would that functionality have to do?)

--Guido

On 5/1/07, Phillip J.
Eby wrote: > At 11:31 AM 5/1/2007 -0700, Guido van Rossum wrote: > >I haven't had the time to read this in detail, but in general I'm > >feeling favorable about this idea. I'd rather see it decoupled from > >sys._getframe() and modifying func_code (actually __code__ nowadays, > >see PEP 3100). > > I've figured out how to drop *some* (but not all) of the _getframe() > hackery from the current proposal, btw. (Specifically, I believe I can > make the decorators decide which function to return using __name__ > comparisons instead of by checking frame contents.) > > Regarding __code__, however, it's either that or allow functions to be > subclassed and have their type changed at runtime. > > In other words, if you could meaningfully assign to a function's __class__, > then mucking with its __code__ would be unnecessary; we'd just override > __call__ in a subclass, and change the __class__ when overloading an > existing function. > > Unfortunately, I believe that CPython 2.3 and up don't let you change the > type of instances of built-in classes, and it's never been possible to > subclass the function type, AFAIK. > > OTOH, these restrictions may not exist in Jython, IronPython, or PyPy; if > they allow you to subclass the function type and change a function's > __class__, then that approach becomes a reasonable implementation choice on > those platforms. > > Thus, assignment to __code__ might reasonably be considered a workaround > for the limitations of CPython in this respect, rather than a > CPython-dependent hack. :) > > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From pje at telecommunity.com Tue May 1 22:08:02 2007 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Tue, 01 May 2007 16:08:02 -0400 Subject: [Python-3000] PEP Parade In-Reply-To: References: <5.1.1.6.0.20070501145217.04d293e8@sparrow.telecommunity.com> <5.1.1.6.0.20070501145217.04d293e8@sparrow.telecommunity.com> Message-ID: <5.1.1.6.0.20070501155833.043589f8@sparrow.telecommunity.com> At 12:14 PM 5/1/2007 -0700, Guido van Rossum wrote: >Suppose you couldn't assign to __class__ of a function (that's too >messy to deal with in CPython) and you couldn't assign to its __code__ >either. What proposed functionality would you lose? The ability to overload any function, without having to track down all the places it's already been imported or otherwise saved, and change them to point to a new function or a non-function object. >How would you >ideally implement that functionality if you had the ability to modify >CPython in other ways? (I'm guessing you'd want to add some >functionality to function objects; what would that functionality have >to do?) Hm... well, in PyPy they have a "become" feature (I don't know if it's a mainline feature or not) that allows you to say, "replace object A with object B, wherever A is currently referenced". Then the replacement object (GF implementation) needn't even be a function. A narrower feature, however, more specific to functions, would just be *some* way to redirect or guard the function's actual execution. For example, if function objects had a writable __call__ attribute, that would be invoked in place of the normal behavior. (Assuming there was a way to save the old __call__ or make a copy of the function before it was modified.) I really just need a way to make calling the function do something different from what it normally would -- and ideally this should be in such a way that I could still invoke the function's original behavior. (So it can be used as the default method when nothing else matches, or the least-specific fallback method.) 
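Short of a writable __call__, the closest sketch of "redirect the function but keep its original behavior" in today's CPython is to clone the function before swapping its code -- an illustration only, not PEP 3124's actual mechanism, with invented names throughout:

```python
import types

def greet():
    return "hello"

# Clone the function first, so its original behavior stays callable...
original = types.FunctionType(greet.__code__, greet.__globals__,
                              greet.__name__, greet.__defaults__,
                              greet.__closure__)

def overloaded():
    # ...then the replacement can fall back to it, super()-style.
    return original() + ", overloaded"

# Swapping __code__ redirects every existing reference to greet:
greet.__code__ = overloaded.__code__
```

The clone serves as the least-specific fallback: every caller that already holds a reference to greet now gets the overloaded behavior, while the old behavior remains reachable through original.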
From jgarber at ionzoft.com Tue May 1 21:14:05 2007 From: jgarber at ionzoft.com (Jason Garber) Date: Tue, 1 May 2007 14:14:05 -0500 Subject: [Python-3000] DB API SQL injection issue Message-ID:

Hello,

In PEP 249 (Python Database API Specification v2.0), there is a paragraph about cursors that reads:

    .execute(operation[,parameters])
        Prepare and execute a database operation (query or command).
        Parameters may be provided as sequence or mapping and will be
        bound to variables in the operation. Variables are specified
        in a database-specific notation (see the module's paramstyle
        attribute for details). [5]

I propose that the second parameter to execute() be changed to a required parameter to prevent accidental SQL injection vulnerabilities.

Why? Consider the following two lines of code:

    cur.execute("SELECT * FROM t WHERE a=%s", (avalue))
    cur.execute("SELECT * FROM t WHERE a=%s" % (avalue))

It is easy for a developer to inadvertently place a "%" operator instead of a "," between the two parameters. In this case, python string formatting rules take over, and un-escaped values get inserted directly into the SQL - silently. After using standard string formatting characters like "%s" in the string, it is quite natural to place a % at the end. The requirement of the second parameter would eliminate this possibility. None would be passed (explicitly) if there are no replacements needed.

My rationale for this is based:

1. partly on observation of code with this problem.
2. partly on the rationale for PEP 3126 (Remove Implicit String Concatenation).

>From PEP 3126: Rationale for Removing Implicit String Concatenation

Implicit string concatenation can lead to confusing, or even silent, errors.

    def f(arg1, arg2=None):
        pass

    f("abc" "def")  # forgot the comma, no warning ...
                    # silently becomes f("abcdef", None)

or, using the scons build framework,

    sourceFiles = [
        'foo.c'
        'bar.c',
        #...many lines omitted...
        'q1000x.c']

It's a common mistake to leave off a comma, and then scons complains that it can't find 'foo.cbar.c'. This is pretty bewildering behavior even if you *are* a Python programmer, and not everyone here is. [1]

I know that this is not a functional problem, but perhaps a safeguard can be put in place to prevent disastrous SQL injection issues from arising needlessly. For your consideration.

Sincerely, Jason Garber Senior Systems Engineer IonZoft, Inc.

From g.brandl at gmx.net Tue May 1 22:11:54 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Tue, 01 May 2007 22:11:54 +0200 Subject: [Python-3000] PEP: Supporting Non-ASCII Identifiers In-Reply-To: <46376876.1010803@v.loewis.de> References: <46371BD2.7050303@v.loewis.de> <4637631A.6030702@v.loewis.de> <43aa6ff70705010905l3f87d57ck5a8f5597a6de9dab@mail.gmail.com> <46376876.1010803@v.loewis.de> Message-ID:

Martin v. Löwis schrieb: >> Reading from >> http://mail.python.org/pipermail/python-3000/2006-April/001474.html, >> the message that prompted this particular addition to PEP 3099, "I >> want good Unicode support for string literals and comments. Everything >> else in the language ought to be ASCII." >> >> Identifiers aren't string literals or comments. > > Sure, but please follow the follow-up communication also.

In any case, the entry in PEP 3099 should not be used as a reason to reject the PEP.

Georg

-- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out.
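Jason's one-character slip above can be made concrete with the sqlite3 module (whose paramstyle is qmark rather than the %s style in his example; the table and the hostile value are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (a TEXT)")
cur.execute("INSERT INTO t VALUES ('x')")

avalue = "x' OR '1'='1"   # hostile input

# Parameter binding: the value is bound, never spliced into the SQL.
cur.execute("SELECT * FROM t WHERE a=?", (avalue,))
safe_rows = cur.fetchall()      # no row has that literal value

# The slip: "%" instead of "," formats the raw value into the SQL.
cur.execute("SELECT * FROM t WHERE a='%s'" % avalue)
unsafe_rows = cur.fetchall()    # injection: OR '1'='1 matches every row
```

With binding the hostile string matches nothing; with string formatting the same input rewrites the query and returns the whole table.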
From pedronis at openendsystems.com Tue May 1 22:34:58 2007 From: pedronis at openendsystems.com (Samuele Pedroni) Date: Tue, 01 May 2007 22:34:58 +0200 Subject: [Python-3000] PEP Parade In-Reply-To: <5.1.1.6.0.20070501155833.043589f8@sparrow.telecommunity.com> References: <5.1.1.6.0.20070501145217.04d293e8@sparrow.telecommunity.com> <5.1.1.6.0.20070501145217.04d293e8@sparrow.telecommunity.com> <5.1.1.6.0.20070501155833.043589f8@sparrow.telecommunity.com> Message-ID: <4637A472.7090307@openendsystems.com>

For what it's worth, changing func_code is supported both by PyPy and Jython. What cannot be done in Jython is construct a code object out of a string of CPython bytecode, but it can be extracted from other functions.

Jython 2.2b1 on java1.5.0_07
Type "copyright", "credits" or "license" for more information.
>>> def f():
...     return 1
...
>>> f()
1
>>> def g():
...     return 2
...
>>> f.func_code = g.func_code
>>> f()
2
>>>

Phillip J. Eby wrote: > At 12:14 PM 5/1/2007 -0700, Guido van Rossum wrote: > >> Suppose you couldn't assign to __class__ of a function (that's too >> messy to deal with in CPython) and you couldn't assign to its __code__ >> either. What proposed functionality would you lose? >> > > The ability to overload any function, without having to track down all the > places it's already been imported or otherwise saved, and change them to > point to a new function or a non-function object. > > >> How would you >> ideally implement that functionality if you had the ability to modify >> CPython in other ways? (I'm guessing you'd want to add some >> functionality to function objects; what would that functionality have >> to do?) >> > > Hm... well, in PyPy they have a "become" feature (I don't know if it's a > mainline feature or not) that allows you to say, "replace object A with > object B, wherever A is currently referenced". Then the replacement object > (GF implementation) needn't even be a function.
> > A narrower feature, however, more specific to functions, would just be > *some* way to redirect or guard the function's actual execution. For > example, if function objects had a writable __call__ attribute, that would > be invoked in place of the normal behavior. (Assuming there was a way to > save the old __call__ or make a copy of the function before it was modified.) > > I really just need a way to make calling the function do something > different from what it normally would -- and ideally this should be in such > a way that I could still invoke the function's original behavior. (So it > can be used as the default method when nothing else matches, or the > least-specific fallback method.) > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/pedronis%40openendsystems.com > From tcdelaney at optusnet.com.au Tue May 1 22:51:35 2007 From: tcdelaney at optusnet.com.au (Tim Delaney) Date: Wed, 2 May 2007 06:51:35 +1000 Subject: [Python-3000] PEP Parade References: Message-ID: <01cb01c78c32$8254d6c0$0201a8c0@ryoko> From: "Jim Jewett" > On 5/1/07, Guido van Rossum wrote: > >> So the PEP submissions are in, and a few late ones will be submitted >> ASAP. Let me write up a capsule review of what we've got. Please let >> me know if I missed anything (e.g. a PEP that someone has committed to >> write but hasn't submitted yet). > > (1) The __this_*__ PEP was written and posted; I'll revise it slightly > tonight. > > One benefit would be a minimal-change version of super. I intend for the 'super' PEP to not rely on this in any way, but will add a note that your PEP (and other changes) may make the implementation simpler, and so the implementation should be revisited before 3.0. The semantics of 'super' OTOH should be fully clarified in our PEP. 
Tim Delaney From nicko at nicko.org Tue May 1 22:38:26 2007 From: nicko at nicko.org (Nicko van Someren) Date: Tue, 1 May 2007 21:38:26 +0100 Subject: [Python-3000] DB API SQL injection issue In-Reply-To: References: Message-ID: On 1 May 2007, at 20:14, Jason Garber wrote: > In PEP 249 (Python Database API Specification v2.0), there is a > paragraph about cursors that reads: > > .execute(operation[,parameters]) > Prepare and execute a database operation (query or > command). Parameters may be provided as sequence or > mapping and will be bound to variables in the operation. > Variables are specified in a database-specific notation > (see the module's paramstyle attribute for details). [5] > > I propose that the second parameter to execute() is changed to be a > required parameter to prevent accidental SQL injection > vulnerabilities. How do you propose to deal with the SQL commands for which there is no need to do any parameter replacement? This is not at all uncommon; would you expect to make people type cur.execute("SELECT DISTINCT zip_code FROM customer_addresses", None) or somesuch? Nicko From guido at python.org Tue May 1 23:04:35 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 1 May 2007 14:04:35 -0700 Subject: [Python-3000] PEP Parade In-Reply-To: <5.1.1.6.0.20070501155833.043589f8@sparrow.telecommunity.com> References: <5.1.1.6.0.20070501145217.04d293e8@sparrow.telecommunity.com> <5.1.1.6.0.20070501155833.043589f8@sparrow.telecommunity.com> Message-ID: On 5/1/07, Phillip J. Eby wrote: > At 12:14 PM 5/1/2007 -0700, Guido van Rossum wrote: > >Suppose you couldn't assign to __class__ of a function (that's too > >messy to deal with in CPython) and you couldn't assign to its __code__ > >either. What proposed functionality would you lose? > > The ability to overload any function, without having to track down all the > places it's already been imported or otherwise saved, and change them to > point to a new function or a non-function object. 
Frankly, I'm not sure this is worth all the proposed contortions. I'd be happy (especially as long as this is a pure-Python thing) to have to flag the base implementation explicitly with a decorator to make it overloadable. That seems KISS to me. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From g.brandl at gmx.net Tue May 1 23:44:59 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Tue, 01 May 2007 23:44:59 +0200 Subject: [Python-3000] PEP 3132: Extended Iterable Unpacking Message-ID: This is a bit late, but it was in my queue by April 30, I swear! ;) Comments are appreciated, especially some phrasing sounds very clumsy to me, but I couldn't find a better one. Georg PEP: 3132 Title: Extended Iterable Unpacking Version: $Revision$ Last-Modified: $Date$ Author: Georg Brandl Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 30-Apr-2007 Python-Version: 3.0 Post-History: Abstract ======== This PEP proposes a change to iterable unpacking syntax, allowing to specify a "catch-all" name which will be assigned a list of all items not assigned to a "regular" name. An example says more than a thousand words:: >>> a, *b, c = range(5) >>> a 0 >>> c 4 >>> b [1, 2, 3] Rationale ========= Many algorithms require splitting a sequence in a "first, rest" pair. With the new syntax, :: first, rest = seq[0], seq[1:] is replaced by the cleaner and probably more efficient:: first, *rest = seq For more complex unpacking patterns, the new syntax looks even cleaner, and the clumsy index handling is not necessary anymore. Specification ============= A tuple (or list) on the left side of a simple assignment (unpacking is not defined for augmented assignment) may contain at most one expression prepended with a single asterisk. For the rest of this section, the other expressions in the list are called "mandatory". Note that this also refers to tuples in implicit assignment context, such as in a ``for`` statement. 
This designates a subexpression that will be assigned a list of all items from the iterable being unpacked that are not assigned to any of the mandatory expressions, or an empty list if there are no such items. It is an error (as it is currently) if the iterable doesn't contain enough items to assign to all the mandatory expressions. Implementation ============== The proposed implementation strategy is: - add a new grammar rule, ``star_test``, which consists of ``'*' test`` and is used in test lists - add a new ASDL type ``Starred`` to represent a starred expression - catch all cases where starred expressions are not allowed in the AST and symtable generation stage - add a new opcode, ``UNPACK_EX``, which will only be used if a list/tuple to be assigned to contains a starred expression - change ``unpack_iterable()`` in ceval.c to handle the extended unpacking case Note that the starred expression element introduced here is universal and could be used for other purposes in non-assignment context, such as the ``yield *iterable`` proposal. The author has written a draft implementation, but there are some open issues which will be resolved in case this PEP is looked upon benevolently. Open Issues =========== - Should the catch-all expression be assigned a list or a tuple of items? References ========== None yet. Copyright ========= This document has been placed in the public domain. -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. 
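The semantics specified in the PEP above can be exercised directly; the sketch below reflects the behavior Python 3 eventually shipped, including the answer to the open issue (the catch-all is a list) and the restriction on a lone starred target that comes up in the replies:

```python
# Extended iterable unpacking per PEP 3132.
first, *rest = range(5)
assert first == 0
assert rest == [1, 2, 3, 4]            # the catch-all is a list, not a tuple

a, *b, c = range(5)                     # a starred name mid-list works too
assert (a, b, c) == (0, [1, 2, 3], 4)

a, *b, c = (1, 2)                       # just enough items: catch-all is empty
assert (a, b, c) == (1, [], 2)

try:
    a, *b, c, d = (1, 2)                # too few items for the mandatory names
except ValueError:
    pass

try:
    compile("*a = range(5)", "<example>", "exec")
except SyntaxError:
    pass                                # a bare starred target is disallowed
```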
From guido at python.org Wed May 2 00:00:33 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 1 May 2007 15:00:33 -0700 Subject: [Python-3000] PEP 3132: Extended Iterable Unpacking In-Reply-To: References: Message-ID: On 5/1/07, Georg Brandl wrote: > This is a bit late, but it was in my queue by April 30, I swear! ;) Accepted. > Comments are appreciated, especially some phrasing sounds very clumsy > to me, but I couldn't find a better one. > > Georg > > > PEP: 3132 > Title: Extended Iterable Unpacking > Version: $Revision$ > Last-Modified: $Date$ > Author: Georg Brandl > Status: Draft > Type: Standards Track > Content-Type: text/x-rst > Created: 30-Apr-2007 > Python-Version: 3.0 > Post-History: > > > Abstract > ======== > > This PEP proposes a change to iterable unpacking syntax, allowing to > specify a "catch-all" name which will be assigned a list of all items > not assigned to a "regular" name. > > An example says more than a thousand words:: > > >>> a, *b, c = range(5) > >>> a > 0 > >>> c > 4 > >>> b > [1, 2, 3] Has it been pointed out to you already that this particular example is hard to implement if the RHS is an iterator whose length is not known a priori? The implementation would have to be quite hairy -- it would have to assign everything to the list b until the iterator is exhausted, and then pop a value from the end of the list and assign it to c. it would be much easier if *b was only allowed at the end. (It would be even worse if b were assigned a tuple instead of a list, as per your open issues.) Also, what should this do? Perhaps the grammar could disallow it? *a = range(5) > Rationale > ========= > > Many algorithms require splitting a sequence in a "first, rest" pair. 
> With the new syntax, :: > > first, rest = seq[0], seq[1:] > > is replaced by the cleaner and probably more efficient:: > > first, *rest = seq > > For more complex unpacking patterns, the new syntax looks even > cleaner, and the clumsy index handling is not necessary anymore. > > > Specification > ============= > > A tuple (or list) on the left side of a simple assignment (unpacking > is not defined for augmented assignment) may contain at most one > expression prepended with a single asterisk. For the rest of this > section, the other expressions in the list are called "mandatory". > > Note that this also refers to tuples in implicit assignment context, > such as in a ``for`` statement. > > This designates a subexpression that will be assigned a list of all > items from the iterable being unpacked that are not assigned to any > of the mandatory expressions, or an empty list if there are no such > items. > > It is an error (as it is currently) if the iterable doesn't contain > enough items to assign to all the mandatory expressions. > > > Implementation > ============== > > The proposed implementation strategy is: > > - add a new grammar rule, ``star_test``, which consists of ``'*' > test`` and is used in test lists > - add a new ASDL type ``Starred`` to represent a starred expression > - catch all cases where starred expressions are not allowed in the AST > and symtable generation stage > - add a new opcode, ``UNPACK_EX``, which will only be used if a > list/tuple to be assigned to contains a starred expression > - change ``unpack_iterable()`` in ceval.c to handle the extended > unpacking case > > Note that the starred expression element introduced here is universal > and could be used for other purposes in non-assignment context, such > as the ``yield *iterable`` proposal. > > The author has written a draft implementation, but there are some open > issues which will be resolved in case this PEP is looked upon > benevolently. 
> > > Open Issues > =========== > > - Should the catch-all expression be assigned a list or a tuple of items? > > > References > ========== > > None yet. > > > Copyright > ========= > > This document has been placed in the public domain. > > > -- > Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. > Four shall be the number of spaces thou shalt indent, and the number of thy > indenting shall be four. Eight shalt thou not indent, nor either indent thou > two, excepting that thou then proceed to four. Tabs are right out. > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From nevillegrech at gmail.com Wed May 2 00:55:55 2007 From: nevillegrech at gmail.com (Neville Grech Neville Grech) Date: Wed, 2 May 2007 00:55:55 +0200 Subject: [Python-3000] PEP 3132: Extended Iterable Unpacking In-Reply-To: References: Message-ID: This reminds me a lot of haskell/prolog's head/tail list splitting. Looks like a good feature. a*=range(5) hmmn maybe in such a case, whenever there is the * operator, the resulting item is always a list/tuple, like the following: a=[[0,1,2,3,4]] ? I have another question, what would happen in the case a*,b=tuple(range(5)) a = (0,1,2,3) ? Should this keep the same type of container i.e. lists to lists and tuples to tuples or always convert to list? -Neville On 5/2/07, Guido van Rossum wrote: > > On 5/1/07, Georg Brandl wrote: > > This is a bit late, but it was in my queue by April 30, I swear! ;) > > Accepted. > > > Comments are appreciated, especially some phrasing sounds very clumsy > > to me, but I couldn't find a better one. 
> > > > Georg > > > > > > PEP: 3132 > > Title: Extended Iterable Unpacking > > Version: $Revision$ > > Last-Modified: $Date$ > > Author: Georg Brandl > > Status: Draft > > Type: Standards Track > > Content-Type: text/x-rst > > Created: 30-Apr-2007 > > Python-Version: 3.0 > > Post-History: > > > > > > Abstract > > ======== > > > > This PEP proposes a change to iterable unpacking syntax, allowing to > > specify a "catch-all" name which will be assigned a list of all items > > not assigned to a "regular" name. > > > > An example says more than a thousand words:: > > > > >>> a, *b, c = range(5) > > >>> a > > 0 > > >>> c > > 4 > > >>> b > > [1, 2, 3] > > Has it been pointed out to you already that this particular example is > hard to implement if the RHS is an iterator whose length is not known > a priori? The implementation would have to be quite hairy -- it would > have to assign everything to the list b until the iterator is > exhausted, and then pop a value from the end of the list and assign it > to c. it would be much easier if *b was only allowed at the end. (It > would be even worse if b were assigned a tuple instead of a list, as > per your open issues.) > > Also, what should this do? Perhaps the grammar could disallow it? > > *a = range(5) > > > Rationale > > ========= > > > > Many algorithms require splitting a sequence in a "first, rest" pair. > > With the new syntax, :: > > > > first, rest = seq[0], seq[1:] > > > > is replaced by the cleaner and probably more efficient:: > > > > first, *rest = seq > > > > For more complex unpacking patterns, the new syntax looks even > > cleaner, and the clumsy index handling is not necessary anymore. > > > > > > Specification > > ============= > > > > A tuple (or list) on the left side of a simple assignment (unpacking > > is not defined for augmented assignment) may contain at most one > > expression prepended with a single asterisk. 
For the rest of this > > section, the other expressions in the list are called "mandatory". > > > > Note that this also refers to tuples in implicit assignment context, > > such as in a ``for`` statement. > > > > This designates a subexpression that will be assigned a list of all > > items from the iterable being unpacked that are not assigned to any > > of the mandatory expressions, or an empty list if there are no such > > items. > > > > It is an error (as it is currently) if the iterable doesn't contain > > enough items to assign to all the mandatory expressions. > > > > > > Implementation > > ============== > > > > The proposed implementation strategy is: > > > > - add a new grammar rule, ``star_test``, which consists of ``'*' > > test`` and is used in test lists > > - add a new ASDL type ``Starred`` to represent a starred expression > > - catch all cases where starred expressions are not allowed in the AST > > and symtable generation stage > > - add a new opcode, ``UNPACK_EX``, which will only be used if a > > list/tuple to be assigned to contains a starred expression > > - change ``unpack_iterable()`` in ceval.c to handle the extended > > unpacking case > > > > Note that the starred expression element introduced here is universal > > and could be used for other purposes in non-assignment context, such > > as the ``yield *iterable`` proposal. > > > > The author has written a draft implementation, but there are some open > > issues which will be resolved in case this PEP is looked upon > > benevolently. > > > > > > Open Issues > > =========== > > > > - Should the catch-all expression be assigned a list or a tuple of > items? > > > > > > References > > ========== > > > > None yet. > > > > > > Copyright > > ========= > > > > This document has been placed in the public domain. > > > > > > -- > > Thus spake the Lord: Thou shalt indent with four spaces. No more, no > less. 
> > Four shall be the number of spaces thou shalt indent, and the number of > thy > > indenting shall be four. Eight shalt thou not indent, nor either indent > thou > > two, excepting that thou then proceed to four. Tabs are right out. > > > > _______________________________________________ > > Python-3000 mailing list > > Python-3000 at python.org > > http://mail.python.org/mailman/listinfo/python-3000 > > Unsubscribe: > http://mail.python.org/mailman/options/python-3000/guido%40python.org > > > > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: > http://mail.python.org/mailman/options/python-3000/nevillegrech%40gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-3000/attachments/20070502/7efb8918/attachment.html From pje at telecommunity.com Wed May 2 02:30:20 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue, 01 May 2007 20:30:20 -0400 Subject: [Python-3000] PEP Parade In-Reply-To: References: <5.1.1.6.0.20070501155833.043589f8@sparrow.telecommunity.com> <5.1.1.6.0.20070501145217.04d293e8@sparrow.telecommunity.com> <5.1.1.6.0.20070501155833.043589f8@sparrow.telecommunity.com> Message-ID: <5.1.1.6.0.20070501195640.02e20ca0@sparrow.telecommunity.com> At 02:04 PM 5/1/2007 -0700, Guido van Rossum wrote: >On 5/1/07, Phillip J. Eby wrote: > > At 12:14 PM 5/1/2007 -0700, Guido van Rossum wrote: > > >Suppose you couldn't assign to __class__ of a function (that's too > > >messy to deal with in CPython) and you couldn't assign to its __code__ > > >either. What proposed functionality would you lose? 
> > > > The ability to overload any function, without having to track down all the > > places it's already been imported or otherwise saved, and change them to > > point to a new function or a non-function object. > >Frankly, I'm not sure this is worth all the proposed contortions. I'd >be happy (especially as long as this is a pure-Python thing) to have >to flag the base implementation explicitly with a decorator to make it >overloadable. That seems KISS to me. I can see that perspective; in fact my earlier libraries didn't have this feature. But later I realized that making only specially-marked functions amenable to overloading was rather like having classes that had to be specially marked in order to enable others to subclass them. It would mean that either you would obsessively mark every class in order to make sure that you or others would be able to extend it later, or you would have to sit and think on whether a given class would be meaningful for other users to subclass, since they wouldn't be able to change the status of a class without changing your source code. Either way, after using classes for a bit, it would make you wonder why classes shouldn't just be subclassable by default, to save all the effort and/or worry. Of course, I also understand that you aren't likely to consider overloads to be so ubiquitous as subclassing; however, in languages where merely static overloading exists, it tends to be used just that ubiquitously. And even C++, which requires you to declare subclass-overrideable methods as "virtual", does not require you to specifically declare which names will have overloads! But all that having been said, it appears that all of the current major Python implementations (CPython, Jython, IronPython, and PyPy) do in fact support assigning to func_code as long as the assigned value comes from another valid function object. So at the moment it certainly seems practical (if perhaps not pure!) to make use of this. 
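Phillip's observation that code objects can be swapped between plain Python functions is easy to verify; a minimal CPython sketch, using the Python 3 spelling ``__code__`` for what 2.x calls ``func_code``:

```python
def f():
    return 1

def g():
    return 2

# Redirect f to run g's code, without touching any saved references to f.
f.__code__ = g.__code__
assert f() == 2

# Only a real code object is accepted; arbitrary values are rejected.
try:
    f.__code__ = "not a code object"
except TypeError:
    pass
```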
Unless, of course, your intention is to make functions immutable in 3.x. But that would seem to put a damper on e.g. your recent "xreload" module, which makes use of __code__ assignment for exactly the purpose of redefining a function in-place. From lists at cheimes.de Wed May 2 02:32:35 2007 From: lists at cheimes.de (Christian Heimes) Date: Wed, 02 May 2007 02:32:35 +0200 Subject: [Python-3000] PEP 3132: Extended Iterable Unpacking In-Reply-To: References: Message-ID: Neville Grech Neville Grech wrote: > This reminds me a lot of haskell/prolog's head/tail list splitting. Looks > like a good feature. Agreed! > a*=range(5) > hmmn maybe in such a case, whenever there is the * operator, the resulting > item is always a list/tuple, like the following: > a=[[0,1,2,3,4]] ? Did you mean *a = range(5)? The result is too surprising for me. I would suspect that *a = range(5) has the same output as a = range(5). >>> *b = (1, 2, 3) >>> b (1, 2, 3) >>> a, *b = (1, 2, 3) >>> a, b 1, (2, 3) >>> *b, c = (1, 2, 3) >>> b, c (1, 2), 3 >>> a, *b, c = (1, 2, 3) >>> a, b, c 1, (2,), 3 But what would happen when the right side is too small? >>> a, *b, c = (1, 2) >>> a, b, c 1, (), 2 or should it raise an unpack exception? This should definitely raise an exception >>> a, *b, c, d = (1, 2) Christian From tjreedy at udel.edu Wed May 2 02:48:54 2007 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 1 May 2007 20:48:54 -0400 Subject: [Python-3000] BList PEP References: Message-ID: "Daniel Stutzbach" wrote in message news:eae285400705010000l2af0e890ifc8c2e0de8219961 at mail.gmail.com... | Sort O(n log n) O(n log n) Tim Peters' list.sort is, I believe, better than nlogn for a number of practically important special cases. I believe he documented this in the code comments. Can you duplicate this with your structure? 
tjr From guido at python.org Wed May 2 02:52:16 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 1 May 2007 17:52:16 -0700 Subject: [Python-3000] PEP Parade In-Reply-To: <5.1.1.6.0.20070501195640.02e20ca0@sparrow.telecommunity.com> References: <5.1.1.6.0.20070501145217.04d293e8@sparrow.telecommunity.com> <5.1.1.6.0.20070501155833.043589f8@sparrow.telecommunity.com> <5.1.1.6.0.20070501195640.02e20ca0@sparrow.telecommunity.com> Message-ID: On 5/1/07, Phillip J. Eby wrote: > At 02:04 PM 5/1/2007 -0700, Guido van Rossum wrote: > >On 5/1/07, Phillip J. Eby wrote: > > > At 12:14 PM 5/1/2007 -0700, Guido van Rossum wrote: > > > >Suppose you couldn't assign to __class__ of a function (that's too > > > >messy to deal with in CPython) and you couldn't assign to its __code__ > > > >either. What proposed functionality would you lose? > > > > > > The ability to overload any function, without having to track down all the > > > places it's already been imported or otherwise saved, and change them to > > > point to a new function or a non-function object. > > > >Frankly, I'm not sure this is worth all the proposed contortions. I'd > >be happy (especially as long as this is a pure-Python thing) to have > >to flag the base implementation explicitly with a decorator to make it > >overloadable. That seems KISS to me. > > I can see that perspective; in fact my earlier libraries didn't have this > feature. But later I realized that making only specially-marked functions > amenable to overloading was rather like having classes that had to be > specially marked in order to enable others to subclass them. I admit I'm new to this game -- but most 3.0 users will be too. But I would be rather fearful of someone else stomping on a function I defined (and which I may be calling myself!) without my knowing it. ISTM (again admittedly from the fairly inexperienced perspective) that most functions and methods just *aren't* going to be useful as generic functions. 
The most likely initial use cases are situations where people sit down and specifically design an extensible framework with some seedling GFs and instructions for extending them. > It would mean that either you would obsessively mark every class in order > to make sure that you or others would be able to extend it later, or you > would have to sit and think on whether a given class would be meaningful > for other users to subclass, since they wouldn't be able to change the > status of a class without changing your source code. Either way, after > using classes for a bit, it would make you wonder why classes shouldn't > just be subclassable by default, to save all the effort and/or worry. Looking at it from a different way, you *do* have to mark APIs to be subclasses explicitly -- using the "class" syntax. You can leave that out, and then you end up with a bunch of functions in a module. Every time I write some code I make a conscious decision whether to do it as a class or as a method -- I don't create classes for everything by default. > Of course, I also understand that you aren't likely to consider overloads > to be so ubiquitous as subclassing; however, in languages where merely > static overloading exists, it tends to be used just that ubiquitously. And > even C++, which requires you to declare subclass-overrideable methods > as "virtual", does not require you to specifically declare which names > will have overloads! So this example can be interpreted both way -- sometimes you have to declare an anticipated use, sometimes you don't. It still hasn't convinced me that it's such a burden to have to declare GFs. I rather like the idea that it warns readers who are new to GFs and more familiar with how functions behave in Python 2. I can guarantee that very few people are aware of being able to assign to func_code (hey, *I* had to look it up! :-). 
> But all that having been said, it appears that all of the current major > Python implementations (CPython, Jython, IronPython, and PyPy) do in fact > support assigning to func_code as long as the assigned value comes from > another valid function object. So at the moment it certainly seems > practical (if perhaps not pure!) to make use of this. I see your PBP and I raise you an EIBTI. :-) > Unless, of course, your intention is to make functions immutable in > 3.x. But that would seem to put a damper on e.g. your recent "xreload" > module, which makes use of __code__ assignment for exactly the purpose of > redefining a function in-place. No plans in that direction. Just general discomfort with depending on the feature. Also noting that __code__ is an implementation detail -- it doesn't exist for other callables such as built-in functions. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From pje at telecommunity.com Wed May 2 03:40:41 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue, 01 May 2007 21:40:41 -0400 Subject: [Python-3000] PEP Parade In-Reply-To: References: <5.1.1.6.0.20070501195640.02e20ca0@sparrow.telecommunity.com> <5.1.1.6.0.20070501145217.04d293e8@sparrow.telecommunity.com> <5.1.1.6.0.20070501155833.043589f8@sparrow.telecommunity.com> <5.1.1.6.0.20070501195640.02e20ca0@sparrow.telecommunity.com> Message-ID: <5.1.1.6.0.20070501212508.02f11d30@sparrow.telecommunity.com> At 05:52 PM 5/1/2007 -0700, Guido van Rossum wrote: >I rather like the idea that it warns readers who are new to GFs and more >familiar with how functions behave in Python 2. Until somebody adds an overload, it *does* behave the same; that was sort of the point. :) >Also noting that __code__ is an implementation detail -- >it doesn't exist for other callables such as built-in functions. Fair enough, although the PEP doesn't propose to allow extending built-in functions, only Python ones. 
>I would be rather fearful of someone else stomping on a function I defined >(and which I may be calling myself!) without my knowing it. All they can do is add special cases or wrappers to it; which is not quite the same thing. It's actually *safer* than monkeypatching, as you don't have to go out of your way to save the original version of the function, your method is only called when its condition applies, etc. For simple callbacks using before/after methods they needn't even remember to *call* the old function. However, since your objections are more in the nature of general unease than arguments against, it probably doesn't make sense for me to continue quibbling with them point by point, and instead focus on how to move forward. If you would like to require that the stdlib module use some sort of decorator (@overloadable, perhaps?) to explicitly mark a function as generic, that's probably fine, because the way it will work internally is that all the overloads still have to pass through a generic function... which I can then easily add an overload to in a separate library, which will then allow direct modification of existing functions, without needing a decorator. That way, we're both happy, and maybe by 3.1 you'll be comfortable with dropping the extra decorator. :) One possible issue, however, with this approach, is pydoc. In all three of my existing generic function libraries, I use function objects rather than custom objects, for the simple reason that pydoc won't document the signatures of anything else. On the other hand, I suppose there's no reason that the "make this overloadable" decorator couldn't just create another function object via compile or exec, whose implementation is fixed at creation time to do whatever lookup is required. 
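The ``@overloadable`` decorator floated here is left unspecified in the thread; the sketch below is one hypothetical reading of the idea (a predicate registry consulted before the default body, with ``functools.wraps`` preserving the name and docstring that pydoc displays), not the PEP 3124 implementation:

```python
import functools

def overloadable(func):
    """Mark func as extensible; registered overloads take precedence."""
    registry = []  # (predicate, implementation) pairs, most recent first

    @functools.wraps(func)  # keep the original name/docstring for pydoc
    def wrapper(*args, **kwargs):
        for predicate, impl in registry:
            if predicate(*args, **kwargs):
                return impl(*args, **kwargs)
        return func(*args, **kwargs)  # least-specific fallback

    def overload(predicate):
        def register(impl):
            registry.insert(0, (predicate, impl))
            return impl
        return register

    wrapper.overload = overload
    return wrapper

@overloadable
def describe(obj):
    return "some object"

@describe.overload(lambda obj: isinstance(obj, int))
def _(obj):
    return "an integer"
```

With this, ``describe(3)`` dispatches to the overload while ``describe("spam")`` still falls through to the original body, so the function behaves unchanged until someone adds an overload, which is the property argued for above.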
From cvrebert at gmail.com Wed May 2 03:51:24 2007 From: cvrebert at gmail.com (Chris Rebert) Date: Tue, 1 May 2007 18:51:24 -0700 Subject: [Python-3000] PEP 3132: Extended Iterable Unpacking In-Reply-To: References: Message-ID: <47c890dc0705011851o273d1f16x5c4d9363c5e62822@mail.gmail.com> In the interest of furthering the discussion, here are two past threads on similar suggestions: [Python-Dev] Half-baked proposal: * (and **?) in assignments http://mail.python.org/pipermail/python-dev/2002-November/030349.html [Python-ideas] Javascript Destructuring Assignment http://mail.python.org/pipermail/python-ideas/2007-March/000284.html - Chris Rebert On 5/1/07, Christian Heimes wrote: > Neville Grech Neville Grech wrote: > > This reminds me a lot of haskell/prolog's head/tail list splitting. Looks > > like a good feature. > > Agreed! > > a*=range(5) > > hmmn maybe in such a case, whenever there is the * operator, the resulting > > item is always a list/tuple, like the following: > > a=[[0,1,2,3,4]] ? > > Did you mean *a = range(5)? > The result is too surprising for me. I would suspect that *a = range(5) > has the same output as a = range(5). > > >>> *b = (1, 2, 3) > >>> b > (1, 2, 3) > > >>> a, *b = (1, 2, 3) > >>> a, b > 1, (2, 3) > > >>> *b, c = (1, 2, 3) > >>> b, c > (1, 2), 3 > > > >>> a, *b, c = (1, 2, 3) > >>> a, b, c > 1, (2,), 3 > > But what would happen when the right side is too small? > >>> a, *b, c = (1, 2) > >>> a, b, c > 1, (), 2 > > or should it raise an unpack exception? 
> > This should definitely raise an exception > >>> a, *b, c, d = (1, 2) > > Christian > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/cvrebert%40gmail.com > From brett at python.org Wed May 2 04:02:11 2007 From: brett at python.org (Brett Cannon) Date: Tue, 1 May 2007 19:02:11 -0700 Subject: [Python-3000] PEP 3132: Extended Iterable Unpacking In-Reply-To: References: Message-ID: On 5/1/07, Guido van Rossum wrote: > > On 5/1/07, Georg Brandl wrote: > > This is a bit late, but it was in my queue by April 30, I swear! ;) > > Accepted. > > > Comments are appreciated, especially some phrasing sounds very clumsy > > to me, but I couldn't find a better one. > > > > Georg > > > > > > PEP: 3132 > > Title: Extended Iterable Unpacking > > Version: $Revision$ > > Last-Modified: $Date$ > > Author: Georg Brandl > > Status: Draft > > Type: Standards Track > > Content-Type: text/x-rst > > Created: 30-Apr-2007 > > Python-Version: 3.0 > > Post-History: > > > > > > Abstract > > ======== > > > > This PEP proposes a change to iterable unpacking syntax, allowing to > > specify a "catch-all" name which will be assigned a list of all items > > not assigned to a "regular" name. > > > > An example says more than a thousand words:: > > > > >>> a, *b, c = range(5) > > >>> a > > 0 > > >>> c > > 4 > > >>> b > > [1, 2, 3] > > Has it been pointed out to you already that this particular example is > hard to implement if the RHS is an iterator whose length is not known > a priori? The implementation would have to be quite hairy -- it would > have to assign everything to the list b until the iterator is > exhausted, and then pop a value from the end of the list and assign it > to c. it would be much easier if *b was only allowed at the end. 
(It > would be even worse if b were assigned a tuple instead of a list, as > per your open issues.) If a clean implementation solution cannot be found then I say go with the last-item-only restriction. You still get the nice functional language feature of car/cdr (or x:xs if you prefer ML or Haskell) without the implementation headache. I mean how often do you want the head and tail with everything in between left together? If I needed that kind of sequence control I would feed the iterator to a list comp and get to the items that way. Also, what should this do? Perhaps the grammar could disallow it? > > *a = range(5) I say disallow it. That is ambiguous as to what your intentions are even if you know what '*' does for multiple assignment. -Brett -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-3000/attachments/20070501/6af84286/attachment.htm From talin at acm.org Wed May 2 04:21:28 2007 From: talin at acm.org (Talin) Date: Tue, 01 May 2007 19:21:28 -0700 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: <5.1.1.6.0.20070501123026.0524eac8@sparrow.telecommunity.com> References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <5.1.1.6.0.20070501123026.0524eac8@sparrow.telecommunity.com> Message-ID: <4637F5A8.5000009@acm.org> Phillip J. Eby wrote: > At 09:13 AM 5/1/2007 -0700, Talin wrote: >> Phillip J. Eby wrote: >>> Proceeding to the "Next" Method >>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >>> If the first parameter of an overloaded function is named >>> ``__proceed__``, it will be passed a callable representing the next >>> most-specific method. For example, this code:: >>> def foo(bar:object, baz:object): >>> print "got objects!" >>> @overload >>> def foo(__proceed__, bar:int, baz:int): >>> print "got integers!" 
>>> return __proceed__(bar, baz) >> >> I don't care for the idea of testing against a specially named >> argument. Why couldn't you just have a different decorator, such as >> "overload_chained" which triggers this behavior? > > The PEP lists *five* built-in decorators, all of which support this > behavior:: > > @overload, @when, @before, @after, @around > > And in addition, it demonstrates how to create *new* method combination > decorators, that *also* support this behavior (e.g. '@discount'). > > All in all, there are an unbounded number of possible decorators that > would require chained and non-chained variations. Well, I suppose you could make "chained" a modifier of the decorator, so for example @operator.chained, @discount.chained, and so on. In other words, the decorator can be called directly, or the attribute 'chained' also produces a callable that causes the modified behavior. Moreover, this would support an arbitrary number of modifiers on the decorator, such as @overload.chained.strict(True).whatever. -- Talin From greg.ewing at canterbury.ac.nz Wed May 2 04:23:02 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 02 May 2007 14:23:02 +1200 Subject: [Python-3000] PEP: Supporting Non-ASCII Identifiers In-Reply-To: <4637631A.6030702@v.loewis.de> References: <46371BD2.7050303@v.loewis.de> <4637631A.6030702@v.loewis.de> Message-ID: <4637F606.4060707@canterbury.ac.nz> Martin v. Löwis wrote: > http://mail.python.org/pipermail/python-3000/2006-April/001526.html > > where Guido states that he trusts me that it can be made to work, > and that "eventually" it needs to be supported. He says "the tools aren't ready yet", which I take to mean that Python won't need to support it until all widely-used editors, email and news software, etc, etc, reliably support displaying and editing of all unicode characters. We're clearly a long way from that situation. 
-- Greg From guido at python.org Wed May 2 04:37:55 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 1 May 2007 19:37:55 -0700 Subject: [Python-3000] PEP Parade In-Reply-To: <5.1.1.6.0.20070501212508.02f11d30@sparrow.telecommunity.com> References: <5.1.1.6.0.20070501145217.04d293e8@sparrow.telecommunity.com> <5.1.1.6.0.20070501155833.043589f8@sparrow.telecommunity.com> <5.1.1.6.0.20070501195640.02e20ca0@sparrow.telecommunity.com> <5.1.1.6.0.20070501212508.02f11d30@sparrow.telecommunity.com> Message-ID: On 5/1/07, Phillip J. Eby wrote: > However, since your objections are more in the nature of general unease > than arguments against, it probably doesn't make sense for me to continue > quibbling with them point by point, and instead focus on how to move forward. Thanks for indulging my insecurities. > If you would like to require that the stdlib module use some sort of > decorator (@overloadable, perhaps?) to explicitly mark a function as > generic, that's probably fine, because the way it will work internally is > that all the overloads still have to pass through a generic > function... which I can then easily add an overload to in a separate > library, which will then allow direct modification of existing functions, > without needing a decorator. That way, we're both happy, and maybe by 3.1 > you'll be comfortable with dropping the extra decorator. :) I'll take my cue from the users. > One possible issue, however, with this approach, is pydoc. In all three of > my existing generic function libraries, I use function objects rather than > custom objects, for the simple reason that pydoc won't document the > signatures of anything else. On the other hand, I suppose there's no > reason that the "make this overloadable" decorator couldn't just create > another function object via compile or exec, whose implementation is fixed > at creation time to do whatever lookup is required. That's one solution. 
Another solution would be to use GFs in Pydoc to make it overloadable; I'd say pydoc could use a bit of an overhaul at this point. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed May 2 04:39:45 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 1 May 2007 19:39:45 -0700 Subject: [Python-3000] PEP 3132: Extended Iterable Unpacking In-Reply-To: References: Message-ID: On 5/1/07, Brett Cannon wrote: > > Also, what should this do? Perhaps the grammar could disallow it? > > > > *a = range(5) > > I say disallow it. That is ambiguous as to what your intentions are even if > you know what '*' does for multiple assignment. My real point was that the PEP lacks precision here. It should list the exact proposed changes to Grammar/Grammar. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From greg.ewing at canterbury.ac.nz Wed May 2 04:38:57 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 02 May 2007 14:38:57 +1200 Subject: [Python-3000] PEP: Supporting Non-ASCII Identifiers In-Reply-To: <46376752.2070007@v.loewis.de> References: <46371BD2.7050303@v.loewis.de> <43aa6ff70705010844u4a6333f5hf1d4d3a807361ffe@mail.gmail.com> <46376752.2070007@v.loewis.de> Message-ID: <4637F9C1.2060303@canterbury.ac.nz> Martin v. Löwis wrote: > I still don't understand why the "no operation" statement is called > "pass" - it's not the opposite of "fail", and seems to have no > relationship to "can you pass me the butter, please?". It's "pass" as in "pass through", i.e. move on to the next statement without stopping to do anything. Also there's an idiom that you hear in a setting such as a quiz show, where a contestant will say "Pass", meaning "I don't know the answer to that, give me the next question." 
-- Greg From jason.orendorff at gmail.com Wed May 2 04:43:22 2007 From: jason.orendorff at gmail.com (Jason Orendorff) Date: Tue, 1 May 2007 22:43:22 -0400 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: <5.1.1.6.0.20070501143840.02ca7760@sparrow.telecommunity.com> References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <46376729.9000008@acm.org> <5.1.1.6.0.20070501123026.0524eac8@sparrow.telecommunity.com> <5.1.1.6.0.20070501143840.02ca7760@sparrow.telecommunity.com> Message-ID: On 5/1/07, Phillip J. Eby wrote: > At 02:22 PM 5/1/2007 -0400, Jason Orendorff wrote: > >I think I would prefer to *always* pass the next method > >to @around methods, which always need it, and *never* > >pass it to any of the others. What use case am I missing? > > Calling the next method in a generic function is equivalent to calling > super() in a normal method. Anytime you want to add more specific behavior > for a type, while reusing the more general behavior, you're going to need > it. [...] Oh, I see. I thought @before, @after, and @around should cover all the use cases. But timing is not the only difference between them. It also affects how you affect other people's advice and how later, more specific advice will affect you. In short, you have to ask yourself: am I hooking something (before/after), implementing it (when), or just generally looking for trouble (around)? I haven't used CLOS or Aspect-J, but I have played Magic: the Gathering, which judging by these examples is largely the same thing. Incidentally, Magic gets by with just @around (which they spell "instead of") and @after (which they spell "when"). Come to think of it, Inform 7 is the other system I know of that has an advice system like this. Now I'm suspicious. Are you trying to turn Python into some kind of game? I forgot to say earlier: Thanks very much for writing this PEP. This should be interesting. 
> The other possibility would be to clone the functions using copied > func_globals (__globals__?) so that 'next_method' in those namespaces would > point to the right next method. But then, if the function *writes* any > globals, it'll be updating the wrong namespace. Do you have any other ideas? Here's what I've got left. Take your pick: @when(bisect.bisect, withNextMethod=True) def bisect_bee(nextMethod, seq : Sequence, eric : Bee, *options): ... ..in which case @override would be left out in the cold, but I'm okay with that. Or else: @override @withNextMethod def bisect(nextMethod, ...): ... Your idea of using the argument annotation was fine, too. Any of these three is better than detecting the argument name. -j From pje at telecommunity.com Wed May 2 04:47:07 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue, 01 May 2007 22:47:07 -0400 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: <4637F5A8.5000009@acm.org> References: <5.1.1.6.0.20070501123026.0524eac8@sparrow.telecommunity.com> <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <5.1.1.6.0.20070501123026.0524eac8@sparrow.telecommunity.com> Message-ID: <5.1.1.6.0.20070501223519.029ca628@sparrow.telecommunity.com> At 07:21 PM 5/1/2007 -0700, Talin wrote: >Well, I suppose you could make "chained" a modifier of the decorator, so >for example @operator.chained, @discount.chained, and so on. In other >words, the decorator can be called directly, or the attribute 'chained' >also produces a callable that causes the modified behavior. Well, that certainly seems like enough of an option to list as an alternative in the PEP, but I personally think it increases implementation complexity, compared to the way things work now. 
One reason for that is that right now the decorators don't actually *do* much of anything, as per this excerpt from peak.rules.core:: def decorate(f, pred=()): rules = rules_for(f) def callback(frame, name, func, old_locals): rule = parse_rule( rules, func, pred, maker, frame.f_locals, frame.f_globals ) rules.add(rule) if old_locals.get(name) in (f, rules): return f # prevent overwriting if name is the same return func return decorate_assignment(callback) The above is the function used for *all* of the decorators proposed in the PEP, except for @overload. The only bit that differs between them is the ``maker``, which is a classmethod of the corresponding Method class (e.g. Method, Before, Around, etc.). The maker is used to create action instances, which are then combined into chains using combine_actions(). The signature stuff for __proceed__ (which is actually called 'next_method' in peak.rules) is done inside the ``maker`` and the action instance itself, not in the decorator. So, it would require a fair amount of refactoring and additional complexity to do it the way you suggest. It's intriguing, but I'm not sure it's a big win compared to e.g. "def foo(next:next_method, ...)". I *could* see allowing the next_method to be in a different position, since the partial() can still be precomputed, and bound methods could still be used in the case where it was in the first position. > Moreover, this would support an arbitrary number of modifiers on the > decorator, such as @overload.chained.strict(True).whatever. Actually, I don't think Python's grammar allows you to do that. IIRC, decorators have to be a dotted name followed by an optional (arglist). So the '.whatever' part wouldn't be legal. 
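[For readers trying to picture the next-method chaining discussed above, here is a toy sketch: dispatch walks a list of applicable methods from most to least specific, handing each one a callable for its successor. All names here (``Generic``, ``overload``, ``next_method``) are invented for illustration — this is not the actual peak.rules or PEP 3124 implementation.]

```python
# Toy single-argument overloading with a next-method chain.
# Not the real PEP 3124 machinery; names are illustrative only.

class Generic:
    def __init__(self, default):
        self.registry = []      # list of (type, function) pairs
        self.default = default

    def register(self, typ, func):
        self.registry.append((typ, func))
        # keep the most specific type (deepest subclass) first
        self.registry.sort(key=lambda tf: len(tf[0].__mro__), reverse=True)

    def __call__(self, arg, *rest):
        # Collect applicable methods, most specific first, ending with
        # the default (which never calls its next method).
        methods = [f for t, f in self.registry if isinstance(arg, t)]
        methods.append(lambda next_method, a, *r: self.default(a, *r))

        def make_next(i):
            def next_method(a, *r):
                return methods[i](make_next(i + 1), a, *r)
            return next_method

        return methods[0](make_next(1), arg, *rest)

def overload(generic, typ):
    def decorate(func):
        generic.register(typ, func)
        return func
    return decorate

describe = Generic(lambda obj: "some object")

@overload(describe, int)
def describe_int(next_method, obj):
    return "int: " + next_method(obj)

@overload(describe, bool)   # bool subclasses int, so it is more specific
def describe_bool(next_method, obj):
    return "bool: " + next_method(obj)

print(describe(True))   # bool: int: some object
print(describe(3))      # int: some object
print(describe("hi"))   # some object
```

The chain plays the same role as ``super()`` in ordinary classes: a more specific method can wrap, replace, or augment the next most-specific behavior.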
From talin at acm.org Wed May 2 04:50:11 2007 From: talin at acm.org (Talin) Date: Tue, 01 May 2007 19:50:11 -0700 Subject: [Python-3000] PEP Parade In-Reply-To: References: Message-ID: <4637FC63.7070301@acm.org> Guido van Rossum wrote: > S 3125 Remove Backslash Continuation Jewett > > Sounds reasonable. I think we should still support \ inside string > literals though; the PEP isn't clear on this. I hope this falls within > the scope of the refactoring tool (sandbox/2to3). I'm a strong -1 on this one BTW. I really dislike the idea of having to add spurious parentheses or other grouping operators in order to force line continuation. It requires the Python programmer to replace an ugly lexical-level hack into an ugly and cluttered parsing-level hack. Readability suffers as a consequence. In general, parens or grouping operators should only be used when they *mean* something, not merely as a hint to the parser as to how to parse something. -- Talin From greg.ewing at canterbury.ac.nz Wed May 2 04:50:12 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 02 May 2007 14:50:12 +1200 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: <5.1.1.6.0.20070501123026.0524eac8@sparrow.telecommunity.com> References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <5.1.1.6.0.20070501123026.0524eac8@sparrow.telecommunity.com> Message-ID: <4637FC64.4050701@canterbury.ac.nz> Phillip J. Eby wrote: > The PEP lists *five* built-in decorators, all of which support this behavior:: > > @overload, @when, @before, @after, @around This seems massively over-designed. All you need is the ability to call the next method, and you can get all of these behaviours. If you call it first, then you get after behaviour; if you call it last, you get before behaviour; etc. 
-- Greg From talin at acm.org Wed May 2 05:10:53 2007 From: talin at acm.org (Talin) Date: Tue, 01 May 2007 20:10:53 -0700 Subject: [Python-3000] Some canonical use-cases for ABCs/Interfaces/Generics Message-ID: <4638013D.8090902@acm.org> One of my concerns in the ABC/interface discussion so far is that a lot of the use cases presented are "toy" examples. This makes perfect sense considering that you don't want to have to spend several pages explaining the use case. But at the same time, it means that we might be solving problems that aren't real, while ignoring problems that are. What I'd like to do is collect a set of "real-world" use cases and document them. The idea would be that we could refer to these use cases during the discussion, using a common terminology and shorthand examples. I'll present one very broad use case here, and I'd be interested if people have ideas for other use cases. The goal is to define a small number of broadly-defined cases that provide a broad coverage of the problem space. ==== The use case I will describe is what I will call "Object Graph Transformation". The general pattern is that you have a collection of objects organized in a graph which you wish to transform. The objects in the graph may be standard Python built-in types (lists, tuples, dicts, numbers), or they may be specialized application-specific types. The Python "pickle" operation is an example of this type of transformation: Converting a graph of objects into a flat stream format that can later be reconstituted back into a graph. Other kinds of transformations would include: Serialization: pickling, marshaling, conversion to XML or JSON, ORMs and other persistence frameworks, migration of objects between runtime environments or languages, etc. Presentation: Conversion of a graph of objects to a visible form, such as a web page. Interactive Editing: The graph is converted to a user editable form, a la JavaBeans. 
An example is a user-interface editor application which allows widgets to be edited via a property sheet. The object graph is displayed in a live "preview" window, while a "tree view" of object properties is shown in a side panel. The transformation occurs when the objects in the graph are transformed into a hierarchy of key/value properties that are displayed in the tree view window. These various cases may seem different but they all have a similar structure in terms of the roles of the participants involved. For a given transformation, there are 4 roles involved: 1) The author of the objects to be transformed. 2) The author of the generic transform function, such as "serialize". 3) The author of the special transform function for each specific class. 4) The person invoking the transform operation within the application. We can give names to these various bits of code if we wish, such as the "Operand", the "General Operator", the "Special Operator", and the "Invocation". But for now, I'll simply refer to them by number. Using the terminology of generic functions, (1) is the author of the argument that is passed to the generic function, (2) is the author of the original "generic" function, (3) is the author of the overloads of the generic function, and (4) is the person calling the generic function. Each of these authors may have participated at different times and may be unaware of each other's work. The only dependencies are that (3) must know about (1) and (2), and (4) must know about (2). Note that if any of these participants *do* have prior knowledge of the others, then the need for a generic adaption framework is considerably weakened. So for example, if (2) already knows all of the types of objects that are going to be operated on, then it can simply hard-code that knowledge into its own implementation. Similarly, if (1) knows that it is going to be operated on in this way, then it can simply add a method to do that operation. 
It's only when the system needs to be N-way extensible, where N is the number of participants, that a more general dispatch solution is required. A real-world example of this use case is the TurboGears/TurboJSON conversion of Python objects into JSON format, which currently uses RuleDispatch to do the heavy lifting. @jsonify.when(Person) def jsonify_person(obj): # Code to convert a Person object to a dict # of properties which can be serialized as JSON In this example, the "Person" need never know anything about JSON formatting, and conversely the JSON serialization framework need know nothing about Person objects. Instead, this little adaptor function is the glue that ties them together. This also means that built-in types can be serialized under the new system without having to modify them. Otherwise, you would either have to build into the serializer special-case knowledge of these types, or you would have to restrict your object graph to using only special application-specific container and value types. Thus, a list of Person objects can be a plain list, but can still be serialized using the same persistence framework as is used for the Person object. === OK that is the description of the use case. I'd be interested to know what use cases people have that fall *outside* of the above. -- Talin From daniel at stutzbachenterprises.com Wed May 2 05:17:04 2007 From: daniel at stutzbachenterprises.com (Daniel Stutzbach) Date: Tue, 1 May 2007 22:17:04 -0500 Subject: [Python-3000] BList PEP In-Reply-To: References: Message-ID: On 5/1/07, Terry Reedy wrote: > "Daniel Stutzbach" wrote in message > news:eae285400705010000l2af0e890ifc8c2e0de8219961 at mail.gmail.com... > | Sort O(n log n) O(n log n) > > Tim Peters' list.sort is, I believe, better than nlogn for a number of > practically important special cases. I believe he documented this in the > code comments. Can you duplicate this with your structure? The table in the PEP lists worst-case execution times. 
I'll make that explicit in the next revision. You are correct that TimSort is O(n) for nearly-sorted lists. It's possible to implement TimSort over the BList, but I have not yet done so. -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises LLC From talin at acm.org Wed May 2 05:38:31 2007 From: talin at acm.org (Talin) Date: Tue, 01 May 2007 20:38:31 -0700 Subject: [Python-3000] Another way to understand static metaprogramming in functional languages Message-ID: <463807B7.2000308@acm.org> Guido was complaining to me today, something along the lines that every time someone presents him with an example of Haskell code, his eyes start glazing over. I have pretty much the same problem, even though I've actually taken the time to read a little bit about Haskell. If you are someone who is interested in how the magic of static metaprogramming in functional languages can work, but find Haskell code hard to read, then I strongly recommend Graydon Hoare's "One Day Compilers" presentation: http://www.venge.net/graydon/talks/mkc/html/mgp00001.html This is a slideshow that shows how to build a simple compiler in one day in OCAML. The slideshow is easy to read, and covers a brief introduction to OCAML (which is similar to Haskell in spirit) as well as details of how to construct the compiler. There's lots of stuff in there on how to use manipulation of types to get things done fast. -- Talin From tjreedy at udel.edu Wed May 2 05:42:19 2007 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 1 May 2007 23:42:19 -0400 Subject: [Python-3000] Derivation of "pass" in Python (was Re: PEP: Supporting Non-ASCII Identifiers) References: <43aa6ff70705010844u4a6333f5hf1d4d3a807361ffe@mail.gmail.com><46371BD2.7050303@v.loewis.de><43aa6ff70705010844u4a6333f5hf1d4d3a807361ffe@mail.gmail.com><5.1.1.6.0.20070501123738.05263610@sparrow.telecommunity.com> <463770D9.3050405@v.loewis.de> Message-ID: ""Martin v. Löwis"" wrote in message news:463770D9.3050405 at v.loewis.de... 
|> Thus, when someone is offered something, they may say, "I'll pass", | > meaning they are declining to act. Ergo, to "pass" in Python is to | > decline to give up the opportunity to act. The person being quoted meant "to decline, to give up...". The missing comma inverts the meaning. | | Ah, ok. It would then be similar to "Passe!" in German, which is | used in card games, if you don't play a card, but instead hand | over to the next player. Even though this is clearly the same | ancestry, it never occurred to me that the same meaning is also | present in English (also, "passen" is somewhat oldish now, so | I don't use it actively myself). In the card game bridge, for instance, 'pass' is the official word for 'no bid'. Anything else meaning the same thing is illegal. So 'pass', among other things, can either mean 'not fail' or 'fail to act' ;-) tjr From pje at telecommunity.com Wed May 2 05:54:20 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue, 01 May 2007 23:54:20 -0400 Subject: [Python-3000] PEP Parade In-Reply-To: References: <5.1.1.6.0.20070501212508.02f11d30@sparrow.telecommunity.com> <5.1.1.6.0.20070501145217.04d293e8@sparrow.telecommunity.com> <5.1.1.6.0.20070501155833.043589f8@sparrow.telecommunity.com> <5.1.1.6.0.20070501195640.02e20ca0@sparrow.telecommunity.com> <5.1.1.6.0.20070501212508.02f11d30@sparrow.telecommunity.com> Message-ID: <5.1.1.6.0.20070501234608.02a589f8@sparrow.telecommunity.com> At 07:37 PM 5/1/2007 -0700, Guido van Rossum wrote: >On 5/1/07, Phillip J. Eby wrote: > > However, since your objections are more in the nature of general unease > > than arguments against, it probably doesn't make sense for me to continue > > quibbling with them point by point, and instead focus on how to move > forward. > >Thanks for indulging my insecurities. 
I hope that didn't come across as patronizing; I didn't mean to say that your arguments weren't valid, just that it seemed unlikely your position would be swayed solely by argument, and that thus it would be better not to keep arguing with you about them. >That's one solution. Another solution would be to use GFs in Pydoc to >make it overloadable; I'd say pydoc could use a bit of an overhault at >this point. True enough; until you mentioned that, I'd forgotten that a week or two ago I got an email from somebody working on the pydoc overhaul who mentioned that he had had to work up an ad-hoc generic function implementation for just that reason. :) From pje at telecommunity.com Wed May 2 05:56:39 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue, 01 May 2007 23:56:39 -0400 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: References: <5.1.1.6.0.20070501143840.02ca7760@sparrow.telecommunity.com> <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <46376729.9000008@acm.org> <5.1.1.6.0.20070501123026.0524eac8@sparrow.telecommunity.com> <5.1.1.6.0.20070501143840.02ca7760@sparrow.telecommunity.com> Message-ID: <5.1.1.6.0.20070501235431.043df428@sparrow.telecommunity.com> At 10:43 PM 5/1/2007 -0400, Jason Orendorff wrote: >In short, you have to ask >yourself: am I hooking something (before/after), implementing it >(when), or just generally looking for trouble (around)? Nice summary! I'll add something like this to the PEP, although I suppose I'll have to make the language a bit more formal. :) From pje at telecommunity.com Wed May 2 06:04:35 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Wed, 02 May 2007 00:04:35 -0400 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. 
In-Reply-To: <4637FC64.4050701@canterbury.ac.nz> References: <5.1.1.6.0.20070501123026.0524eac8@sparrow.telecommunity.com> <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <5.1.1.6.0.20070501123026.0524eac8@sparrow.telecommunity.com> Message-ID: <5.1.1.6.0.20070501235655.02a57fd8@sparrow.telecommunity.com> At 02:50 PM 5/2/2007 +1200, Greg Ewing wrote: >Phillip J. Eby wrote: > > The PEP lists *five* built-in decorators, all of which support this > behavior:: > > > > @overload, @when, @before, @after, @around > >This seems massively over-designed. All you need is the >ability to call the next method, and you can get all of >these behaviours. If you call it first, then you get >after behaviour; if you call it last, you get before >behaviour; etc. Yep, that was my theory too, until I actually used generic functions. As it happens, it's: 1) a lot more pleasant not to write the extra boilerplate all the time, and 2) having @before or @after tells you right away the intent of the method, without having to carefully inspect the body to see when and whether it is calling the next method, and whether it is modifying the arguments or return values in some way. In other words, the restricted behavior of @before and @after methods makes them easier to write *and* easier to read. By the way, if you look at the PEP, you'll find motivating examples for each of the decorators, as well as an explanation and examples of when and how you might want to create even *more* such decorators. IIRC, CLOS has about *8 more* kinds of method combinators that come standard, including ones that we'd probably spell something like @sum, @product, @min, @max, @list, @any, and @all, if it weren't for most of those names already being builtins that mean something else. :) The PEP doesn't propose implementing all of those, but it does show how easily you can create things like that if you want to. 
From pje at telecommunity.com Wed May 2 06:29:06 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Wed, 02 May 2007 00:29:06 -0400 Subject: [Python-3000] Some canonical use-cases for ABCs/Interfaces/Generics In-Reply-To: <4638013D.8090902@acm.org> Message-ID: <5.1.1.6.0.20070502000659.044c03c8@sparrow.telecommunity.com> Thanks for writing this! At 08:10 PM 5/1/2007 -0700, Talin wrote: >Other kinds of transformations would include: Compilation is a collection of such transforms over an AST, by the way. Documentation generators are transforms over either an AST (source-based doc generators) or a set of modules' contents (e.g. epydoc and pydoc). Zope 3 transforms object trees into views. It also uses something called "event adapters" that are basically a crude sort of before/after/around method system. Twisted and Zope also do a lot of adaptation, which is sort of a poor man's generic function combined with namespaces. >Note that if any of these participants *do* have prior knowledge of the >others, then the need for a generic adaption framework is considerably >weakened. So for example, if (2) already knows all of the types of >objects that are going to be operated on, then it can simply hard-code >that knowledge into its own implementation. One of the goals of PEP 3124, btw, is to encourage people to use overloads even in the case where they *think* that they know all the types to be operated on, because there is always the chance that somebody else will come along and want to reuse that code. Pydoc and epydoc are good examples of a situation where author #2 thought they knew all the things to be operated on. >Similarly, if (1) knows that >it is going to be operated on in this way, then it can simply add a >method to do that operation. Its only when the system needs to be N-way >extensible, where N is the number of participants, that a more general >dispatch solution is required. 
This makes it sound more complex than it is; all that is required is for one person to try to use two other people's code -- neither of whom anticipated the combination. This situation is very easy to come by, but from your description each of persons 1 through 4 might conclude that since they normally work alone, this won't affect them. ;) >OK that is the description of the use case. I'd be interested to know >what use cases people have that fall *outside* of the above. Well, consider what other things generic functions are used for in ordinary Python, like len(), iter(), sum(), all the operator.* functions, etc. Any operation that might be performed on a variety of types is applicable. Of course, these generic functions built in to Python use __special__ methods rather than having a registry; but this is an implementation detail. They are simply generic functions that don't let you register methods for built-in types (since they're immutable and you thus can't add the __special__ methods). So, generic functions that allow registration are just a generalization of what we already have so that: 1. there are no namespace collisions between authors competing for reserved method names 2. dispatch can be on any number of arguments 3. any type can play in any function, even if the type is built-in Note that if it sounds like I'm saying that all functions are potentially generic, it's because they are. Heck, they *already* are, in Python. It's just that we don't have a uniform way of expressing them, as opposed to an ad-hoc assortment of patterns. From fdrake at acm.org Wed May 2 06:31:43 2007 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 2 May 2007 00:31:43 -0400 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. 
In-Reply-To: References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <5.1.1.6.0.20070501143840.02ca7760@sparrow.telecommunity.com> Message-ID: <200705020031.44147.fdrake@acm.org> On Tuesday 01 May 2007, Jason Orendorff wrote: > Come to think of it, Inform 7 is the other system I know of that has > an advice system like this. Now I'm suspicious. Are you trying to > turn Python into some kind of game? Software is always a game, and I've been beginning to think the spoils of the victor always involve large amounts of pain. For the loser, at least the pain ends. I wonder if today is one of my cynical days? --sigh-- -Fred -- Fred L. Drake, Jr. From santagada at gmail.com Wed May 2 06:32:36 2007 From: santagada at gmail.com (Leonardo Santagada) Date: Wed, 2 May 2007 01:32:36 -0300 Subject: [Python-3000] PEP: Supporting Non-ASCII Identifiers In-Reply-To: <46377B60.1030501@v.loewis.de> References: <435DF58A933BA74397B42CDEB8145A860B745DEF@ex9.hostedexchange.local> <46377B60.1030501@v.loewis.de> Message-ID: <8C502782-3E6F-4613-BB8D-5BB827FB7238@gmail.com> I don't know if I can speak on the py3k list, but I would give this a -1. Supporting non-ASCII identifiers doesn't fix the bigger problem. People want to write programs in their own language. Not only the identifiers, but all of the literals and syntax of Python would be better if they could be in the programmer's language, which is what the guys from OLPC want. I think we should defer this PEP and try to come up with a broader solution that can work as a different dialect of Python... something using the Python VM but with a completely different parser. Having a parser that reads Unicode, as Guido recently suggested, is the first step.
Then you could have something like the encoding declaration but called language, where you set your dialect of Python (maybe this can be set per account in a system), and for the last part you will need some files that translate the standard library and any other library so you can do stuff like this: #!/usr/bin/env python # _*_ encoding: utf-8 # _*_ lang: Portuguese minha_variável = 2 para contador em faixa(10): se contador % minha_variável == 0: imprime "oi mundo" This is useful for teaching programming to really young kids and in places where English is really not common... but the thing is that just having non-ASCII identifiers is not going to solve your problem. -- Leonardo Santagada santagada at gmail.com From krstic at solarsail.hcs.harvard.edu Wed May 2 06:42:17 2007 From: krstic at solarsail.hcs.harvard.edu (=?UTF-8?B?SXZhbiBLcnN0acSH?=) Date: Wed, 02 May 2007 00:42:17 -0400 Subject: [Python-3000] PEP: Supporting Non-ASCII Identifiers In-Reply-To: <8C502782-3E6F-4613-BB8D-5BB827FB7238@gmail.com> References: <435DF58A933BA74397B42CDEB8145A860B745DEF@ex9.hostedexchange.local> <46377B60.1030501@v.loewis.de> <8C502782-3E6F-4613-BB8D-5BB827FB7238@gmail.com> Message-ID: <463816A9.8070704@solarsail.hcs.harvard.edu> Leonardo Santagada wrote: > but all of the literals and syntax of Python would be better if > they could be in the programmer's language, which is what the guys from OLPC > want. It's not clear to me that that's what we want, actually. I think Alan Kay mentioned that they can do this level of i18n with Squeak already, and that will probably do quite well for the really young kids. For the rest, I think a single set of language keywords is generally a Very Good Thing. Guido and others have justified not wanting to add more syntax-level metaprogramming abilities to Python by saying that it's important for all Python code to read as Python code, not as "Python code that might sometimes mean something entirely different because of macros".
Keyword translation would cause a similarly ugly problem, I suspect. -- Ivan Krstić | GPG: 0x147C722D From tjreedy at udel.edu Wed May 2 06:46:50 2007 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 2 May 2007 00:46:50 -0400 Subject: [Python-3000] PEP: Supporting Non-ASCII Identifiers References: <46371BD2.7050303@v.loewis.de> Message-ID: "Giovanni Bajo" wrote in message news:f178kn$s0n$2 at sea.gmane.org... On 01/05/2007 12.52, Martin v. Löwis wrote: | Isn't this already blacklisted in PEP 3099? In today's PEP Parade post, he implies no. "PEP: Supporting Non-ASCII identifiers (Martin von Loewis) I'm on record as not liking this; my worry is that it will become a barrier to the free exchange of code. It's not just languages I can't read (Russian transliterated to the latin alphabet would be just as bad and we don't stop that now); many text editors have no or limited support for other scripts (not to mention mixing right-to-left script with Python's left-to-right identifiers). But if this receives a lot of popular support I'm willing to give it a try. The One Laptop Per Child project for example would like to enable students to code in their own language (of course they'd rather see the language keywords and standard library translated too...)." OLPC, which is one realization of Guido's CP4E dream (computer programming for everyone), changes the ball game (to use an American expression). I expect that most anyone with a college education from anywhere in the world has been exposed to Latin characters and at least a few English words. But the case is different, I think, for elementary kids. Given that Guido has given the language and implementation freely and for free, I think it reasonable that he wants to be able to read programs that recipients write. And 'foreign' words are much easier for him and many of us to read, match, and differentiate* when transliterated to Latin chars than when written in one of the ever proliferating character sets.
(And I believe that Unicode is, sadly, encouraging the invention of unneeded new sets for obscure languages that would be much better off using one of the existing writing systems.) * To understand a program, one must be able to match all occurrences of the same identifier and differentiate different identifiers. So, Martin, I suggest that you expand your proposal to include a transliteration mechanism and limit the allowed characters to those which can be transliterated. I presume that this would be an expanding set. Once a mechanism is in place, people who want 'their' character set included can do the work needed for that set. Terry Jan Reedy From martin at v.loewis.de Wed May 2 07:05:21 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 02 May 2007 07:05:21 +0200 Subject: [Python-3000] PEP: Supporting Non-ASCII Identifiers In-Reply-To: <4637F606.4060707@canterbury.ac.nz> References: <46371BD2.7050303@v.loewis.de> <4637631A.6030702@v.loewis.de> <4637F606.4060707@canterbury.ac.nz> Message-ID: <46381C11.5030104@v.loewis.de> > He says "the tools aren't ready yet", which I take to > mean that Python won't need to support it until all > widely-used editors, email and news software, etc, etc, > reliably support displaying and editing of all > unicode characters. We're clearly a long way from > that situation. I don't understand that requirement. Clearly, editors have supported non-ASCII characters for many years already (at least since 1980, maybe longer). Is the complaint that a single editor does not support all characters? I don't see a need for that - the editor will present a replacement character. However, if somebody bothered entering the character in a source file, there is actually a high chance that an editor can display it (how else did he enter the character?) Or is the complaint that editors don't support UTF-8? That is simply not true anymore. E.g. IDLE has supported editing UTF-8 for several Python releases now.
Regards, Martin From martin at v.loewis.de Wed May 2 07:10:35 2007 From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 02 May 2007 07:10:35 +0200 Subject: [Python-3000] PEP: Supporting Non-ASCII Identifiers In-Reply-To: References: <46371BD2.7050303@v.loewis.de> Message-ID: <46381D4B.2010802@v.loewis.de> > So, Martin, I suggest that you expand your proposal to include a > transliteration mechanism and limit the allowed characters to those which > can be transliterated. I presume that this would be an expanding set. Once > a mechanism is in place, people who want 'their' character set included can > do the work needed for that set. I can certainly add that as a request, but I'm -1 on it. There shouldn't be two different spellings for the same identifier, plus transliteration systems often depend on the natural language (e.g. ö is transliterated as oe in German, but (I believe) just as o in the Scandinavian languages that have that character). Regards, Martin From g.brandl at gmx.net Wed May 2 09:02:18 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Wed, 02 May 2007 09:02:18 +0200 Subject: [Python-3000] PEP 3132: Extended Iterable Unpacking In-Reply-To: References: Message-ID: Guido van Rossum schrieb: > On 5/1/07, Brett Cannon wrote: >> > Also, what should this do? Perhaps the grammar could disallow it? >> > >> > *a = range(5) >> >> I say disallow it. That is ambiguous as to what your intentions are even if >> you know what '*' does for multiple assignment. > > My real point was that the PEP lacks precision here. It should list > the exact proposed changes to Grammar/Grammar. You're right. I tried to imply this with "A tuple (or list) on the left side of a simple assignment", but it isn't clear enough. I'll update the PEP to incorporate the grammar changes today. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From g.brandl at gmx.net Wed May 2 09:04:29 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Wed, 02 May 2007 09:04:29 +0200 Subject: [Python-3000] PEP 3132: Extended Iterable Unpacking In-Reply-To: References: Message-ID: Guido van Rossum schrieb: > On 5/1/07, Georg Brandl wrote: >> This is a bit late, but it was in my queue by April 30, I swear! ;) > > Accepted. > >> Comments are appreciated, especially some phrasing sounds very clumsy >> to me, but I couldn't find a better one. >> >> Georg >> >> >> PEP: 3132 >> Title: Extended Iterable Unpacking >> Version: $Revision$ >> Last-Modified: $Date$ >> Author: Georg Brandl >> Status: Draft >> Type: Standards Track >> Content-Type: text/x-rst >> Created: 30-Apr-2007 >> Python-Version: 3.0 >> Post-History: >> >> >> Abstract >> ======== >> >> This PEP proposes a change to iterable unpacking syntax, allowing to >> specify a "catch-all" name which will be assigned a list of all items >> not assigned to a "regular" name. >> >> An example says more than a thousand words:: >> >> >>> a, *b, c = range(5) >> >>> a >> 0 >> >>> c >> 4 >> >>> b >> [1, 2, 3] > > Has it been pointed out to you already that this particular example is > hard to implement if the RHS is an iterator whose length is not known > a priori? The implementation would have to be quite hairy -- it would > have to assign everything to the list b until the iterator is > exhausted, and then pop a value from the end of the list and assign it > to c. Yes, that is correct. My implementation isn't *that* hairy, though, it's only 13 lines of code more. I'll post the patch to SourceForge later today. > it would be much easier if *b was only allowed at the end. 
(It > would be even worse if b were assigned a tuple instead of a list, as > per your open issues.) The created tuple is a fresh one, so can't I just copy pointers like from a list and set ob_size later? > Also, what should this do? Perhaps the grammar could disallow it? > > *a = range(5) I'm not so sure about the grammar, I'm currently catching it in the AST generation stage. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From greg.ewing at canterbury.ac.nz Wed May 2 09:15:17 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 02 May 2007 19:15:17 +1200 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <46376729.9000008@acm.org> <5.1.1.6.0.20070501123026.0524eac8@sparrow.telecommunity.com> <5.1.1.6.0.20070501143840.02ca7760@sparrow.telecommunity.com> Message-ID: <46383A85.1020003@canterbury.ac.nz> Jason Orendorff wrote: > Now I'm suspicious. Are you trying to > turn Python into some kind of game? You mean it isn't already? I've always felt that writing Python code is more like fun than work... -- Greg From greg.ewing at canterbury.ac.nz Wed May 2 09:48:23 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 02 May 2007 19:48:23 +1200 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. 
In-Reply-To: <5.1.1.6.0.20070501235655.02a57fd8@sparrow.telecommunity.com> References: <5.1.1.6.0.20070501123026.0524eac8@sparrow.telecommunity.com> <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <5.1.1.6.0.20070501123026.0524eac8@sparrow.telecommunity.com> <5.1.1.6.0.20070501235655.02a57fd8@sparrow.telecommunity.com> Message-ID: <46384247.8020601@canterbury.ac.nz> Phillip J. Eby wrote: > Yep, that was my theory too, until I actually used generic functions. Is there something about generic functions that makes them different from methods in this regard? I've used OO systems which have the equivalent of @before, @after etc. for overriding methods, and others (including Python) which don't, and I've never found myself missing them. So I'm skeptical that they're a must-have feature for generic functions. > 1) a lot more pleasant not to write the extra boilerplate all the time, I'd work on that by finding ways to reduce the boilerplate. Calling the next method of a generic function shouldn't be any harder than calling the inherited implementation of a normal method. > By the way, if you look at the PEP, you'll find motivating examples for > each of the decorators, There are examples, yes, but they don't come across as very compelling as to why there should be so many variations of the overloading decorator rather than a single general one. > IIRC, CLOS has about *8 more* kinds of method combinators CLOS strikes me as being the union of all Lisp dialects that anyone has ever used, rather than something with a coherent design behind it. So quoting CLOS is not going to make me think better of anything. -- Greg From eric+python-dev at trueblade.com Wed May 2 14:26:42 2007 From: eric+python-dev at trueblade.com (Eric V. 
Smith) Date: Wed, 02 May 2007 08:26:42 -0400 Subject: [Python-3000] PEP: Supporting Non-ASCII Identifiers In-Reply-To: <46371BD2.7050303@v.loewis.de> References: <46371BD2.7050303@v.loewis.de> Message-ID: <46388382.90800@trueblade.com> Martin v. Löwis wrote: ... > Specification of Language Changes > ================================= > > The syntax of identifiers in Python will be based on the Unicode > standard annex UAX-31 [1]_, with elaboration and changes as defined > below. > > Within the ASCII range (U+0001..U+007F), the valid characters for > identifiers are the same as in Python 2.5. This specification only > introduces additional characters from outside the ASCII range. For > other characters, the classification uses the version of the Unicode > Character Database as included in the unicodedata module. > > The identifier syntax is <ID_Start> <ID_Continue>*. > > ID_Start is defined as all characters having one of the general > categories uppercase letters (Lu), lowercase letters (Ll), titlecase > letters (Lt), modifier letters (Lm), other letters (Lo), letter > numbers (Nl), plus the underscore (XXX what are "stability extensions > listed in UAX 31). > > ID_Continue is defined as all characters in ID_Start, plus nonspacing > marks (Mn), spacing combining marks (Mc), decimal number (Nd), and > connector punctuations (Pc). > > All identifiers are converted into the normal form NFC while parsing; > comparison of identifiers is based on NFC. Martin: I don't understand Unicode nearly well enough to really comment on this, but could you add a comment that the PEP 3101 code might need to be adjusted to deal with Unicode identifiers? I don't actually think your PEP would make any difference to how we're parsing, because we don't have an "is this a valid character for an identifier" function. But I'd like to get a note somewhere in the PEP saying that all code that parses for identifiers might be impacted. The PEP 3101 code is one place where we have such a parser.
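The NFC rule quoted above is easy to demonstrate with the unicodedata module (a sketch of the identifier comparison only, not of the parser change itself):

```python
import unicodedata

# Two spellings of the same identifier that render identically:
# one uses precomposed U+00E9, the other 'e' plus a combining acute accent.
composed = u"caf\u00e9"
decomposed = u"cafe\u0301"

assert composed != decomposed        # the raw code point sequences differ

# Converting both to NFC, as the draft PEP specifies for parsing,
# makes them compare equal.
assert unicodedata.normalize("NFC", decomposed) == composed
assert unicodedata.normalize("NFC", composed) == composed
```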
We'd at least need to implement tests for Unicode identifiers. Which reminds me that we need better tests for the existing PEP 3101 code, especially for strings with surrogate pairs. I'll look at beefing that up. Thanks. Eric. From rrr at ronadam.com Wed May 2 15:24:58 2007 From: rrr at ronadam.com (Ron Adam) Date: Wed, 02 May 2007 08:24:58 -0500 Subject: [Python-3000] PEP Parade In-Reply-To: <5.1.1.6.0.20070501234608.02a589f8@sparrow.telecommunity.com> References: <5.1.1.6.0.20070501212508.02f11d30@sparrow.telecommunity.com> <5.1.1.6.0.20070501145217.04d293e8@sparrow.telecommunity.com> <5.1.1.6.0.20070501155833.043589f8@sparrow.telecommunity.com> <5.1.1.6.0.20070501195640.02e20ca0@sparrow.telecommunity.com> <5.1.1.6.0.20070501212508.02f11d30@sparrow.telecommunity.com> <5.1.1.6.0.20070501234608.02a589f8@sparrow.telecommunity.com> Message-ID: <4638912A.1060804@ronadam.com> Phillip J. Eby wrote: > At 07:37 PM 5/1/2007 -0700, Guido van Rossum wrote: >> That's one solution. Another solution would be to use GFs in Pydoc to >> make it overloadable; I'd say pydoc could use a bit of an overhaul at >> this point. > > True enough; until you mentioned that, I'd forgotten that a week or two ago > I got an email from somebody working on the pydoc overhaul who mentioned > that he had had to work up an ad-hoc generic function implementation for > just that reason. :) Ah, that would be me. :-) I'm still working on it and hope to have it done before the library reorganization. (Minus Python 3000 enhancements, since it needs to work with Python 2.6.) The resulting (yes, ad-hoc) solution is basically what I needed to do to get it to work nicely. Talin showed me an example of his that used a decorator to initialize a dispatch table, which is much easier to maintain than manually editing it as I was doing. Here is an outline of how it generally works. Maybe you can see where proper generic functions might be useful. INTROSPECTION or (Making a DocInfo Object.)
=========================================== The actual introspection consists mostly of using the inspect module or looking directly at either the attributes, files, or file system. The end product is a data structure containing tagged strings that can be parsed and formatted at a later stage. * A DocInfo object is a DocInfo-list of strings, and more DocInfo objects in fifo order, with a tag attribute and a depth-first iterator method. Ultimately the contents are strings (or string-like objects) that came from some input source. (Note: All or any of this can be used outside of pydoc if it is found to be generally useful.) General use: 1. Create an inspection dispatcher. select = Dispatcher() # A dispatcher/dictionary. 2. Define introspective functions, and use a decorator to add them to the dispatcher. @select.add('tag') def foo(tag, name, obj): # The tag is added by the dispatcher. items = get_some_info_about_obj() title = DocInfo('title', name) body = DocInfo('list', items) return DocInfo(tag, title, body) Do the above for all unique objects. (Functions can have more than one tag name.) 3. Get input from the help function, interactive help, or web server request and create a DocInfo structure. def get_info(request): # parse if needed. (search, topics, indexes, etc...) obj = locate(request) tag = describe(obj) # get a description that matches a tag. return select(tag, request, obj) Some keys are very general such as 'list', 'item', 'name', 'text', and some are specific to the source the data came from... 'package', 'class', 'module', 'function', etc... If all you want is to send the text out the way it came in... you can use simple string functions. result = DocInfo(request).format(str) That will produce one long everythingruntogether output. def repr_line(s): return repr(s) + '\n' result = DocInfo(request).present(repr_line) This will put each tagged string on its own line with quotes around it. Since it's a nested structure the quotes will be nested too.
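A minimal Dispatcher of the kind described in steps 1-3 could look like this (an editorial sketch whose names mirror the outline above, not Ron's actual code):

```python
class Dispatcher:
    """A dispatcher/dictionary mapping tag names to handler functions."""

    def __init__(self):
        self.registry = {}

    def add(self, *tags):
        # Decorator factory: register the function under each tag name,
        # so one function can serve several tags.
        def decorator(func):
            for tag in tags:
                self.registry[tag] = func
            return func
        return decorator

    def __call__(self, tag, *args):
        # Look up the handler for the tag and pass the tag along,
        # as in step 2 of the outline.
        return self.registry[tag](tag, *args)


select = Dispatcher()

@select.add('function', 'method')
def describe_callable(tag, name):
    return '%s: %s' % (tag, name)

print(select('method', 'format'))    # prints: method: format
```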
ADVANCED FORMATTING =================== Create a DocFormatter object to format a DocInfo data structure with. 1. Define a pre_processor function. (optional) This alters the data structure. For example you can rearrange, remove, and/or replace parts before any formatting has occurred. 2. Define a pre_formatter function. (optional) Pre-format input strings at the bottom (depth first) but not intermediate result strings. Any function that takes a string and returns a string is fine here, e.g. cgi.escape(). 3. Define a formatter to format DocInfo list objects according to the tags. (list objects are joined after sub-items are formatted.) * A function with an if-elif-else structure here is perfectly fine, but a dispatcher is better for more complex things. ;-) (a) Create a dispatcher object. (b) Add functions to the dispatcher by using decorators and tag names. * The tags passed to the functions by the dispatcher contain all parent tags prepended to them with '.' separators. This allows you to format based on where something is in addition to what it is. The dispatcher class *is* the formatter function in this case. The __call__ method is used to keep it interchangeable with regular functions. So a method, @dispatcher.add(tag1), is used as the decorator to add functions to the dispatcher. select = Dispatcher() @select.add('function', 'method') def format_function(tag, info): ... return formatted_info Multiple tags allow a single function to perform several roles. 4. Create a post_formatter. (optional) An example of this might be to replace placeholders with dynamic content collected during the formatting process. 5. Combine the formatters into a single DocFormatter class. Example from the html formatter: formatter = DocFormatter(preformat=cgi.escape, formatter=select, postformat=page) The callable, 'select', is the dispatcher, and 'page' is a post-formatter that wraps the contents into the final html page with head, navigation bar, and body sections.
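The preformat/formatter/postformat pipeline of step 5 can be sketched on a single string (hypothetical helper names; the real code walks the nested DocInfo structure rather than one string):

```python
class DocFormatter:
    """Chain the optional pre-format, format, and post-format stages."""

    def __init__(self, preformat=None, formatter=None, postformat=None):
        identity = lambda s: s
        self.preformat = preformat or identity
        self.formatter = formatter or (lambda tag, s: s)
        self.postformat = postformat or identity

    def format(self, tag, text):
        # The real version recurses depth first over DocInfo objects;
        # one string is enough to show the order of the stages.
        return self.postformat(self.formatter(tag, self.preformat(text)))


def escape(s):
    # Stand-in for the cgi.escape() mentioned in step 2.
    return s.replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;')

def span(tag, s):
    # Toy tag-based formatter standing in for the dispatcher.
    return '<span class="%s">%s</span>' % (tag, s)

def page(s):
    # Post-formatter wrapping the body into a minimal html page.
    return '<html><body>%s</body></html>' % s

formatter = DocFormatter(preformat=escape, formatter=span, postformat=page)
print(formatter.format('name', 'a < b'))
# prints: <html><body><span class="name">a &lt; b</span></body></html>
```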
The DocInfo format method iterates its contents and calls the formatter on it. The formatter determines what action needs to be taken by what's given it and whether it's the first or last time it is called. And so to put it all together... result = DocInfo(request).format(formatter) I currently have three formatters. - text/console - html - xml It should be very easy to write other formatters by using these as starting points. Cheers, Ron From ark at acm.org Wed May 2 15:37:37 2007 From: ark at acm.org (Andrew Koenig) Date: Wed, 2 May 2007 09:37:37 -0400 Subject: [Python-3000] PEP-3125 -- remove backslash continuation Message-ID: <003601c78cbf$0cacec90$2606c5b0$@org> Looking at PEP-3125, I see that one of the rejected alternatives is to allow any unfinished expression to indicate a line continuation. I would like to suggest a modification to that alternative that has worked successfully in another programming language, namely Stu Feldman's EFL. EFL is a language intended for numerical programming; it compiles into Fortran with the interesting property that the resulting Fortran code is intended to be human-readable and maintainable by people who do not happen to have access to the EFL compiler. Anyway, the (only) continuation rule in EFL is that if the last token in a line is one that lexically cannot be the last token in a statement, then the next line is considered a continuation of the current line. Python currently has a rule that if parentheses are unbalanced, a newline does not end the statement. If we were to translate the EFL rule to Python, it would be something like this: The whitespace that follows an operator or open bracket or parenthesis can include newline characters. Note that if this suggestion were implemented, it would presumably be at a very low lexical level--even before the decision is made to turn a newline followed by spaces into an INDENT or DEDENT token. I think that this property solves the difficulty-of-parsing problem.
Indeed, I think that this suggestion would be easier to implement than the current unbalanced-parentheses rule. Note also that like the current backslash rule, the space after the newline would be just space, with no special significance. So to rewrite the examples from the PEP: "abc" + # Plus is an operator, so it continues "def" # The extra spaces before "def" do not constitute an INDENT "abc" # Line does not end with an operator, so statement ends + "def" # The newline and spaces constitute an INDENT -- this is a syntax error ("abc" # I have no opinion about keeping the unbalanced-parentheses rule -- + "def") # but I do think that it is harder to parse (and also harder to read) # than what I am proposing. From jimjjewett at gmail.com Wed May 2 15:58:37 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Wed, 2 May 2007 09:58:37 -0400 Subject: [Python-3000] PEP: Supporting Non-ASCII Identifiers In-Reply-To: <46381D4B.2010802@v.loewis.de> References: <46371BD2.7050303@v.loewis.de> <46381D4B.2010802@v.loewis.de> Message-ID: On 5/2/07, "Martin v. Löwis" wrote: > > So, Martin, I suggest that you expand your proposal to include a > > transliteration mechanism and limit the allowed characters to those which > > can be transliterated. I presume that this would be an expanding set. Once > > a mechanism is in place, people who want 'their' character set included can > > do the work needed for that set. > I can certainly add that as a request, but I'm -1 on it. There shouldn't > be two different spellings for the same identifier, plus transliteration > systems often depend on the natural language (e.g. ö is transliterated > as oe in German, but (I believe) just as o in the Scandinavian languages > that have that character). I think this might be a job for the IDE, or at most an import hook. If you want the German transliteration, then use the German import hook.
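Martin's oe-versus-o point can be made concrete with a couple of toy transliteration tables (hypothetical mappings for illustration; a real import hook would also have to rewrite whole source files):

```python
# Per-language tables: the same character transliterates differently
# depending on the natural language of the source.
GERMAN = {u'\u00f6': 'oe', u'\u00e4': 'ae', u'\u00fc': 'ue'}
SWEDISH = {u'\u00f6': 'o'}

def transliterate(identifier, table):
    # Replace each character by its table entry, or keep it unchanged.
    return ''.join(table.get(ch, ch) for ch in identifier)

word = u'sch\u00f6n'
print(transliterate(word, GERMAN))    # schoen
print(transliterate(word, SWEDISH))   # schon
```

Since the two tables produce different ASCII spellings for the same name, round-tripping an error message back to the original identifier would need to know which table was in effect.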
The reason it might need to be part of the IDE is to transliterate back; for example when there are error messages. Pity there isn't a good way to say "Stop parsing this source file; feed the rest to XXX and then what it sends back." -jJ From fuzzyman at voidspace.org.uk Wed May 2 17:42:09 2007 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Wed, 02 May 2007 16:42:09 +0100 Subject: [Python-3000] [Python-Dev] PEP 30XZ: Simplified Parsing In-Reply-To: References: Message-ID: <4638B151.6020901@voidspace.org.uk> Jim Jewett wrote: > PEP: 30xz > Title: Simplified Parsing > Version: $Revision$ > Last-Modified: $Date$ > Author: Jim J. Jewett > Status: Draft > Type: Standards Track > Content-Type: text/plain > Created: 29-Apr-2007 > Post-History: 29-Apr-2007 > > > Abstract > > Python initially inherited its parsing from C. While this has > been generally useful, there are some remnants which have been > less useful for python, and should be eliminated. > > + Implicit String concatenation > > + Line continuation with "\" > > + 034 as an octal number (== decimal 28). Note that this is > listed only for completeness; the decision to raise an > Exception for leading zeros has already been made in the > context of PEP XXX, about adding a binary literal. > > > Rationale for Removing Implicit String Concatenation > > Implicit String concatentation can lead to confusing, or even > silent, errors. [1] > > def f(arg1, arg2=None): pass > > f("abc" "def") # forgot the comma, no warning ... > # silently becomes f("abcdef", None) > > Implicit string concatenation is massively useful for creating long strings in a readable way though: call_something("first part\n" "second line\n" "third line\n") I find it an elegant way of building strings and would be sad to see it go. Adding trailing '+' signs is ugly. Michael Foord > or, using the scons build framework, > > sourceFiles = [ > 'foo.c', > 'bar.c', > #...many lines omitted... 
> 'q1000x.c'] > > It's a common mistake to leave off a comma, and then scons complains > that it can't find 'foo.cbar.c'. This is pretty bewildering behavior > even if you *are* a Python programmer, and not everyone here is. > > Note that in C, the implicit concatenation is more justified; there > is no other way to join strings without (at least) a function call. > > In Python, strings are objects which support the __add__ operator; > it is possible to write: > > "abc" + "def" > > Because these are literals, this addition can still be optimized > away by the compiler. > > Guido indicated [2] that this change should be handled by PEP, because > there were a few edge cases with other string operators, such as the %. > The resolution is to treat them the same as today. > > ("abc %s def" + "ghi" % var) # fails like today. > # raises TypeError because of > # precedence. (% before +) > > ("abc" + "def %s ghi" % var) # works like today; precedence makes > # the optimization more difficult to > # recognize, but does not change the > # semantics. > > ("abc %s def" + "ghi") % var # works like today, because of > # precedence: () before % > # CPython compiler can already > # add the literals at compile-time. > > > Rationale for Removing Explicit Line Continuation > > A terminal "\" indicates that the logical line is continued on the > following physical line (after whitespace). > > Note that a non-terminal "\" does not have this meaning, even if the > only additional characters are invisible whitespace. (Python depends > heavily on *visible* whitespace at the beginning of a line; it does > not otherwise depend on *invisible* terminal whitespace.) Adding > whitespace after a "\" will typically cause a syntax error rather > than a silent bug, but it still isn't desirable. > > The reason to keep "\" is that occasionally code looks better with > a "\" than with a () pair. > > assert True, ( > "This Paren is goofy") > > But realistically, that paren is no worse than a "\". 
The only > advantage of "\" is that it is slightly more familiar to users of > C-based languages. These same languages all also support line > continuation with (), so reading code will not be a problem, and > there will be one less rule to learn for people entirely new to > programming. > > > Rationale for Removing Implicit Octal Literals > > This decision should be covered by PEP ???, on numeric literals. > It is mentioned here only for completeness. > > C treats integers beginning with "0" as octal, rather than decimal. > Historically, Python has inherited this usage. This has caused > quite a few annoying bugs for people who forgot the rule, and > tried to line up their constants. > > a = 123 > b = 024 # really only 20, because octal > c = 245 > > In Python 3.0, the second line will instead raise a SyntaxError, > because of the ambiguity. Instead, the line should be written > as in one of the following ways: > > b = 24 # PEP 8 > b = 24 # columns line up, for quick scanning > b = 0t24 # really did want an Octal! > > > References > > [1] Implicit String Concatenation, Jewett, Orendorff > http://mail.python.org/pipermail/python-ideas/2007-April/000397.html > > [2] PEP 12, Sample reStructuredText PEP Template, Goodger, Warsaw > http://www.python.org/peps/pep-0012 > > [3] http://www.opencontent.org/openpub/ > > > > Copyright > > This document has been placed in the public domain. > > > > Local Variables: > mode: indented-text > indent-tabs-mode: nil > sentence-end-double-space: t > fill-column: 70 > coding: utf-8 > End: > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk > > From pje at telecommunity.com Wed May 2 18:00:03 2007 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Wed, 02 May 2007 12:00:03 -0400 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: <46384247.8020601@canterbury.ac.nz> References: <5.1.1.6.0.20070501235655.02a57fd8@sparrow.telecommunity.com> <5.1.1.6.0.20070501123026.0524eac8@sparrow.telecommunity.com> <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <5.1.1.6.0.20070501123026.0524eac8@sparrow.telecommunity.com> <5.1.1.6.0.20070501235655.02a57fd8@sparrow.telecommunity.com> Message-ID: <5.1.1.6.0.20070502113122.04e41350@sparrow.telecommunity.com> At 07:48 PM 5/2/2007 +1200, Greg Ewing wrote: >Is there something about generic functions that makes >them different from methods in this regard? Yes. 1. When you're dispatching on more than one argument type, you're likely to have more methods involved. 2. If you are using generic functions to implement "events", or using them AOP-style to "hook" other actions (e.g. to implement logging, persistence, transactions, undo, etc.), then you will be *mostly* doing "before" and "after" actions, with the occasional "around". (See also Jason's comment quoted below.) >>1) a lot more pleasant not to write the extra boilerplate all the time, > >I'd work on that by finding ways to reduce the boilerplate. Um... I did. They're called @before and @after. :) >There are examples, yes, but they don't come across as >very compelling as to why there should be so many variations >of the overloading decorator rather than a single general >one. I notice that you didn't respond to my point that these also make it easier for the reader to tell what the method is doing, without needing to carefully inspect the body. Meanwhile, it takes less than 40 lines of code to implement both @before and @after; if nothing else they would serve as excellent examples of how to implement other method combinations (besides the @discount example in the PEP). 
However, as it happens they are quite useful in and of themselves. As
Jason Orendorff put it:

"""In short, you have to ask yourself: am I hooking something
(before/after), implementing it (when), or just generally looking for
trouble (around)?"""

>CLOS strikes me as being the union of all Lisp dialects that
>anyone has ever used,

You seem to be confusing Common Lisp with CLOS. They are not the same
thing.

Meanwhile, AspectJ and Inform 7 also include before/after/around advice
for their generic functions, so it's hardly only CLOS as an example of
their usefulness.

In Inform 7, the manual notes that "after" and "instead" (around) are
the most commonly used; however, this is probably because every action
(generic function) in the language already has three method combination
phases called "check", "carry out", and "report"! So, some of the uses
of "before" that would happen in other languages get handled as
"check"-phase rules in Inform. (Also, they are called "instead" rules
because in Inform you are usually *not* invoking the overridden action,
but providing a substitute behavior.)

From steven.bethard at gmail.com Wed May 2 19:00:01 2007
From: steven.bethard at gmail.com (Steven Bethard)
Date: Wed, 2 May 2007 11:00:01 -0600
Subject: [Python-3000] [Python-Dev] PEP 30XZ: Simplified Parsing
In-Reply-To: <4638B151.6020901@voidspace.org.uk>
References: <4638B151.6020901@voidspace.org.uk>
Message-ID: 

On 5/2/07, Michael Foord wrote:
> Implicit string concatenation is massively useful for creating long
> strings in a readable way though:
>
>     call_something("first part\n"
>                    "second line\n"
>                    "third line\n")
>
> I find it an elegant way of building strings and would be sad to see it
> go. Adding trailing '+' signs is ugly.

You'll still have textwrap.dedent::

    call_something(dedent('''\
        first part
        second line
        third line
        '''))

And using textwrap.dedent, you don't have to remember to add the \n at
the end of every line.

STeVe
-- 
I'm not *in*-sane.
Indeed, I am so far *out* of sane that you appear a tiny blip on the
distant coast of sanity. --- Bucky Katt, Get Fuzzy

From trentm at activestate.com Wed May 2 19:34:15 2007
From: trentm at activestate.com (Trent Mick)
Date: Wed, 02 May 2007 10:34:15 -0700
Subject: [Python-3000] [Python-Dev] PEP 30XZ: Simplified Parsing
In-Reply-To: 
References: <4638B151.6020901@voidspace.org.uk>
Message-ID: <4638CB97.1040503@activestate.com>

Steven Bethard wrote:
> On 5/2/07, Michael Foord wrote:
>> Implicit string concatenation is massively useful for creating long
>> strings in a readable way though:
>>
>>     call_something("first part\n"
>>                    "second line\n"
>>                    "third line\n")
>>
>> I find it an elegant way of building strings and would be sad to see it
>> go. Adding trailing '+' signs is ugly.
>
> You'll still have textwrap.dedent::
>
>     call_something(dedent('''\
>         first part
>         second line
>         third line
>         '''))
>
> And using textwrap.dedent, you don't have to remember to add the \n at
> the end of every line.

But if you don't want the EOLs? Example from some code of mine:

    raise MakeError("extracting '%s' in '%s' did not create the "
                    "directory that the Python build will expect: "
                    "'%s'" % (src_pkg, dst_dir, dst))

I use this kind of thing frequently. Don't know if others consider it
bad style.

Trent

-- 
Trent Mick
trentm at activestate.com

From guido at python.org Wed May 2 20:08:36 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 2 May 2007 11:08:36 -0700
Subject: [Python-3000] PEP 3132: Extended Iterable Unpacking
In-Reply-To: 
References: 
Message-ID: 

[Georg]
> >> >>> a, *b, c = range(5)
> >> >>> a
> >> 0
> >> >>> c
> >> 4
> >> >>> b
> >> [1, 2, 3]

[Guido]
> > Has it been pointed out to you already that this particular example is
> > hard to implement if the RHS is an iterator whose length is not known
> > a priori?
The implementation would have to be quite hairy -- it would
> > have to assign everything to the list b until the iterator is
> > exhausted, and then pop a value from the end of the list and assign it
> > to c.

[Georg]
> Yes, that is correct. My implementation isn't *that* hairy, though, it's
> only 13 lines of code more.

OK. The PEP was kind of light on substance here. Glad you've thought
about it.

> I'll post the patch to SourceForge later today.

Cool.

> > it would be much easier if *b was only allowed at the end. (It
> > would be even worse if b were assigned a tuple instead of a list, as
> > per your open issues.)
>
> The created tuple is a fresh one, so can't I just copy pointers like from a
> list and set ob_size later?

Sure.

> > Also, what should this do? Perhaps the grammar could disallow it?
> >
> >     *a = range(5)
>
> I'm not so sure about the grammar, I'm currently catching it in the AST
> generation stage.

Hopefully it's possible to only allow this if there's at least one
comma? In any case the grammar will probably end up accepting *a in
lots of places where it isn't really allowed and you'll have to fix all
of those. That sounds messy; only allowing *a at the end seems a bit
more manageable. But I'll hold off until I can shoot holes in your
implementation. ;-)

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org Wed May 2 20:15:04 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 2 May 2007 11:15:04 -0700
Subject: [Python-3000] PEP-3125 -- remove backslash continuation
In-Reply-To: <003601c78cbf$0cacec90$2606c5b0$@org>
References: <003601c78cbf$0cacec90$2606c5b0$@org>
Message-ID: 

On 5/2/07, Andrew Koenig wrote:
> Looking at PEP-3125, I see that one of the rejected alternatives is to allow
> any unfinished expression to indicate a line continuation.
>
> I would like to suggest a modification to that alternative that has worked
> successfully in another programming language, namely Stu Feldman's EFL.
EFL
> is a language intended for numerical programming; it compiles into Fortran
> with the interesting property that the resulting Fortran code is intended to
> be human-readable and maintainable by people who do not happen to have
> access to the EFL compiler.
>
> Anyway, the (only) continuation rule in EFL is that if the last token in a
> line is one that lexically cannot be the last token in a statement, then the
> next line is considered a continuation of the current line.
>
> Python currently has a rule that if parentheses are unbalanced, a newline
> does not end the statement. If we were to translate the EFL rule to Python,
> it would be something like this:
>
>     The whitespace that follows an operator or open bracket or parenthesis
>     can include newline characters.
>
> Note that if this suggestion were implemented, it would presumably be at a
> very low lexical level--even before the decision is made to turn a newline
> followed by spaces into an INDENT or DEDENT token. I think that this
> property solves the difficulty-of-parsing problem. Indeed, I think that
> this suggestion would be easier to implement than the current
> unbalanced-parentheses rule.
>
> Note also that like the current backslash rule, the space after the newline
> would be just space, with no special significance. So to rewrite the
> examples from the PEP:
>
>     "abc" +      # Plus is an operator, so it continues
>         "def"    # The extra spaces before "def" do not constitute an INDENT
>
>     "abc"        # Line does not end with an operator, so statement ends
>     + "def"      # The newline and spaces constitute an INDENT -- this is a syntax error
>
>     ("abc"       # I have no opinion about keeping the unbalanced-parentheses rule --
>     + "def")     # but I do think that it is harder to parse (and also harder to read)
>                  # than what I am proposing.

I am worried that (as no indent is required on the next line) it will
accidentally introduce legal interpretations for certain common (?)
typos, e.g.
    x = y+     # Used to be y+1, the 1 got dropped
    f(x)

Still, if someone wants to give implementing this a try we could add
this to the PEP.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From ark at acm.org Wed May 2 20:24:50 2007
From: ark at acm.org (Andrew Koenig)
Date: Wed, 2 May 2007 14:24:50 -0400
Subject: [Python-3000] PEP-3125 -- remove backslash continuation
In-Reply-To: 
References: <003601c78cbf$0cacec90$2606c5b0$@org>
Message-ID: <001e01c78ce7$2c986f70$85c94e50$@org>

> I am worried that (as no indent is required on the next line) it will
> accidentally introduce legal interpretations for certain common (?)
> typos, e.g.
>
>     x = y+     # Used to be y+1, the 1 got dropped
>     f(x)

A reasonable worry. It could still be solved at the lexical level by
requiring every continuation line to have more leading whitespace than
the first of the lines being continued, and still not mapping that
whitespace into an INDENT, but of course that approach adds complexity.

All I can say is that it worked in practice in EFL, and I adopted the
same approach in Snocone without any complaints.
(Of course Python has lots more users than Snocone)

From steven.bethard at gmail.com Wed May 2 20:45:38 2007
From: steven.bethard at gmail.com (Steven Bethard)
Date: Wed, 2 May 2007 12:45:38 -0600
Subject: [Python-3000] [Python-Dev] PEP 30XZ: Simplified Parsing
In-Reply-To: <20070502181937.GF19189@seldon>
References: <4638B151.6020901@voidspace.org.uk>
	<20070502175301.GA24510@localhost.localdomain>
	<20070502181937.GF19189@seldon>
Message-ID: 

On 5/2/07, Brian Harring wrote:
> Personally, I'm -1 on nuking implicit string concatenation; the
> examples provided for the 'why' aren't that strong in my experience,
> and the forced shift to concatenation is rather annoying when you're
> dealing with code limits (80 char limit for example)-
>
>     dprint("depends level cycle: %s: "
>            "dropping cycle for %s from %s" %
>            (cur_frame.atom, datom,
>             cur_frame.current_pkg),
>            "cycle")
>

FWLIW, I pretty much always write this as::

    msg = "depends level cycle: %s: dropping cycle for %s from %s"
    tup = cur_frame.atom, datom, cur_frame.current_pkg, "cycle"
    dprint(msg % tup)

But yes, occasionally I run into problems when the string still doesn't
fit on a single line. (Of course, I usually solve that by shortening
the string...) ;-)

STeVe
-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy

From pje at telecommunity.com Wed May 2 20:51:06 2007
From: pje at telecommunity.com (Phillip J. Eby)
Date: Wed, 02 May 2007 14:51:06 -0400
Subject: [Python-3000] [Python-Dev] PEP 30XZ: Simplified Parsing
In-Reply-To: <4638CB97.1040503@activestate.com>
References: <4638B151.6020901@voidspace.org.uk>
Message-ID: <5.1.1.6.0.20070502144742.02bc1908@sparrow.telecommunity.com>

At 10:34 AM 5/2/2007 -0700, Trent Mick wrote:
>But if you don't want the EOLs?
Example from some code of mine:
>
>     raise MakeError("extracting '%s' in '%s' did not create the "
>                     "directory that the Python build will expect: "
>                     "'%s'" % (src_pkg, dst_dir, dst))
>
>I use this kind of thing frequently. Don't know if others consider it
>bad style.

Well, I do it a lot too; don't know if that makes it good or bad,
though. :)

I personally don't see a lot of benefit to changing the lexical rules
for Py3K, however. The hard part of lexing Python is INDENT/DEDENT (and
the associated unbalanced parens rule), and none of these proposals
suggest removing *that*.

Overall, this whole thing seems like a bikeshed to me.

From fdrake at acm.org Wed May 2 20:57:38 2007
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Wed, 2 May 2007 14:57:38 -0400
Subject: [Python-3000] [Python-Dev] PEP 30XZ: Simplified Parsing
In-Reply-To: <4638CB97.1040503@activestate.com>
References: <4638CB97.1040503@activestate.com>
Message-ID: <200705021457.38344.fdrake@acm.org>

On Wednesday 02 May 2007, Trent Mick wrote:
> raise MakeError("extracting '%s' in '%s' did not create the "
>                 "directory that the Python build will expect: "
>                 "'%s'" % (src_pkg, dst_dir, dst))
>
> I use this kind of thing frequently. Don't know if others consider it
> bad style.

I do this too; this is a good way to have a simple human-readable
message without doing weird things to avoid extraneous newlines or
strange indentation.

-1 on removing implicit string catenation.

-Fred

-- 
Fred L. Drake, Jr.

From mike.klaas at gmail.com Wed May 2 21:08:21 2007
From: mike.klaas at gmail.com (Mike Klaas)
Date: Wed, 2 May 2007 12:08:21 -0700
Subject: [Python-3000] PEP: Supporting Non-ASCII Identifiers
In-Reply-To: <4637F606.4060707@canterbury.ac.nz>
References: <46371BD2.7050303@v.loewis.de>
	<4637631A.6030702@v.loewis.de>
	<4637F606.4060707@canterbury.ac.nz>
Message-ID: <3d2ce8cb0705021208g38d88020t6b183a3061191766@mail.gmail.com>

On 5/1/07, Greg Ewing wrote:
> Martin v. Löwis wrote:
>
> > http://mail.python.org/pipermail/python-3000/2006-April/001526.html
> >
> > where Guido states that he trusts me that it can be made to work,
> > and that "eventually" it needs to be supported.

+0

> He says "the tools aren't ready yet", which I take to
> mean that Python won't need to support it until all
> widely-used editors, email and news software, etc, etc,
> reliably support displaying and editing of all
> unicode characters. We're clearly a long way from
> that situation.

Couldn't the same argument be applied against non-ascii characters in
string literals? It would be safest to enforce the use of \u escapes,
no?

It is certainly true that the use of non-ascii identifiers will cause
the code to be unusable by _someone_ using _some_ set of tools. This
is something that the user will be aware of (as all new users of
non-ascii have been in the past). Tools won't change without bug
reports.

-Mike

From snaury at gmail.com Wed May 2 21:23:47 2007
From: snaury at gmail.com (Alexey Borzenkov)
Date: Wed, 2 May 2007 23:23:47 +0400
Subject: [Python-3000] [Python-Dev] PEP 30XZ: Simplified Parsing
In-Reply-To: 
References: 
Message-ID: 

On 4/30/07, Jim Jewett wrote:
> Python initially inherited its parsing from C. While this has
> been generally useful, there are some remnants which have been
> less useful for python, and should be eliminated.
>
> + Implicit String concatenation
>
> + Line continuation with "\"

I don't know if I can vote, but if I could I'd be -1 on this. Can't
say I'm using continuation often, but there's one case when I'm using
it and I'd like to continue using it:

#!/usr/bin/env python
"""\
Usage: some-tool.py [arguments...]

Does this and that based on its arguments"""

if condition:
    print __doc__
    sys.exit(1)

This way usage immediately stands out much better, without any
unnecessary new lines.

Best regards,
Alexey.
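The usage-docstring idiom above hinges on the backslash *inside* the string literal, which only suppresses the newline right after the opening quotes. A minimal sketch of what that leading backslash buys (illustrative only, not from the thread):

```python
# With the trailing backslash, the module docstring begins directly
# with the usage text -- no leading blank line when printed.
with_backslash = """\
Usage: some-tool.py [arguments...]

Does this and that based on its arguments"""

# Without the backslash, the string starts with a newline, so printing
# it shows a blank line before "Usage:".
without_backslash = """
Usage: some-tool.py [arguments...]

Does this and that based on its arguments"""

assert with_backslash.startswith("Usage:")
assert without_backslash == "\n" + with_backslash
```

This is why the two removals can be argued separately: killing "\" as a line-continuation token would not by itself affect backslash escapes inside string literals.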
From barry at python.org Wed May 2 21:40:33 2007
From: barry at python.org (Barry Warsaw)
Date: Wed, 2 May 2007 15:40:33 -0400
Subject: [Python-3000] [Python-Dev] PEP 30XZ: Simplified Parsing
In-Reply-To: <5.1.1.6.0.20070502144742.02bc1908@sparrow.telecommunity.com>
References: <4638B151.6020901@voidspace.org.uk>
	<5.1.1.6.0.20070502144742.02bc1908@sparrow.telecommunity.com>
Message-ID: <179D5383-88F0-4246-B355-5A817B9F7EBE@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On May 2, 2007, at 2:51 PM, Phillip J. Eby wrote:

> At 10:34 AM 5/2/2007 -0700, Trent Mick wrote:
>> But if you don't want the EOLs? Example from some code of mine:
>>
>>     raise MakeError("extracting '%s' in '%s' did not create the "
>>                     "directory that the Python build will expect: "
>>                     "'%s'" % (src_pkg, dst_dir, dst))
>>
>> I use this kind of thing frequently. Don't know if others consider it
>> bad style.
>
> Well, I do it a lot too; don't know if that makes it good or bad,
> though. :)

I just realized that changing these lexical rules might have an
adverse effect on internationalization. Or it might force more lines
to go over the 79 character limit. The problem is that

    _("some string"
      " and more of it")

is not the same as

    _("some string" +
      " and more of it")

because the latter won't be extracted by tools like pygettext (I'm not
sure about standard gettext). You would either have to teach pygettext
and maybe gettext about this construct, or you'd have to use something
different.

Triple quoted strings are probably not so good because you'd have to
still backslash the trailing newlines. You can't split the strings up
into sentence fragments because that makes some translations
impossible.

Someone ease my worries here.
- -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (Darwin) iQCVAwUBRjjpOHEjvBPtnXfVAQJ/xwP7BNMGvrmuxKmb7QiIawYjORKt9Pxmz7XJ kFVHl47UusOGzgmtwm6Qi2DeSDsG0JOu0XwlZbX3YPE8omTzTP8WLdavJ1e+i2nP V8GwXVyFgyFHx3V1jb0o9eiUGFEwkXInCGcOFqdWOEF49TtRNHGY6ne+eumwkqxK qOyTGkcreG4= =J6I/ -----END PGP SIGNATURE----- From barry at python.org Wed May 2 21:41:38 2007 From: barry at python.org (Barry Warsaw) Date: Wed, 2 May 2007 15:41:38 -0400 Subject: [Python-3000] [Python-Dev] PEP 30XZ: Simplified Parsing In-Reply-To: References: Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On May 2, 2007, at 3:23 PM, Alexey Borzenkov wrote: > On 4/30/07, Jim Jewett wrote: >> Python initially inherited its parsing from C. While this has >> been generally useful, there are some remnants which have been >> less useful for python, and should be eliminated. >> >> + Implicit String concatenation >> >> + Line continuation with "\" > > I don't know if I can vote, but if I could I'd be -1 on this. Can't > say I'm using continuation often, but there's one case when I'm using > it and I'd like to continue using it: > > #!/usr/bin/env python > """\ > Usage: some-tool.py [arguments...] > > Does this and that based on its arguments""" > > if condition: > print __doc__ > sys.exit(1) > > This way usage immediately stands out much better, without any > unnecessary new lines. Me too, all the time. 
- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (Darwin)

iQCVAwUBRjjpcnEjvBPtnXfVAQL0ngP9FwE7swQSdPiH4wAMQRe1CAzWXBLCXKok
d08GHhyp5GWHs1UzDZbnxnLRVZt+ra/3iSJT8g32X2gX9gWkFUJfqZFN9wLVjzDZ
qlX4m2cJs4nlskRDsycPMY9MLGUwQ8bt7mn92Oh3vXAvtXm42Dxu66NvTlyYdIFQ
9M2HrMbBn1M=
=3kNg
-----END PGP SIGNATURE-----

From tim.peters at gmail.com Wed May 2 21:46:05 2007
From: tim.peters at gmail.com (Tim Peters)
Date: Wed, 2 May 2007 15:46:05 -0400
Subject: [Python-3000] PEP-3125 -- remove backslash continuation
In-Reply-To: 
References: <003601c78cbf$0cacec90$2606c5b0$@org>
Message-ID: <1f7befae0705021246w3446183du10c178067b3d3d7d@mail.gmail.com>

...

[Guido]
> I am worried that (as no indent is required on the next line) it will
> accidentally introduce legal interpretations for certain common (?)
> typos, e.g.
>
>     x = y+     # Used to be y+1, the 1 got dropped
>     f(x)

The Icon language also uses this rule, and I never experienced problems
with it there.

OTOH, the "open bracket" rule is certainly sufficient by itself, and
is invaluable for writing "big" list, tuple, and dict literals (things
I doubt come up in Andrew's EFL inspiration).

From rasky at develer.com Wed May 2 21:50:52 2007
From: rasky at develer.com (Giovanni Bajo)
Date: Wed, 02 May 2007 21:50:52 +0200
Subject: [Python-3000] PEP-3125 -- remove backslash continuation
In-Reply-To: 
References: <003601c78cbf$0cacec90$2606c5b0$@org>
Message-ID: 

On 02/05/2007 20.15, Guido van Rossum wrote:
> I am worried that (as no indent is required on the next line) it will
> accidentally introduce legal interpretations for certain common (?)
> typos, e.g.
>
>     x = y+     # Used to be y+1, the 1 got dropped
>     f(x)

It would also change the meaning of existing valid programs such as:

    x = 1,
    y()

The additional indent would solve this of course, but as you already
said it's a bad idea from an implementation standpoint.
-- 
Giovanni Bajo

From ark-mlist at att.net Wed May 2 22:03:15 2007
From: ark-mlist at att.net (Andrew Koenig)
Date: Wed, 2 May 2007 16:03:15 -0400
Subject: [Python-3000] PEP-3125 -- remove backslash continuation
In-Reply-To: <1f7befae0705021246w3446183du10c178067b3d3d7d@mail.gmail.com>
References: <003601c78cbf$0cacec90$2606c5b0$@org>
	<1f7befae0705021246w3446183du10c178067b3d3d7d@mail.gmail.com>
Message-ID: <003a01c78cf4$ebe54ad0$c3afe070$@net>

> OTOH, the "open bracket" rule is certainly sufficient by itself, and
> is invaluable for writing "big" list, tuple, and dict literals (things
> I doubt come up in Andrew's EFL inspiration).

If comma is treated as an operator, the "open bracket" rule doesn't
seem all that invaluable to me. Can you give me an example?

From ark-mlist at att.net Wed May 2 22:04:46 2007
From: ark-mlist at att.net (Andrew Koenig)
Date: Wed, 2 May 2007 16:04:46 -0400
Subject: [Python-3000] PEP-3125 -- remove backslash continuation
In-Reply-To: 
References: <003601c78cbf$0cacec90$2606c5b0$@org>
Message-ID: <003b01c78cf5$2227ac50$667704f0$@net>

> It would also change the meaning of existing valid programs such as:
>
>     x = 1,
>     y()

This is the strongest argument against the idea that I've seen so far.
It could be solved by *not* treating , as an operator, and by keeping
the open bracket rule.

From guido at python.org Wed May 2 22:17:39 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 2 May 2007 13:17:39 -0700
Subject: [Python-3000] [Python-Dev] PEP 30XZ: Simplified Parsing
In-Reply-To: 
References: 
Message-ID: 

On 5/2/07, Alexey Borzenkov wrote:
> I don't know if I can vote, but if I could I'd be -1 on this. Can't
> say I'm using continuation often, but there's one case when I'm using
> it and I'd like to continue using it:
>
> #!/usr/bin/env python
> """\
> Usage: some-tool.py [arguments...]
>
> Does this and that based on its arguments"""

I've been trying to tease out of the PEP author whether \ inside
string literals would also be dropped. I'd be against that even if \
outside strings were to be killed. So a vote based solely on this
argument has little value.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From tim.peters at gmail.com Wed May 2 22:30:16 2007
From: tim.peters at gmail.com (Tim Peters)
Date: Wed, 2 May 2007 16:30:16 -0400
Subject: [Python-3000] PEP-3125 -- remove backslash continuation
In-Reply-To: <003a01c78cf4$ebe54ad0$c3afe070$@net>
References: <003601c78cbf$0cacec90$2606c5b0$@org>
	<1f7befae0705021246w3446183du10c178067b3d3d7d@mail.gmail.com>
	<003a01c78cf4$ebe54ad0$c3afe070$@net>
Message-ID: <1f7befae0705021330y755e20fcrfbaf8bfd94151ded@mail.gmail.com>

[Tim Peters]
>> ...
>> OTOH, the "open bracket" rule is certainly sufficient by itself, and
>> is invaluable for writing "big" list, tuple, and dict literals (things
>> I doubt come up in Andrew's EFL inspiration).

[Andrew Koenig]
> If comma is treated as an operator, the "open bracket" rule doesn't seem all
> that invaluable to me. Can you give me an example?

Treating comma as an infix operator would clash in weird ways with the
current "sometimes" treatment of comma as denoting a tuple literal ...
and I see that Giovanni Bajo already posted an example while I was
typing this :-)

Icon doesn't have this problem, and I'm guessing that EFL doesn't
either.

Incidentally, I know one Python programmer who writes list literals
like this:

    mylist = [
        1
      , 2
      , 3
      ]

In a fixed-width font, the commas and brackets are all in the same
column. While "bleech" is the proper reaction ;-), that does work
fine today.

Historical note: the open bracket rule was introduced in Python 0.9.9
(29 Jul 1993). Before that, backslash continuation was the only way to
split a statement across lines.
If the open bracket rule had been there from the start, I doubt
backslash continuation would have been there at all (except in string
literals).

From ark-mlist at att.net Wed May 2 23:09:18 2007
From: ark-mlist at att.net (Andrew Koenig)
Date: Wed, 2 May 2007 17:09:18 -0400
Subject: [Python-3000] PEP-3125 -- remove backslash continuation
In-Reply-To: <1f7befae0705021330y755e20fcrfbaf8bfd94151ded@mail.gmail.com>
References: <003601c78cbf$0cacec90$2606c5b0$@org>
	<1f7befae0705021246w3446183du10c178067b3d3d7d@mail.gmail.com>
	<003a01c78cf4$ebe54ad0$c3afe070$@net>
	<1f7befae0705021330y755e20fcrfbaf8bfd94151ded@mail.gmail.com>
Message-ID: <004501c78cfe$264c39a0$72e4ace0$@net>

> Incidentally, I know one Python programmer who writes list literals
> like this:
>
>     mylist = [
>         1
>       , 2
>       , 3
>       ]
>
> In a fixed-width font, the commas and brackets are all in the same
> column. While "bleech" is the proper reaction ;-), that does work
> fine today.

Sounds like another argument in favor of not treating comma as an
operator (because it *can* end a statement) but keeping the
open-bracket rule.

From g.brandl at gmx.net Wed May 2 23:31:40 2007
From: g.brandl at gmx.net (Georg Brandl)
Date: Wed, 02 May 2007 23:31:40 +0200
Subject: [Python-3000] PEP 3132: Extended Iterable Unpacking
In-Reply-To: 
References: 
Message-ID: 

Guido van Rossum schrieb:
>> > Also, what should this do? Perhaps the grammar could disallow it?
>> >
>> >     *a = range(5)
>>
>> I'm not so sure about the grammar, I'm currently catching it in the AST
>> generation stage.
>
> Hopefully it's possible to only allow this if there's at least one comma?

That's easy. But now that I have lightened the grammar changes a bit,
catching the no-comma case has gotten a bit hairy, as you'll see in the
patch.

> In any case the grammar will probably end up accepting *a in lots of
> places where it isn't really allowed and you'll have to fix all of
> those.

In fact it's not too hard: only store context is allowed.
> That sounds messy; only allowing *a at the end seems a bit more
> manageable. But I'll hold off until I can shoot holes in your
> implementation. ;-)

The patch is at http://python.org/sf/1711529. Have fun :)

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no
less. Four shall be the number of spaces thou shalt indent, and the
number of thy indenting shall be four. Eight shalt thou not indent, nor
either indent thou two, excepting that thou then proceed to four. Tabs
are right out.

From skip at pobox.com Wed May 2 23:23:00 2007
From: skip at pobox.com (skip at pobox.com)
Date: Wed, 2 May 2007 16:23:00 -0500
Subject: [Python-3000] [Python-Dev] PEP 30XZ: Simplified Parsing
In-Reply-To: <4638CB97.1040503@activestate.com>
References: <4638B151.6020901@voidspace.org.uk>
	<4638CB97.1040503@activestate.com>
Message-ID: <17977.308.192435.48545@montanaro.dyndns.org>

    Trent> But if you don't want the EOLs? Example from some code of mine:

    Trent> raise MakeError("extracting '%s' in '%s' did not create the "
    Trent>                 "directory that the Python build will expect: "
    Trent>                 "'%s'" % (src_pkg, dst_dir, dst))

    Trent> I use this kind of thing frequently. Don't know if others
    Trent> consider it bad style.

I use it all the time. For example, to build up (what I consider to be)
readable SQL queries:

    rows = self.executesql("select cities.city, state, country"
                           " from cities, venues, events, addresses"
                           " where cities.city like %s"
                           " and events.active = 1"
                           " and venues.address = addresses.id"
                           " and addresses.city = cities.id"
                           " and events.venue = venues.id",
                           (city,))

I would be disappointed if string literal concatenation went away.
Skip

From snaury at gmail.com Wed May 2 23:25:00 2007
From: snaury at gmail.com (Alexey Borzenkov)
Date: Thu, 3 May 2007 01:25:00 +0400
Subject: [Python-3000] [Python-Dev] PEP 30XZ: Simplified Parsing
In-Reply-To: 
References: 
Message-ID: 

On 5/3/07, Guido van Rossum wrote:
> On 5/2/07, Alexey Borzenkov wrote:
> > I don't know if I can vote, but if I could I'd be -1 on this. Can't
> > say I'm using continuation often, but there's one case when I'm using
> > it and I'd like to continue using it:
> >
> > #!/usr/bin/env python
> > """\
> > Usage: some-tool.py [arguments...]
> >
> > Does this and that based on its arguments
> I've been trying to tease out of the PEP author whether \ inside
> string literals would also be dropped. I'd be against that even if \
> outside strings were to be killed. So a vote based solely on this
> argument has little value.

Ouch, I didn't even think it could be dropped one way and not the
other. To be honest, I don't have an opinion on the usage of \ outside
of string literals, never needed to use it, and it seems there's always
a workaround with parentheses. So I'm sorry, that has been very
premature.

-- 
Alexey

From rhamph at gmail.com Thu May 3 00:48:30 2007
From: rhamph at gmail.com (Adam Olsen)
Date: Wed, 2 May 2007 16:48:30 -0600
Subject: [Python-3000] Some canonical use-cases for ABCs/Interfaces/Generics
In-Reply-To: <4638013D.8090902@acm.org>
References: <4638013D.8090902@acm.org>
Message-ID: 

On 5/1/07, Talin wrote:
> One of my concerns in the ABC/interface discussion so far is that a lot
> of the use cases presented are "toy" examples. This makes perfect sense
> considering that you don't want to have to spend several pages
> explaining the use case. But at the same time, it means that we might be
> solving problems that aren't real, while ignoring problems that are.
>
> What I'd like to do is collect a set of "real-world" use cases and
> document them.
The idea would be that we could refer to these use cases
> during the discussion, using a common terminology and shorthand examples.
>
> I'll present one very broad use case here, and I'd be interested if
> people have ideas for other use cases. The goal is to define a small
> number of broadly-defined cases that provide a broad coverage of the
> problem space.

The only use case I commonly experience is that of __init__. For
instance:

    class MyClass:
        def __init__(self, x):
            if isinstance(x, basestring):
                self.stream = open(x)
            else:
                self.stream = x

It can be called with either a path or a file-like object. It might be
possible to replace the LBYL with EAFP, passing x into open() and
catching errors, but it's not obvious what if any exception would be
raised; the try/except is too broad.

Many builtin types (int, float, etc) are conceptually similar, but
with an acceptable way to narrow down the try/except: check for an
x.__int__ method, without calling it.

It's not obvious to me how to best do this dispatching in the long
term. If you hardcode the references to basestring then, if open()
starts accepting path objects that don't derive from str, all the
existing code will break. Generic functions provide a cleaner way to
override them if you don't want to modify the original code, but
you'll still have to write *some* update for every piece of code that
does a check. The only way to avoid all the updates is to create some
sort of Pathish ABC to check for, but that assumes you know you'll
switch to non-str-derived path objects when you define Pathish.

For those keeping score, that means that duck typing will *always*
fail at these problems. You have to hardcode some way of
discriminating between types, and you'll have to rewrite all that
hardcoding if the assumptions you made no longer hold.
The goal then is to pick assumptions that will live for the longest period of time and require the least effort to change (and avoid making them unless you *need* them, i.e. stick to duck typing if it works). -- Adam Olsen, aka Rhamphoryncus From mhammond at skippinet.com.au Thu May 3 01:59:35 2007 From: mhammond at skippinet.com.au (Mark Hammond) Date: Thu, 3 May 2007 09:59:35 +1000 Subject: [Python-3000] [Python-Dev] PEP 30XZ: Simplified Parsing In-Reply-To: Message-ID: <02b401c78d15$f04b6110$090a0a0a@enfoldsystems.local> Please add my -1 to the chorus here, for the same reasons already expressed. Cheers, Mark > -----Original Message----- > From: python-dev-bounces+mhammond=keypoint.com.au at python.org > [mailto:python-dev-bounces+mhammond=keypoint.com.au at python.org > ]On Behalf > Of Jim Jewett > Sent: Monday, 30 April 2007 1:29 PM > To: Python 3000; Python Dev > Subject: [Python-Dev] PEP 30XZ: Simplified Parsing > > > PEP: 30xz > Title: Simplified Parsing > Version: $Revision$ > Last-Modified: $Date$ > Author: Jim J. Jewett > Status: Draft > Type: Standards Track > Content-Type: text/plain > Created: 29-Apr-2007 > Post-History: 29-Apr-2007 > > > Abstract > > Python initially inherited its parsing from C. While this has > been generally useful, there are some remnants which have been > less useful for Python, and should be eliminated. > > + Implicit String concatenation > > + Line continuation with "\" > > + 034 as an octal number (== decimal 28). Note that this is > listed only for completeness; the decision to raise an > Exception for leading zeros has already been made in the > context of PEP XXX, about adding a binary literal. > > > Rationale for Removing Implicit String Concatenation > > Implicit String concatenation can lead to confusing, or even > silent, errors. [1] > > def f(arg1, arg2=None): pass > > f("abc" "def") # forgot the comma, no warning ...
> # silently becomes f("abcdef", None) > > or, using the scons build framework, > > sourceFiles = [ > 'foo.c', > 'bar.c', > #...many lines omitted... > 'q1000x.c'] > > It's a common mistake to leave off a comma, and then > scons complains > that it can't find 'foo.cbar.c'. This is pretty > bewildering behavior > even if you *are* a Python programmer, and not everyone here is. > > Note that in C, the implicit concatenation is more > justified; there > is no other way to join strings without (at least) a > function call. > > In Python, strings are objects which support the __add__ operator; > it is possible to write: > > "abc" + "def" > > Because these are literals, this addition can still be optimized > away by the compiler. > > Guido indicated [2] that this change should be handled by > PEP, because > there were a few edge cases with other string operators, > such as the %. > The resolution is to treat them the same as today. > > ("abc %s def" + "ghi" % var) # fails like today. > # raises TypeError because of > # precedence. (% before +) > > ("abc" + "def %s ghi" % var) # works like today; > precedence makes > # the optimization more > difficult to > # recognize, but does > not change the > # semantics. > > ("abc %s def" + "ghi") % var # works like today, because of > # precedence: () before % > # CPython compiler can already > # add the literals at > compile-time. > > > Rationale for Removing Explicit Line Continuation > > A terminal "\" indicates that the logical line is continued on the > following physical line (after whitespace). > > Note that a non-terminal "\" does not have this meaning, > even if the > only additional characters are invisible whitespace. > (Python depends > heavily on *visible* whitespace at the beginning of a > line; it does > not otherwise depend on *invisible* terminal whitespace.) Adding > whitespace after a "\" will typically cause a syntax error rather > than a silent bug, but it still isn't desirable. 
> > The reason to keep "\" is that occasionally code looks better with > a "\" than with a () pair. > > assert True, ( > "This Paren is goofy") > > But realistically, that paren is no worse than a "\". The only > advantage of "\" is that it is slightly more familiar to users of > C-based languages. These same languages all also support line > continuation with (), so reading code will not be a problem, and > there will be one less rule to learn for people entirely new to > programming. > > > Rationale for Removing Implicit Octal Literals > > This decision should be covered by PEP ???, on numeric literals. > It is mentioned here only for completeness. > > C treats integers beginning with "0" as octal, rather > than decimal. > Historically, Python has inherited this usage. This has caused > quite a few annoying bugs for people who forgot the rule, and > tried to line up their constants. > > a = 123 > b = 024 # really only 20, because octal > c = 245 > > In Python 3.0, the second line will instead raise a SyntaxError, > because of the ambiguity. Instead, the line should be written > as in one of the following ways: > > b = 24 # PEP 8 > b = 24 # columns line up, for quick scanning > b = 0t24 # really did want an Octal! > > > References > > [1] Implicit String Concatenation, Jewett, Orendorff > http://mail.python.org/pipermail/python-ideas/2007-April/000397.html [2] PEP 12, Sample reStructuredText PEP Template, Goodger, Warsaw http://www.python.org/peps/pep-0012 [3] http://www.opencontent.org/openpub/ Copyright This document has been placed in the public domain. 
Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 coding: utf-8 End: _______________________________________________ Python-Dev mailing list Python-Dev at python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/mhammond%40keypoint.com.au From greg.ewing at canterbury.ac.nz Thu May 3 02:38:46 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 03 May 2007 12:38:46 +1200 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: <5.1.1.6.0.20070502113122.04e41350@sparrow.telecommunity.com> References: <5.1.1.6.0.20070501235655.02a57fd8@sparrow.telecommunity.com> <5.1.1.6.0.20070501123026.0524eac8@sparrow.telecommunity.com> <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <5.1.1.6.0.20070501123026.0524eac8@sparrow.telecommunity.com> <5.1.1.6.0.20070501235655.02a57fd8@sparrow.telecommunity.com> <5.1.1.6.0.20070502113122.04e41350@sparrow.telecommunity.com> Message-ID: <46392F16.7080707@canterbury.ac.nz> Phillip J. Eby wrote: > At 07:48 PM 5/2/2007 +1200, Greg Ewing wrote: > > > I'd work on that by finding ways to reduce the boilerplate. > > Um... I did. They're called @before and @after. :) I was talking about the need to put extra magic names in the parameter list just to be able to call the next method. > I notice that you didn't respond to my point that these also make it > easier for the reader to tell what the method is doing, No, it doesn't. It tells you a very small amount about *how* the method does whatever it does. To find out *what* the method does, you have to either read the comment/docstring, or if it doesn't have one, read the method body anyway. If you read the body, you'll notice whether and when it calls the next method. 
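For concreteness, the kind of before/after advice under discussion can be mimicked with a toy registry; the decorator names below are invented for illustration and are not PEP 3124's actual machinery:

```python
# Toy "advice" registry: callbacks registered via .before() run ahead
# of the primary function, callbacks registered via .after() run once
# it has returned.  Invented names, purely illustrative.
def advisable(func):
    befores, afters = [], []
    def wrapper(*args, **kwargs):
        for f in befores:
            f(*args, **kwargs)
        result = func(*args, **kwargs)
        for f in afters:
            f(*args, **kwargs)
        return result
    def before(f):
        befores.append(f)
        return f
    def after(f):
        afters.append(f)
        return f
    wrapper.before = before
    wrapper.after = after
    return wrapper

@advisable
def start_transaction(db):
    db.append("begin")

@start_transaction.after
def turn_up_logging(db):
    db.append("log=max")

db = []
start_transaction(db)
print(db)   # ['begin', 'log=max']
```

Note that the after-method neither calls the next method nor returns a value; that is the boilerplate reduction being argued over.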
In other words, I see the calling of the next method as an implementation detail that doesn't need to be announced prominently at the top of the method. > Meanwhile, it takes less than 40 lines of code to implement both @before > and @after; Size of implementation isn't the issue, it's the mental load on someone trying to learn all this stuff and keep it in their head. It's a lot easier to learn and retain knowledge about one general mechanism than five or more special-case variations of it. > """In short, you have to ask yourself: am I hooking something > (before/after), implementing it (when), or just generally looking for > trouble (around)?""" There are a lot of other things you have to ask yourself before writing your method, too. I don't see this particular question as fundamental enough to pick out for special treatment. > You seem to be confusing Common Lisp with CLOS. They are not the same thing. You're right, my comment was really about Common Lisp as a whole. But if they can't even keep the basic Lisp dialect clean and coherent, it doesn't give me confidence that they've made any attempt to do so with its object system. More generally, arguments of the form "Language X does it this way, so it must be good" don't impress me if I don't regard language X as being particularly well designed in the first place. > Meanwhile, AspectJ and Inform 7 also include before/after/around advice > for their generic functions, so it's hardly only CLOS as an example of > their usefulness. I'm very skeptical about the whole business of aspects, too, and I find Inform 7 to be massively confusing in many ways. So you're not going to impress me by appealing to those, either. 
:-) -- Greg From python-dev at zesty.ca Thu May 3 02:26:36 2007 From: python-dev at zesty.ca (Ka-Ping Yee) Date: Wed, 2 May 2007 19:26:36 -0500 (CDT) Subject: [Python-3000] [Python-Dev] PEP 30XZ: Simplified Parsing In-Reply-To: References: Message-ID: I fully support the removal of implicit string concatenation (explicit is better than implicit; there's only one way to do it). I also fully support the removal of backslashes for line continuation of statements (same reasons). (I mean this as distinct from line continuation within a string; that's a separate issue.) -- ?!ng From greg.ewing at canterbury.ac.nz Thu May 3 02:49:14 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 03 May 2007 12:49:14 +1200 Subject: [Python-3000] [Python-Dev] PEP 30XZ: Simplified Parsing In-Reply-To: <4638CB97.1040503@activestate.com> References: <4638B151.6020901@voidspace.org.uk> <4638CB97.1040503@activestate.com> Message-ID: <4639318A.8030206@canterbury.ac.nz> Trent Mick wrote: > But if you don't want the EOLs? Example from some code of mine: > > raise MakeError("extracting '%s' in '%s' did not create the " > "directory that the Python build will expect: " > "'%s'" % (src_pkg, dst_dir, dst)) > > I use this kind of thing frequently. Don't know if others consider it > bad style. I use it too, and would be disappointed if it were taken away. I find the usefulness considerably outweighs any occasional problems. -- Greg From greg.ewing at canterbury.ac.nz Thu May 3 02:52:21 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 03 May 2007 12:52:21 +1200 Subject: [Python-3000] PEP 3132: Extended Iterable Unpacking In-Reply-To: References: Message-ID: <46393245.5080203@canterbury.ac.nz> Guido van Rossum wrote: > In any case the grammar will probably end up accepting *a in lots of > places where it isn't really allowed and you'll have to fix all of > those. That sounds messy; only allowing *a at the end seems a bit more > manageable. 
I also would be quite happy if it were only allowed at the end, and not allowed on its own. I don't see any utility in being able to write *a = b instead of a = list(b) or some such. -- Greg From guido at python.org Thu May 3 02:55:49 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 2 May 2007 17:55:49 -0700 Subject: [Python-3000] [Python-Dev] PEP 30XZ: Simplified Parsing In-Reply-To: <4639318A.8030206@canterbury.ac.nz> References: <4638B151.6020901@voidspace.org.uk> <4638CB97.1040503@activestate.com> <4639318A.8030206@canterbury.ac.nz> Message-ID: I think it looks like not enough people are ready for both these changes (PEP 3125 and PEP 3126). Maybe we could start by discouraging these in the style guide (PEP 8) instead? --Guido On 5/2/07, Greg Ewing wrote: > Trent Mick wrote: > > > But if you don't want the EOLs? Example from some code of mine: > > > > raise MakeError("extracting '%s' in '%s' did not create the " > > "directory that the Python build will expect: " > > "'%s'" % (src_pkg, dst_dir, dst)) > > > > I use this kind of thing frequently. Don't know if others consider it > > bad style. > > I use it too, and would be disappointed if it were > taken away. I find the usefulness considerably > outweighs any occasional problems. > > -- > Greg > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From python at rcn.com Thu May 3 03:03:39 2007 From: python at rcn.com (Raymond Hettinger) Date: Wed, 2 May 2007 21:03:39 -0400 (EDT) Subject: [Python-3000] [Python-Dev] PEP 30XZ: Simplified Parsing Message-ID: <20070502210339.BHU28881@ms09.lnh.mail.rcn.net> [Skip] > I use it all the time. 
For example, to build up (what I consider to be) >readable SQL queries: > > rows = self.executesql("select cities.city, state, country" > " from cities, venues, events, addresses" > " where cities.city like %s" > " and events.active = 1" > " and venues.address = addresses.id" > " and addresses.city = cities.id" > " and events.venue = venues.id", > (city,)) I find that style hard to maintain. What is the advantage over multi-line strings? rows = self.executesql(''' select cities.city, state, country from cities, venues, events, addresses where cities.city like %s and events.active = 1 and venues.address = addresses.id and addresses.city = cities.id and events.venue = venues.id ''', (city,)) Raymond From guido at python.org Thu May 3 03:29:53 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 2 May 2007 18:29:53 -0700 Subject: [Python-3000] [Python-ideas] PEP 30xx: Access to Module/Class/Function Currently Being Defined (this) In-Reply-To: References: Message-ID: Summary for the impatient: -1; the PEP is insufficiently motivated and poorly specified. > PEP: 3130 > Title: Access to Current Module/Class/Function > Version: $Revision: 55056 $ > Last-Modified: $Date: 2007-05-01 12:35:45 -0700 (Tue, 01 May 2007) $ > Author: Jim J. Jewett > Status: Draft > Type: Standards Track > Content-Type: text/plain > Created: 22-Apr-2007 > Python-Version: 3.0 > Post-History: 22-Apr-2007 > > > Abstract > > It is common to need a reference to the current module, class, > or function, but there is currently no entirely correct way to > do this. This PEP proposes adding the keywords __module__, > __class__, and __function__. > > > Rationale for __module__ > > Many modules export various functions, classes, and other objects, > but will perform additional activities (such as running unit > tests) when run as a script. The current idiom is to test whether > the module's name has been set to magic value. > > if __name__ == "__main__": ... 
> > More complicated introspection requires a module to (attempt to) > import itself. If importing the expected name actually produces > a different module, there is no good workaround. > > # __import__ lets you use a variable, but... it gets more > # complicated if the module is in a package. > __import__(__name__) > > # So just go to sys modules... and hope that the module wasn't > # hidden/removed (perhaps for security), that __name__ wasn't > # changed, and definitely hope that no other module with the > # same name is now available. > class X(object): > pass > > import sys > mod = sys.modules[__name__] > mod = sys.modules[X.__class__.__module__] You're making this way too complicated. sys.modules[__name__] always works. > Proposal: Add a __module__ keyword which refers to the module > currently being defined (executed). (But see open issues.) > > # XXX sys.main is still changing as draft progresses. May > # really need sys.modules[sys.main] > if __module__ is sys.main: # assumes PEP (3122), Cannon > ... PEP 3122 is already rejected. > Rationale for __class__ > > Class methods are passed the current instance; from this they can "current instance" is confusing when talking about a class method. I'll assume you mean "class". > determine self.__class__ (or cls, for class methods). > Unfortunately, this reference is to the object's actual class, Why unfortunately? All the semantics around self.__class__ and the cls argument are geared towards the instance's class, not the lexically current class. > which may be a subclass of the defining class. The current > workaround is to repeat the name of the class, and assume that the > name will not be rebound. > > class C(B): > > def meth(self): > super(C, self).meth() # Hope C is never rebound. > > class D(C): > > def meth(self): > # ?!? issubclass(D,C), so it "works": > super(C, self).meth() > > Proposal: Add a __class__ keyword which refers to the class > currently being defined (executed). (But see open issues.)
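The instance's-class-versus-lexically-current-class distinction is easy to demonstrate; a minimal sketch (invented class names) shows the naive workaround recursing as soon as the method is inherited:

```python
class B:
    def meth(self):
        return "B"

class C(B):
    def meth(self):
        # Naive: uses the instance's class instead of the defining class.
        return "C" + super(self.__class__, self).meth()

class D(C):
    pass

print(C().meth())   # "CB" -- works, because self.__class__ is C here
try:
    # self.__class__ is D, so super(D, self) resolves to C.meth again,
    # which calls super(D, self) again, and so on forever.
    D().meth()
except RecursionError:
    print("infinite recursion")
```

Hence the workaround of hardcoding the class name, with the rebinding hazard the PEP describes.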
> > class C(B): > def meth(self): > super(__class__, self).meth() > > Note that super calls may be further simplified by the "New Super" > PEP (Spealman). The __class__ (or __this_class__) attribute came > up in attempts to simplify the explanation and/or implementation > of that PEP, but was separated out as an independent decision. > > Note that __class__ (or __this_class__) is not quite the same as > the __thisclass__ property on bound super objects. The existing > super.__thisclass__ property refers to the class from which the > Method Resolution Order search begins. In the above class D, it > would refer to (the current reference of name) C. Do you have any other use cases? Because Tim Delaney's 'super' implementation doesn't need this. I also note that the name __class__ is a bit confusing because it means "the object's class" in other contexts. > Rationale for __function__ > > Functions (including methods) often want access to themselves, > usually for a private storage location or true recursion. While > there are several workarounds, all have their drawbacks. Often? Private storage can just as well be placed in the class or module. The recursion use case just doesn't occur as a problem in reality (hasn't since we introduced properly nested namespaces in 2.1). > def counter(_total=[0]): > # _total shouldn't really appear in the > # signature at all; the list wrapping and > # [0] unwrapping obscure the code > _total[0] += 1 > return _total[0] > > @annotate(total=0) It makes no sense to put dangling references like this in motivating examples. Without the definition of @annotate the example is meaningless. > def counter(): > # Assume name counter is never rebound: Why do you care so much about this? It's a vanishingly rare situation in my experience. > counter.total += 1 > return counter.total You're abusing function attributes here IMO. Function attributes are *metadata* about the function; they should not be used as per-function global storage.
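A closure is the idiomatic way to get per-function state without either the mutable-default trick or function attributes; a small sketch of that alternative:

```python
import itertools

def make_counter():
    # Per-counter state lives in the enclosing scope, not in the
    # signature and not on the function object.
    count = itertools.count(1)
    def counter():
        return next(count)
    return counter

counter = make_counter()
print(counter(), counter(), counter())   # 1 2 3
```

Each call to make_counter() yields an independent counter, which the function-attribute version cannot do without copying the function.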
> # class exists only to provide storage: If you don't need a class, use a module global. That's what they're for. Name it with a leading underscore to flag the fact that it's an implementation detail. > class _wrap(object): > > __total = 0 > > def f(self): > self.__total += 1 > return self.__total > > # set module attribute to a bound method: > accum = _wrap().f > > # This function calls "factorial", which should be itself -- > # but the same programming styles that use heavy recursion > # often have a greater willingness to rebind function names. > def factorial(n): > return (n * factorial(n-1) if n else 1) > > Proposal: Add a __function__ keyword which refers to the function > (or method) currently being defined (executed). (But see open > issues.) > > @annotate(total=0) > def counter(): > # Always refers to this function obj: > __function__.total += 1 > return __function__.total > > def factorial(n): > return (n * __function__(n-1) if n else 1) > > > Backwards Compatibility > > While a user could be using these names already, double-underscore > names ( __anything__ ) are explicitly reserved to the interpreter. > It is therefore acceptable to introduce special meaning to these > names within a single feature release. > > > Implementation > > Ideally, these names would be keywords treated specially by the > bytecode compiler. That is a completely insufficient attempt at describing the semantics. > Guido has suggested [1] using a cell variable filled in by the > metaclass. > > Michele Simionato has provided a prototype using bytecode hacks > [2]. This does not require any new bytecode operators; it just > modifies which specific sequence of existing operators gets > run. Sorry, bytecode hacks don't count as a semantic specification. > Open Issues > > - Are __module__, __class__, and __function__ the right names?
In > particular, should the names include the word "this", either as > __this_module__, __this_class__, and __this_function__, (format > discussed on the python-3000 and python-ideas lists) or as > __thismodule__, __thisclass__, and __thisfunction__ (inspired > by, but conflicting with, current usage of super.__thisclass__). > > - Are all three keywords needed, or should this enhancement be > limited to a subset of the objects? Should methods be treated > separately from other functions? What do __class__ and __function__ refer to inside a nested class or function? > References > > [1] Fixing super anyone? Guido van Rossum > http://mail.python.org/pipermail/python-3000/2007-April/006671.html > > [2] Descriptor/Decorator challenge, Michele Simionato > http://groups.google.com/group/comp.lang.python/browse_frm/thread/a6010c7494871bb1/62a2da68961caeb6?lnk=gst&q=simionato+challenge&rnum=1&hl=en#62a2da68961caeb6 > > > Copyright > > This document has been placed in the public domain. > > > > Local Variables: > mode: indented-text > indent-tabs-mode: nil > sentence-end-double-space: t > fill-column: 70 > coding: utf-8 > End: --- --Guido van Rossum (home page: http://www.python.org/~guido/) From pje at telecommunity.com Thu May 3 03:32:23 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Wed, 02 May 2007 21:32:23 -0400 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. 
In-Reply-To: <46392F16.7080707@canterbury.ac.nz> References: <5.1.1.6.0.20070502113122.04e41350@sparrow.telecommunity.com> <5.1.1.6.0.20070501235655.02a57fd8@sparrow.telecommunity.com> <5.1.1.6.0.20070501123026.0524eac8@sparrow.telecommunity.com> <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <5.1.1.6.0.20070501123026.0524eac8@sparrow.telecommunity.com> <5.1.1.6.0.20070501235655.02a57fd8@sparrow.telecommunity.com> <5.1.1.6.0.20070502113122.04e41350@sparrow.telecommunity.com> Message-ID: <5.1.1.6.0.20070502211440.02a386d8@sparrow.telecommunity.com> At 12:38 PM 5/3/2007 +1200, Greg Ewing wrote: >In other words, I see the calling of the next method >as an implementation detail that doesn't need to be >announced prominently at the top of the method. It's not an implementation detail - it's an expression of *intent*. E.g., in English, "After you start a transaction on a database, make sure you turn its logging up all the way." Please explain how you would improve the clarity of that sentence *without* using the word "after" or any synonyms thereof. ISTM that your argument is like saying there's no need for C with all its fancy function parameter names; after all, if you read the assembly code you can see right away which registers are being used for what. That may be true, but I'd rather not have to. Meanwhile, in the case of before/after methods, not having to call the next method or return its return value means there's less code to possibly get wrong in the process. >So you're not going to impress me by appealing >to those, either. :-) I wasn't under the illusion that impressing you was possible, actually.
:) From tjreedy at udel.edu Thu May 3 03:35:45 2007 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 2 May 2007 21:35:45 -0400 Subject: [Python-3000] PEP3099 += 'Assignment will not become an operation' Message-ID: and hence '=' will not become an operator and hence '=' will not become overloadable. (unless, of course, Guido has revised previous rejections). Came up again today on c.l.p. Surprised it's not already in the PEP. tjr From rrr at ronadam.com Thu May 3 08:05:38 2007 From: rrr at ronadam.com (Ron Adam) Date: Thu, 03 May 2007 01:05:38 -0500 Subject: [Python-3000] [Python-Dev] PEP 30XZ: Simplified Parsing In-Reply-To: References: <02b401c78d15$f04b6110$090a0a0a@enfoldsystems.local> Message-ID: <46397BB2.4060404@ronadam.com> Georg Brandl wrote: > FWIW, I'm -1 on both proposals too. I like implicit string literal concatenation > and I really can't see what we gain from backslash continuation removal. > > Georg -1 on removing them also. I find they are helpful. It could be made optional in block headers that end with a ':'. It's optional, (just more white space), in parenthesized expressions, tuples, lists, and dictionary literals already. >>> [1,\ ... 2,\ ... 3] [1, 2, 3] >>> (1,\ ... 2,\ ... 3) (1, 2, 3) >>> {1:'a',\ ... 2:'b',\ ... 3:'c'} {1: 'a', 2: 'b', 3: 'c'} The rule would be any keyword that starts a block, (class, def, if, elif, with, ... etc.), until an unused (for anything else) colon, would always evaluate to be a single line whether or not it has parentheses or line continuations in it. These can never be multi-line statements as far as I know. The back slash would still be needed in console input. The following inconsistency still bothers me, but I suppose it's an edge case that doesn't cause problems. >>> print r"hello world\" File "", line 1 print r"hello world\" ^ SyntaxError: EOL while scanning single-quoted string >>> print r"hello\ ...
world" hello\ world In the first case, it's treated as a continuation character even though it's not at the end of a physical line. So it gives an error. In the second case, it's accepted as a continuation character, *and* a '\' character at the same time. (?) Cheers, Ron From python at rcn.com Thu May 3 07:23:39 2007 From: python at rcn.com (Raymond Hettinger) Date: Wed, 2 May 2007 22:23:39 -0700 Subject: [Python-3000] [Python-Dev] Implicit String Concatenation and Octal Literals Was: PEP 30XZ: Simplified Parsing References: <20070502210339.BHU28881@ms09.lnh.mail.rcn.net> <17977.16058.847429.905398@montanaro.dyndns.org> Message-ID: <000401c78d4c$796bfe60$f301a8c0@RaymondLaptop1> > Raymond> I find that style hard to maintain. What is the advantage over > Raymond> multi-line strings? > > Raymond> rows = self.executesql(''' > Raymond> select cities.city, state, country > Raymond> from cities, venues, events, addresses > Raymond> where cities.city like %s > Raymond> and events.active = 1 > Raymond> and venues.address = addresses.id > Raymond> and addresses.city = cities.id > Raymond> and events.venue = venues.id > Raymond> ''', > Raymond> (city,)) [Skip] > Maybe it's just a quirk of how python-mode in Emacs treats multiline strings > that caused me to start doing things this way (I've been doing my embedded > SQL statements this way for several years now), but when I hit LF in an open > multiline string a newline is inserted and the cursor is lined up under the > "r" of "rows", not under the opening quote of the multiline string, and not > where you chose to indent your example. When I use individual strings the > parameters line up where I want them to (the way I lined things up in my > example). At any rate, it's what I'm used to now. I completely understand. Almost any simplification or feature elimination proposal is going to bump up against "what we're used to now". Py3k may be our last chance to simplify the language.
We have so many special little rules that even advanced users can't keep them all in their head. Certainly, every feature has someone who uses it. But, there is some value to reducing the number of rules, especially if those rules are non-essential (i.e. implicit string concatenation has simple, clear alternatives with multi-line strings or with the plus-operator). Another way to look at it is to ask whether we would consider adding implicit string concatenation if we didn't already have it. I think there would be a chorus of emails against it -- arguing against language bloat and noting that we already have triple-quoted strings, raw-strings, a verbose flag for regexes, backslashes inside multiline strings, the explicit plus-operator, and multi-line expressions delimited by parentheses or brackets. Collectively, that is A LOT of ways to do it. I'm asking this group to give up a minor habit so that we can achieve at least a few simplifications on the way to Py3.0 -- basically, our last chance. Similar thoughts apply to the octal literal PEP. I'm -1 on introducing yet another way to write the literal (and a non-standard one at that). My proposal was simply to eliminate it. The use cases are few and far between (translating C headers and setting unix file permissions). In either case, writing int('0777', 8) suffices. In the latter case, we've already provided clear symbolic alternatives. This simplification of the language would be a freebie (impacting very little code, simplifying the lexer, eliminating a special rule, and eliminating a source of confusion for the young among us who do not know about such things). Raymond From greg.ewing at canterbury.ac.nz Thu May 3 08:27:55 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 03 May 2007 18:27:55 +1200 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc.
In-Reply-To: <5.1.1.6.0.20070502211440.02a386d8@sparrow.telecommunity.com> References: <5.1.1.6.0.20070502113122.04e41350@sparrow.telecommunity.com> <5.1.1.6.0.20070501235655.02a57fd8@sparrow.telecommunity.com> <5.1.1.6.0.20070501123026.0524eac8@sparrow.telecommunity.com> <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <5.1.1.6.0.20070501123026.0524eac8@sparrow.telecommunity.com> <5.1.1.6.0.20070501235655.02a57fd8@sparrow.telecommunity.com> <5.1.1.6.0.20070502113122.04e41350@sparrow.telecommunity.com> <5.1.1.6.0.20070502211440.02a386d8@sparrow.telecommunity.com> Message-ID: <463980EB.1070102@canterbury.ac.nz> Phillip J. Eby wrote: > "After you start a transaction on a database, make > sure you turn its logging up all the way." > > Please explain how you would improve the clarity of that sentence > *without* using the word "after" or any synonyms thereof. I don't object to using the word "after" in the docstring if it helps. Although in this case the intent could be described as "Ensure that all transactions are performed with logging turned up all the way." Whether this is done before or after starting the transaction doesn't seem particularly important. If it *is* important for some reason, that fact should be mentioned in the docstring. The mere presence of an @after decorator doesn't indicate whether it's important. And if it's mentioned in the docstring, there's no need to announce it again in the decorator. > ISTM that your argument is like saying there's no need for C > with all its fancy function parameter names Parameter names help to document the interface of a function, which is something you need to know when you're calling it. You don't need to know the position of a next-method call to use a generic function. I don't doubt that things like @before and @after are handy. But being handy isn't enough for something to get into the Python core.
Python has come a long way by providing a few very general mechanisms that can be used in flexible ways. It tends not to go in for gimmicks whose only benefit is to save a line or two of code here and there. I think the same philosophy should be applied to generic functions if we are to get them. -- Greg From greg.ewing at canterbury.ac.nz Thu May 3 08:31:14 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 03 May 2007 18:31:14 +1200 Subject: [Python-3000] PEP3099 += 'Assignment will not become an operation' In-Reply-To: References: Message-ID: <463981B2.6090704@canterbury.ac.nz> Terry Reedy wrote: > and hence '=' will not become an operator and hence '=' will not become > overloadable. Actually, '=' *is* overloadable in most cases, if you can arrange for a suitably customised object to be used as the namespace being assigned into. About the only case you can't hook is assignment to a local name in a function. -- Greg From greg.ewing at canterbury.ac.nz Thu May 3 08:36:15 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 03 May 2007 18:36:15 +1200 Subject: [Python-3000] [Python-Dev] PEP 30XZ: Simplified Parsing In-Reply-To: <17977.16058.847429.905398@montanaro.dyndns.org> References: <20070502210339.BHU28881@ms09.lnh.mail.rcn.net> <17977.16058.847429.905398@montanaro.dyndns.org> Message-ID: <463982DF.6000700@canterbury.ac.nz> skip at pobox.com wrote: > when I hit LF in an open > multiline string a newline is inserted and the cursor is lined up under the > "r" of "rows", not under the opening quote of the multiline string, and not > where you chose to indent your example. Seems to me that Python actually benefits from an editor which doesn't try to be too clever about auto-formatting. I'm doing most of my Python editing at the moment using BBEdit Lite, which knows nothing at all about Python code -- but it works very well. 
-- Greg From martin at v.loewis.de Thu May 3 09:03:16 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 03 May 2007 09:03:16 +0200 Subject: [Python-3000] PEP Parade In-Reply-To: References: Message-ID: <46398934.2010700@v.loewis.de> > S 3121 Module Initialization and finalization von Löwis > > I like it. I wish the title were changed to "Extension Module ..." though. Done! Martin From skip at pobox.com Thu May 3 03:45:30 2007 From: skip at pobox.com (skip at pobox.com) Date: Wed, 2 May 2007 20:45:30 -0500 Subject: [Python-3000] [Python-Dev] PEP 30XZ: Simplified Parsing In-Reply-To: <20070502210339.BHU28881@ms09.lnh.mail.rcn.net> References: <20070502210339.BHU28881@ms09.lnh.mail.rcn.net> Message-ID: <17977.16058.847429.905398@montanaro.dyndns.org> Raymond> [Skip] >> I use it all the time. For example, to build up (what I consider to be) >> readable SQL queries: >> >> rows = self.executesql("select cities.city, state, country" >> " from cities, venues, events, addresses" >> " where cities.city like %s" >> " and events.active = 1" >> " and venues.address = addresses.id" >> " and addresses.city = cities.id" >> " and events.venue = venues.id", >> (city,)) Raymond> I find that style hard to maintain. What is the advantage over Raymond> multi-line strings? 
Raymond> rows = self.executesql(''' Raymond> select cities.city, state, country Raymond> from cities, venues, events, addresses Raymond> where cities.city like %s Raymond> and events.active = 1 Raymond> and venues.address = addresses.id Raymond> and addresses.city = cities.id Raymond> and events.venue = venues.id Raymond> ''', Raymond> (city,)) Maybe it's just a quirk of how python-mode in Emacs treats multiline strings that caused me to start doing things this way (I've been doing my embedded SQL statements this way for several years now), but when I hit LF in an open multiline string a newline is inserted and the cursor is lined up under the "r" of "rows", not under the opening quote of the multiline string, and not where you chose to indent your example. When I use individual strings the parameters line up where I want them to (the way I lined things up in my example). At any rate, it's what I'm used to now. Skip From martin at v.loewis.de Thu May 3 09:19:04 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 03 May 2007 09:19:04 +0200 Subject: [Python-3000] PEP 3120 (Was: PEP Parade) In-Reply-To: References: Message-ID: <46398CE8.2060206@v.loewis.de> > S 3120 Using UTF-8 as the default source encoding von Löwis > > The basic idea seems very reasonable. I expect that the changes to the > parser may be quite significant though. Also, the parser ought to be > weaned off C stdio in favor of Python's own I/O library. I wonder if > it's really possible to let the parser read the raw bytes though -- > this would seem to rule out supporting encodings like UTF-16. Somehow > I wonder if it wouldn't be easier if the parser operated on Unicode > input? That way parsing unicode strings (which we must support as all > strings will become unicode) will be simpler. Actually, changes should be fairly minimal. 
The parser already transforms all input (no matter what source encoding) to UTF-8 before doing the parsing; this has worked well (as all keywords continue to be one-byte characters). The parser also already special-cases UTF-8 as the input encoding, by not putting it through a codec. That can also stay, except that it should now check that any non-ASCII bytes are well-formed UTF-8. Untangling the parser from stdio - sure. I also think it would be desirable to read the whole source into a buffer, rather than applying a line-by-line input. That might be a bigger change, making the tokenizer a multi-stage algorithm: 1. read input into a buffer 2. determine source encoding (looking at a BOM, else a declaration within the first two lines, else default to UTF-8) 3. if the source encoding is not UTF-8, pass it through a codec (decode to string, encode to UTF-8). Otherwise, check that all bytes are really well-formed UTF-8. 4. start parsing As for UTF-16: the lexer currently does not support UTF-16 as a source encoding, as we require an ASCII superset. I'm not sure whether UTF-16 needs to be supported as a source encoding, but with above changes, it would be fairly easy to support, assuming we detect UTF-16 from the BOM (can't use the encoding declaration, because that works only for ASCII supersets). Regards, Martin From talin at acm.org Thu May 3 09:24:30 2007 From: talin at acm.org (Talin) Date: Thu, 03 May 2007 00:24:30 -0700 Subject: [Python-3000] [Python-Dev] Implicit String Concatenation and Octal Literals Was: PEP 30XZ: Simplified Parsing In-Reply-To: <000401c78d4c$796bfe60$f301a8c0@RaymondLaptop1> References: <20070502210339.BHU28881@ms09.lnh.mail.rcn.net> <17977.16058.847429.905398@montanaro.dyndns.org> <000401c78d4c$796bfe60$f301a8c0@RaymondLaptop1> Message-ID: <46398E2E.1010604@acm.org> Raymond Hettinger wrote: >> Raymond> I find that style hard to maintain. What is the advantage over >> Raymond> multi-line strings? 
>> >> Raymond> rows = self.executesql(''' >> Raymond> select cities.city, state, country >> Raymond> from cities, venues, events, addresses >> Raymond> where cities.city like %s >> Raymond> and events.active = 1 >> Raymond> and venues.address = addresses.id >> Raymond> and addresses.city = cities.id >> Raymond> and events.venue = venues.id >> Raymond> ''', >> Raymond> (city,)) > > [Skip] >> Maybe it's just a quirk of how python-mode in Emacs treats multiline strings >> that caused me to start doing things this way (I've been doing my embedded >> SQL statements this way for several years now), but when I hit LF in an open >> multiline string a newline is inserted and the cursor is lined up under the >> "r" of "rows", not under the opening quote of the multiline string, and not >> where you chose to indent your example. When I use individual strings the >> parameters line up where I want them to (the way I lined things up in my >> example). At any rate, it's what I'm used to now. > > > I completely understand. Almost any simplification or feature elimination > proposal is going to bump-up against, "what we're used to now". > Py3k may be our last chance to simplify the language. We have so many > special little rules that even advanced users can't keep them > all in their head. Certainly, every feature has someone who uses it. > But, there is some value to reducing the number of rules, especially > if those rules are non-essential (i.e. implicit string concatenation has > simple, clear alternatives with multi-line strings or with the plus-operator). > > Another way to look at it is to ask whether we would consider > adding implicit string concatenation if we didn't already have it. 
> I think there would be a chorus of emails against it -- arguing > against language bloat and noting that we already have triple-quoted > strings, raw-strings, a verbose flag for regexes, backslashes inside multiline > strings, the explicit plus-operator, and multi-line expressions delimited > by parentheses or brackets. Collectively, that is A LOT of ways to do it. > > I'm asking this group to give up a minor habit so that we can achieve > at least a few simplifications on the way to Py3.0 -- basically, our last chance. > > Similar thoughts apply to the octal literal PEP. I'm -1 on introducing > yet another way to write the literal (and a non-standard one at that). > My proposal was simply to eliminate it. The use cases are few and > far between (translating C headers and setting unix file permissions). > In either case, writing int('0777', 8) suffices. In the latter case, we've > already provided clear symbolic alternatives. This simplification of the > language would be a freebie (impacting very little code, simplifying the > lexer, eliminating a special rule, and eliminating a source of confusion > for the young among us who do not know about such things). My counter-argument is that these simplifications aren't simplifying much - that is, the removals don't cascade and cause other simplifications. The grammar file, for example, won't look dramatically different if these changes are made. The simplification argument seems weak to me because the change in overall language complexity is very small, whereas the inconvenience caused, while not huge, is at least significant. That being said, line continuation is the only one I really care about. And I would happily give up backslashes in exchange for a more sane method of continuing lines. Either way avoids "spurious" grouping operators which IMHO don't make for easier-to-read code. 
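Put side by side, the alternatives under discussion look like this (a small illustrative sketch; the query text is invented):

```python
# Implicit string concatenation -- the feature proposed for removal:
q1 = ("select city, state"
      " from cities")

# The explicit plus-operator spelling, one of the listed alternatives:
q2 = ("select city, state" +
      " from cities")

# A triple-quoted string is the other alternative; note that it keeps
# the embedded newline, so it is not byte-for-byte identical:
q3 = """select city, state
 from cities"""

# Octal literals: instead of 0777, spell out the radix explicitly:
mode = int('0777', 8)   # 511 decimal, i.e. rwxrwxrwx

assert q1 == q2
assert mode == 511
```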
-- Talin From rasky at develer.com Thu May 3 09:25:44 2007 From: rasky at develer.com (Giovanni Bajo) Date: Thu, 03 May 2007 09:25:44 +0200 Subject: [Python-3000] PEP: Eliminate __del__ In-Reply-To: <5.1.1.6.0.20070501120218.02d31250@sparrow.telecommunity.com> References: <001d01c78bc2$b9367a60$f001a8c0@RaymondLaptop1> <5.1.1.6.0.20070501120218.02d31250@sparrow.telecommunity.com> Message-ID: On 01/05/2007 18.09, Phillip J. Eby wrote: >> The alternative is to code the automatic finalization steps using >> weakref callbacks. For those used to using __del__, it takes a little >> while to learn the idiom but essentially the technique is to hold a proxy >> or ref with a callback to a boundmethod for finalization: >> self.resource = resource = CreateResource() >> self.callbacks.append(proxy(resource, resource.closedown)) >> In this manner, all of the object's resources can be freed automatically >> when the object is collected. Note, that the callbacks only bind >> the resource object and not client object, so the client object >> can already have been collected and the teardown code can be run >> without risk of resurrecting the client (with a possibly invalid state). > > I'm a bit confused about the above. My understanding is that in order for > a weakref's callback to be invoked, the weakref itself *must still be > live*. That means that if 'self' in your example above is collected, then > the weakref no longer exists, so the closedown won't be called. Yes, but as far as I understand it, the GC takes special care to ensure that the callback of a weakref that is *not* part of the cyclic trash being collected is always called. See this comment in gcmodule.c: * OTOH, if wr isn't part of CT, we should invoke the callback: the * weakref outlived the trash. Note that since wr isn't CT in this * case, its callback can't be CT either -- wr acted as an external * root to this generation, and therefore its callback did too. 
So * nothing in CT is reachable from the callback either, so it's hard * to imagine how calling it later could create a problem for us. wr * is moved to wrcb_to_call in this case. I might be wrong about the internals of the GC, but I have used the weakref idiom many times and it always appeared to be working. > In principle I'm in favor of ditching __del__, as long as there's actually > a viable technique for doing so. My own experience has been that setting > up a simple mechanism to replace it (and that actually works) is really > difficult, because you have to find some place for the weakref itself to > live, which usually means a global dictionary or something of that > sort. Others suggested that such a framework could be prepared, but I have not seen one yet. > It would be nice if the gc or weakref modules grew a facility to > make it easier to register finalization callbacks, and could optionally > check whether you were registering a callback that referenced the thing you > were tying the callback's life to. That'd be absolutely great! OTOH, the GC could possibly re-verify such an assertion every time it kicks in (when a special debug flag is activated). -- Giovanni Bajo Develer S.r.l. http://www.develer.com From tjreedy at udel.edu Thu May 3 10:17:15 2007 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 3 May 2007 04:17:15 -0400 Subject: [Python-3000] PEP3099 += 'Assignment will not become an operation' References: <463981B2.6090704@canterbury.ac.nz> Message-ID: "Greg Ewing" wrote in message news:463981B2.6090704 at canterbury.ac.nz... | Terry Reedy wrote: | > and hence '=' will not become an operator and hence '=' will not become | > overloadable. | | Actually, '=' *is* overloadable in most cases, It is not overloadable in the sense I meant, and in the sense people occasionally request, which is to have '=' be an *operation* that invokes a special method such as __assign__, just as the '+' operation invokes '__add__'. 
| you can arrange for a suitably customised object | to be used as the namespace being assigned into. | About the only case you can't hook is assignment | to a local name in a function. I mentioned purse classes in the appropriate place -- c.l.p. I cannot think of any way to make plain assignment statements ('a = object') at module scope do anything other than bind an object to a name in the global namespace. Back to my original point: people occasionally ask that assignment statements become assignment expressions, as in C, by making '=' an operation with an overloadable special method. Guido has consistently said no. This came up again today. Since this is a much more frequent request than some of the items already in 3099, I think it should be added there. Terry Jan Reedy From walter at livinglogic.de Thu May 3 12:01:48 2007 From: walter at livinglogic.de (=?UTF-8?B?V2FsdGVyIETDtnJ3YWxk?=) Date: Thu, 03 May 2007 12:01:48 +0200 Subject: [Python-3000] [Python-checkins] r55079 - in python/branches/py3k-struni/Lib: [many files] In-Reply-To: <20070502191059.A48491E4010@bag.python.org> References: <20070502191059.A48491E4010@bag.python.org> Message-ID: <4639B30C.10307@livinglogic.de> guido.van.rossum wrote: > Author: guido.van.rossum > Date: Wed May 2 21:09:54 2007 > New Revision: 55079 > > Modified: > Log: > [...] > Rip out all the u"..." literals and calls to unicode(). That might be one of the largest diffs in Python's history. ;) Some of the changes lead to strange code like isinstance(foo, (str, str)) Below are the strange spots I noticed at first glance. I'm sure I missed a few. Servus, Walter > [...] 
> Modified: python/branches/py3k-struni/Lib/copy.py > ============================================================================== > --- python/branches/py3k-struni/Lib/copy.py (original) > +++ python/branches/py3k-struni/Lib/copy.py Wed May 2 21:09:54 2007 > @@ -186,7 +186,7 @@ > pass > d[str] = _deepcopy_atomic > try: > - d[unicode] = _deepcopy_atomic > + d[str] = _deepcopy_atomic > except NameError: > pass The try:except: is unnecessary now. > try: > > Modified: python/branches/py3k-struni/Lib/ctypes/__init__.py > ============================================================================== > --- python/branches/py3k-struni/Lib/ctypes/__init__.py (original) > +++ python/branches/py3k-struni/Lib/ctypes/__init__.py Wed May 2 21:09:54 2007 > @@ -59,7 +59,7 @@ > create_string_buffer(anInteger) -> character array > create_string_buffer(aString, anInteger) -> character array > """ > - if isinstance(init, (str, unicode)): > + if isinstance(init, (str, str)): > if size is None: > size = len(init)+1 > buftype = c_char * size > @@ -281,7 +281,7 @@ > create_unicode_buffer(anInteger) -> character array > create_unicode_buffer(aString, anInteger) -> character array > """ > - if isinstance(init, (str, unicode)): > + if isinstance(init, (str, str)): > if size is None: > size = len(init)+1 > buftype = c_wchar * size This could be simplified to: if isinstance(init, str): > Modified: python/branches/py3k-struni/Lib/distutils/command/bdist_wininst.py > ============================================================================== > --- python/branches/py3k-struni/Lib/distutils/command/bdist_wininst.py (original) > +++ python/branches/py3k-struni/Lib/distutils/command/bdist_wininst.py Wed May 2 21:09:54 2007 > @@ -247,11 +247,11 @@ > > # Convert cfgdata from unicode to ascii, mbcs encoded > try: > - unicode > + str > except NameError: > pass > else: > - if isinstance(cfgdata, unicode): > + if isinstance(cfgdata, str): > cfgdata = cfgdata.encode("mbcs") The try:except: is again 
unnecessary. > Modified: python/branches/py3k-struni/Lib/doctest.py > ============================================================================== > --- python/branches/py3k-struni/Lib/doctest.py (original) > +++ python/branches/py3k-struni/Lib/doctest.py Wed May 2 21:09:54 2007 > @@ -196,7 +196,7 @@ > """ > if inspect.ismodule(module): > return module > - elif isinstance(module, (str, unicode)): > + elif isinstance(module, (str, str)): -> elif isinstance(module, str): > Modified: python/branches/py3k-struni/Lib/encodings/idna.py > ============================================================================== > --- python/branches/py3k-struni/Lib/encodings/idna.py (original) > +++ python/branches/py3k-struni/Lib/encodings/idna.py Wed May 2 21:09:54 2007 > @@ -4,11 +4,11 @@ > from unicodedata import ucd_3_2_0 as unicodedata > > # IDNA section 3.1 > -dots = re.compile(u"[\u002E\u3002\uFF0E\uFF61]") > +dots = re.compile("[\u002E\u3002\uFF0E\uFF61]") > > # IDNA section 5 > ace_prefix = "xn--" > -uace_prefix = unicode(ace_prefix, "ascii") > +uace_prefix = str(ace_prefix, "ascii") This looks unnecessary to me. 
> Modified: python/branches/py3k-struni/Lib/idlelib/PyParse.py > ============================================================================== > --- python/branches/py3k-struni/Lib/idlelib/PyParse.py (original) > +++ python/branches/py3k-struni/Lib/idlelib/PyParse.py Wed May 2 21:09:54 2007 > @@ -105,7 +105,7 @@ > del ch > > try: > - UnicodeType = type(unicode("")) > + UnicodeType = type(str("")) > except NameError: > UnicodeType = None This should probably be: UnicodeType = str (or the code could directly use str) > Modified: python/branches/py3k-struni/Lib/lib-tk/Tkinter.py > ============================================================================== > --- python/branches/py3k-struni/Lib/lib-tk/Tkinter.py (original) > +++ python/branches/py3k-struni/Lib/lib-tk/Tkinter.py Wed May 2 21:09:54 2007 > @@ -3736,7 +3736,7 @@ > text = "This is Tcl/Tk version %s" % TclVersion > if TclVersion >= 8.1: > try: > - text = text + unicode("\nThis should be a cedilla: \347", > + text = text + str("\nThis should be a cedilla: \347", > "iso-8859-1") Better: text = text + "\nThis should be a cedilla: \xe7" > Modified: python/branches/py3k-struni/Lib/pickle.py > ============================================================================== > --- python/branches/py3k-struni/Lib/pickle.py (original) > +++ python/branches/py3k-struni/Lib/pickle.py Wed May 2 21:09:54 2007 > @@ -523,22 +523,22 @@ > if StringType == UnicodeType: > # This is true for Jython What's happening here? > [...] 
> Modified: python/branches/py3k-struni/Lib/plat-mac/EasyDialogs.py > ============================================================================== > --- python/branches/py3k-struni/Lib/plat-mac/EasyDialogs.py (original) > +++ python/branches/py3k-struni/Lib/plat-mac/EasyDialogs.py Wed May 2 21:09:54 2007 > @@ -662,7 +662,7 @@ > return tpwanted(rr.selection[0]) > if issubclass(tpwanted, str): > return tpwanted(rr.selection_fsr[0].as_pathname()) > - if issubclass(tpwanted, unicode): > + if issubclass(tpwanted, str): > return tpwanted(rr.selection_fsr[0].as_pathname(), 'utf8') > raise TypeError, "Unknown value for argument 'wanted': %s" % repr(tpwanted) > > @@ -713,7 +713,7 @@ > raise TypeError, "Cannot pass wanted=FSRef to AskFileForSave" > if issubclass(tpwanted, Carbon.File.FSSpec): > return tpwanted(rr.selection[0]) > - if issubclass(tpwanted, (str, unicode)): > + if issubclass(tpwanted, (str, str)): -> if issubclass(tpwanted, str): > if sys.platform == 'mac': > fullpath = rr.selection[0].as_pathname() > else: > @@ -722,10 +722,10 @@ > pardir_fss = Carbon.File.FSSpec((vrefnum, dirid, '')) > pardir_fsr = Carbon.File.FSRef(pardir_fss) > pardir_path = pardir_fsr.FSRefMakePath() # This is utf-8 > - name_utf8 = unicode(name, 'macroman').encode('utf8') > + name_utf8 = str(name, 'macroman').encode('utf8') > fullpath = os.path.join(pardir_path, name_utf8) > - if issubclass(tpwanted, unicode): > - return unicode(fullpath, 'utf8') > + if issubclass(tpwanted, str): > + return str(fullpath, 'utf8') > return tpwanted(fullpath) > raise TypeError, "Unknown value for argument 'wanted': %s" % repr(tpwanted) > > @@ -775,7 +775,7 @@ > return tpwanted(rr.selection[0]) > if issubclass(tpwanted, str): > return tpwanted(rr.selection_fsr[0].as_pathname()) > - if issubclass(tpwanted, unicode): > + if issubclass(tpwanted, str): This does the same check twice. 
> Modified: python/branches/py3k-struni/Lib/plat-mac/plistlib.py > ============================================================================== > --- python/branches/py3k-struni/Lib/plat-mac/plistlib.py (original) > +++ python/branches/py3k-struni/Lib/plat-mac/plistlib.py Wed May 2 21:09:54 2007 > @@ -70,7 +70,7 @@ > usually is a dictionary). > """ > didOpen = 0 > - if isinstance(pathOrFile, (str, unicode)): > + if isinstance(pathOrFile, (str, str)): -> if isinstance(pathOrFile, str): > pathOrFile = open(pathOrFile) > didOpen = 1 > p = PlistParser() > @@ -85,7 +85,7 @@ > file name or a (writable) file object. > """ > didOpen = 0 > - if isinstance(pathOrFile, (str, unicode)): > + if isinstance(pathOrFile, (str, str)): -> if isinstance(pathOrFile, str): > pathOrFile = open(pathOrFile, "w") > didOpen = 1 > writer = PlistWriter(pathOrFile) > @@ -231,7 +231,7 @@ > DumbXMLWriter.__init__(self, file, indentLevel, indent) > > def writeValue(self, value): > - if isinstance(value, (str, unicode)): > + if isinstance(value, (str, str)): -> if isinstance(value, str): > self.simpleElement("string", value) > elif isinstance(value, bool): > # must switch for bool before int, as bool is a > @@ -270,7 +270,7 @@ > self.beginElement("dict") > items = sorted(d.items()) > for key, value in items: > - if not isinstance(key, (str, unicode)): > + if not isinstance(key, (str, str)): -> if not isinstance(key, str): > Modified: python/branches/py3k-struni/Lib/sqlite3/test/factory.py > ============================================================================== > --- python/branches/py3k-struni/Lib/sqlite3/test/factory.py (original) > +++ python/branches/py3k-struni/Lib/sqlite3/test/factory.py Wed May 2 21:09:54 2007 > @@ -139,31 +139,31 @@ > self.con = sqlite.connect(":memory:") > > def CheckUnicode(self): > - austria = unicode("Österreich", "latin1") > + austria = str("Österreich", "latin1") > row = self.con.execute("select ?", (austria,)).fetchone() > - self.failUnless(type(row[0]) == unicode, "type of row[0] must be unicode") > + self.failUnless(type(row[0]) == str, "type of row[0] must be unicode") > > def CheckString(self): > self.con.text_factory = str > - austria = unicode("Österreich", "latin1") > + austria = str("Österreich", "latin1") > row = self.con.execute("select ?", (austria,)).fetchone() > self.failUnless(type(row[0]) == str, "type of row[0] must be str") > self.failUnless(row[0] == austria.encode("utf-8"), "column must equal original data in UTF-8") It looks like both those tests do the same thing now. > Modified: python/branches/py3k-struni/Lib/tarfile.py > ============================================================================== > --- python/branches/py3k-struni/Lib/tarfile.py (original) > +++ python/branches/py3k-struni/Lib/tarfile.py Wed May 2 21:09:54 2007 > @@ -1031,7 +1031,7 @@ > for name, digits in (("uid", 8), ("gid", 8), ("size", 12), ("mtime", 12)): > val = info[name] > if not 0 <= val < 8 ** (digits - 1) or isinstance(val, float): > - pax_headers[name] = unicode(val) > + pax_headers[name] = str(val) > info[name] = 0 > > if pax_headers: > @@ -1054,12 +1054,12 @@ > > @staticmethod > def _to_unicode(value, encoding): > - if isinstance(value, unicode): > + if isinstance(value, str): > return value > elif isinstance(value, (int, float)): > - return unicode(value) > + return str(value) > elif isinstance(value, str): > - return unicode(value, encoding) > + return str(value, encoding) > else: > raise ValueError("unable to convert to unicode: %r" % value) Here the same test is done twice too. 
> Modified: python/branches/py3k-struni/Lib/test/pickletester.py > ============================================================================== > --- python/branches/py3k-struni/Lib/test/pickletester.py (original) > +++ python/branches/py3k-struni/Lib/test/pickletester.py Wed May 2 21:09:54 2007 > @@ -484,8 +484,8 @@ > > if have_unicode: > def test_unicode(self): > - endcases = [unicode(''), unicode('<\\u>'), unicode('<\\\u1234>'), > - unicode('<\n>'), unicode('<\\>')] > + endcases = [str(''), str('<\\u>'), str('<\\\u1234>'), > + str('<\n>'), str('<\\>')] The str() call is unnecessary. > Modified: python/branches/py3k-struni/Lib/test/string_tests.py > ============================================================================== > --- python/branches/py3k-struni/Lib/test/string_tests.py (original) > +++ python/branches/py3k-struni/Lib/test/string_tests.py Wed May 2 21:09:54 2007 > @@ -589,7 +589,7 @@ > self.checkequal(['a']*19 + ['a '], aaa, 'split', None, 19) > > # mixed use of str and unicode > - self.checkequal([u'a', u'b', u'c d'], 'a b c d', 'split', u' ', 2) > + self.checkequal(['a', 'b', 'c d'], 'a b c d', 'split', ' ', 2) > > def test_additional_rsplit(self): > self.checkequal(['this', 'is', 'the', 'rsplit', 'function'], > @@ -622,7 +622,7 @@ > self.checkequal([' a a'] + ['a']*18, aaa, 'rsplit', None, 18) > > # mixed use of str and unicode > - self.checkequal([u'a b', u'c', u'd'], 'a b c d', 'rsplit', u' ', 2) > + self.checkequal(['a b', 'c', 'd'], 'a b c d', 'rsplit', ' ', 2) > > def test_strip(self): > self.checkequal('hello', ' hello ', 'strip') > @@ -644,14 +644,14 @@ > > # strip/lstrip/rstrip with unicode arg > if test_support.have_unicode: > - self.checkequal(unicode('hello', 'ascii'), 'xyzzyhelloxyzzy', > - 'strip', unicode('xyz', 'ascii')) > - self.checkequal(unicode('helloxyzzy', 'ascii'), 'xyzzyhelloxyzzy', > - 'lstrip', unicode('xyz', 'ascii')) > - self.checkequal(unicode('xyzzyhello', 'ascii'), 'xyzzyhelloxyzzy', > - 'rstrip', unicode('xyz', 
'ascii')) > - self.checkequal(unicode('hello', 'ascii'), 'hello', > - 'strip', unicode('xyz', 'ascii')) > + self.checkequal(str('hello', 'ascii'), 'xyzzyhelloxyzzy', > + 'strip', str('xyz', 'ascii')) > + self.checkequal(str('helloxyzzy', 'ascii'), 'xyzzyhelloxyzzy', > + 'lstrip', str('xyz', 'ascii')) > + self.checkequal(str('xyzzyhello', 'ascii'), 'xyzzyhelloxyzzy', > + 'rstrip', str('xyz', 'ascii')) > + self.checkequal(str('hello', 'ascii'), 'hello', > + 'strip', str('xyz', 'ascii')) The str() call is unnecessary. > self.checkraises(TypeError, 'hello', 'strip', 42, 42) > self.checkraises(TypeError, 'hello', 'lstrip', 42, 42) > @@ -908,13 +908,13 @@ > self.checkequal(False, '', '__contains__', 'asdf') # vereq('asdf' in '', False) > > def test_subscript(self): > - self.checkequal(u'a', 'abc', '__getitem__', 0) > - self.checkequal(u'c', 'abc', '__getitem__', -1) > - self.checkequal(u'a', 'abc', '__getitem__', 0) > - self.checkequal(u'abc', 'abc', '__getitem__', slice(0, 3)) > - self.checkequal(u'abc', 'abc', '__getitem__', slice(0, 1000)) > - self.checkequal(u'a', 'abc', '__getitem__', slice(0, 1)) > - self.checkequal(u'', 'abc', '__getitem__', slice(0, 0)) > + self.checkequal('a', 'abc', '__getitem__', 0) > + self.checkequal('c', 'abc', '__getitem__', -1) > + self.checkequal('a', 'abc', '__getitem__', 0) > + self.checkequal('abc', 'abc', '__getitem__', slice(0, 3)) > + self.checkequal('abc', 'abc', '__getitem__', slice(0, 1000)) > + self.checkequal('a', 'abc', '__getitem__', slice(0, 1)) > + self.checkequal('', 'abc', '__getitem__', slice(0, 0)) > # FIXME What about negative indices? 
This is handled differently by [] and __getitem__(slice) > > self.checkraises(TypeError, 'abc', '__getitem__', 'def') > @@ -957,11 +957,11 @@ > self.checkequal('abc', 'a', 'join', ('abc',)) > self.checkequal('z', 'a', 'join', UserList(['z'])) > if test_support.have_unicode: > - self.checkequal(unicode('a.b.c'), unicode('.'), 'join', ['a', 'b', 'c']) > - self.checkequal(unicode('a.b.c'), '.', 'join', [unicode('a'), 'b', 'c']) > - self.checkequal(unicode('a.b.c'), '.', 'join', ['a', unicode('b'), 'c']) > - self.checkequal(unicode('a.b.c'), '.', 'join', ['a', 'b', unicode('c')]) > - self.checkraises(TypeError, '.', 'join', ['a', unicode('b'), 3]) > + self.checkequal(str('a.b.c'), str('.'), 'join', ['a', 'b', 'c']) > + self.checkequal(str('a.b.c'), '.', 'join', [str('a'), 'b', 'c']) > + self.checkequal(str('a.b.c'), '.', 'join', ['a', str('b'), 'c']) > + self.checkequal(str('a.b.c'), '.', 'join', ['a', 'b', str('c')]) > + self.checkraises(TypeError, '.', 'join', ['a', str('b'), 3]) The str() call is unnecessary. > Modified: python/branches/py3k-struni/Lib/test/test_array.py > ============================================================================== > --- python/branches/py3k-struni/Lib/test/test_array.py (original) > +++ python/branches/py3k-struni/Lib/test/test_array.py Wed May 2 21:09:54 2007 > @@ -747,7 +747,7 @@ > > def test_nounicode(self): > a = array.array(self.typecode, self.example) > - self.assertRaises(ValueError, a.fromunicode, unicode('')) > + self.assertRaises(ValueError, a.fromunicode, str('')) > self.assertRaises(ValueError, a.tounicode) Should the method fromunicode() and tounicode() be renamed? 
> tests.append(CharacterTest) > @@ -755,27 +755,27 @@ > if test_support.have_unicode: > class UnicodeTest(StringTest): > typecode = 'u' > - example = unicode(r'\x01\u263a\x00\ufeff', 'unicode-escape') > - smallerexample = unicode(r'\x01\u263a\x00\ufefe', 'unicode-escape') > - biggerexample = unicode(r'\x01\u263a\x01\ufeff', 'unicode-escape') > - outside = unicode('\x33') > + example = str(r'\x01\u263a\x00\ufeff', 'unicode-escape') > + smallerexample = str(r'\x01\u263a\x00\ufefe', 'unicode-escape') > + biggerexample = str(r'\x01\u263a\x01\ufeff', 'unicode-escape') > + outside = str('\x33') > minitemsize = 2 > > def test_unicode(self): > - self.assertRaises(TypeError, array.array, 'b', unicode('foo', 'ascii')) > + self.assertRaises(TypeError, array.array, 'b', str('foo', 'ascii')) > - a = array.array('u', unicode(r'\xa0\xc2\u1234', 'unicode-escape')) > - a.fromunicode(unicode(' ', 'ascii')) > - a.fromunicode(unicode('', 'ascii')) > - a.fromunicode(unicode('', 'ascii')) > - a.fromunicode(unicode(r'\x11abc\xff\u1234', 'unicode-escape')) > + a = array.array('u', str(r'\xa0\xc2\u1234', 'unicode-escape')) > + a.fromunicode(str(' ', 'ascii')) > + a.fromunicode(str('', 'ascii')) > + a.fromunicode(str('', 'ascii')) > + a.fromunicode(str(r'\x11abc\xff\u1234', 'unicode-escape')) > s = a.tounicode() > self.assertEqual( > s, > - unicode(r'\xa0\xc2\u1234 \x11abc\xff\u1234', 'unicode-escape') > + str(r'\xa0\xc2\u1234 \x11abc\xff\u1234', 'unicode-escape') > ) > > - s = unicode(r'\x00="\'a\\b\x80\xff\u0000\u0001\u1234', 'unicode-escape') > + s = str(r'\x00="\'a\\b\x80\xff\u0000\u0001\u1234', 'unicode-escape') > a = array.array('u', s) > self.assertEqual( > repr(a), The str(..., 'ascii') call is unnecessary. 
> Modified: python/branches/py3k-struni/Lib/test/test_binascii.py > ============================================================================== > --- python/branches/py3k-struni/Lib/test/test_binascii.py (original) > +++ python/branches/py3k-struni/Lib/test/test_binascii.py Wed May 2 21:09:54 2007 > @@ -124,7 +124,7 @@ > > # Verify the treatment of Unicode strings > if test_support.have_unicode: > - self.assertEqual(binascii.hexlify(unicode('a', 'ascii')), '61') > + self.assertEqual(binascii.hexlify(str('a', 'ascii')), '61') The str() call is unnecessary. > Modified: python/branches/py3k-struni/Lib/test/test_bool.py > ============================================================================== > --- python/branches/py3k-struni/Lib/test/test_bool.py (original) > +++ python/branches/py3k-struni/Lib/test/test_bool.py Wed May 2 21:09:54 2007 > @@ -208,28 +208,28 @@ > self.assertIs("xyz".startswith("z"), False) > > if test_support.have_unicode: > - self.assertIs(unicode("xyz", 'ascii').endswith(unicode("z", 'ascii')), True) > - self.assertIs(unicode("xyz", 'ascii').endswith(unicode("x", 'ascii')), False) > - self.assertIs(unicode("xyz0123", 'ascii').isalnum(), True) > - self.assertIs(unicode("@#$%", 'ascii').isalnum(), False) > - self.assertIs(unicode("xyz", 'ascii').isalpha(), True) > - self.assertIs(unicode("@#$%", 'ascii').isalpha(), False) > - self.assertIs(unicode("0123", 'ascii').isdecimal(), True) > - self.assertIs(unicode("xyz", 'ascii').isdecimal(), False) > - self.assertIs(unicode("0123", 'ascii').isdigit(), True) > - self.assertIs(unicode("xyz", 'ascii').isdigit(), False) > - self.assertIs(unicode("xyz", 'ascii').islower(), True) > - self.assertIs(unicode("XYZ", 'ascii').islower(), False) > - self.assertIs(unicode("0123", 'ascii').isnumeric(), True) > - self.assertIs(unicode("xyz", 'ascii').isnumeric(), False) > - self.assertIs(unicode(" ", 'ascii').isspace(), True) > - self.assertIs(unicode("XYZ", 'ascii').isspace(), False) > - 
self.assertIs(unicode("X", 'ascii').istitle(), True) > - self.assertIs(unicode("x", 'ascii').istitle(), False) > - self.assertIs(unicode("XYZ", 'ascii').isupper(), True) > - self.assertIs(unicode("xyz", 'ascii').isupper(), False) > - self.assertIs(unicode("xyz", 'ascii').startswith(unicode("x", 'ascii')), True) > - self.assertIs(unicode("xyz", 'ascii').startswith(unicode("z", 'ascii')), False) > + self.assertIs(str("xyz", 'ascii').endswith(str("z", 'ascii')), True) > + self.assertIs(str("xyz", 'ascii').endswith(str("x", 'ascii')), False) > + self.assertIs(str("xyz0123", 'ascii').isalnum(), True) > + self.assertIs(str("@#$%", 'ascii').isalnum(), False) > + self.assertIs(str("xyz", 'ascii').isalpha(), True) > + self.assertIs(str("@#$%", 'ascii').isalpha(), False) > + self.assertIs(str("0123", 'ascii').isdecimal(), True) > + self.assertIs(str("xyz", 'ascii').isdecimal(), False) > + self.assertIs(str("0123", 'ascii').isdigit(), True) > + self.assertIs(str("xyz", 'ascii').isdigit(), False) > + self.assertIs(str("xyz", 'ascii').islower(), True) > + self.assertIs(str("XYZ", 'ascii').islower(), False) > + self.assertIs(str("0123", 'ascii').isnumeric(), True) > + self.assertIs(str("xyz", 'ascii').isnumeric(), False) > + self.assertIs(str(" ", 'ascii').isspace(), True) > + self.assertIs(str("XYZ", 'ascii').isspace(), False) > + self.assertIs(str("X", 'ascii').istitle(), True) > + self.assertIs(str("x", 'ascii').istitle(), False) > + self.assertIs(str("XYZ", 'ascii').isupper(), True) > + self.assertIs(str("xyz", 'ascii').isupper(), False) > + self.assertIs(str("xyz", 'ascii').startswith(str("x", 'ascii')), True) > + self.assertIs(str("xyz", 'ascii').startswith(str("z", 'ascii')), False) These tests can IMHO simply be dropped. 
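Agreed that they are redundant now: with a single string type, each of these is literally the plain-str assertion sitting next to it. A minimal sketch of what already covers them:

```python
# str methods return real bools, so the identity-style checks
# (assertIs against True/False) keep working on the plain literals.
assert "xyz".endswith("z") is True
assert "xyz".endswith("x") is False
assert "0123".isdecimal() is True
assert "XYZ".islower() is False
assert "xyz".startswith("x") is True
```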
> Modified: python/branches/py3k-struni/Lib/test/test_builtin.py > ============================================================================== > --- python/branches/py3k-struni/Lib/test/test_builtin.py (original) > +++ python/branches/py3k-struni/Lib/test/test_builtin.py Wed May 2 21:09:54 2007 > @@ -74,22 +74,22 @@ > ] > if have_unicode: > L += [ > - (unicode('0'), 0), > - (unicode('1'), 1), > - (unicode('9'), 9), > - (unicode('10'), 10), > - (unicode('99'), 99), > - (unicode('100'), 100), > - (unicode('314'), 314), > - (unicode(' 314'), 314), > - (unicode(b'\u0663\u0661\u0664 ','raw-unicode-escape'), 314), > - (unicode(' \t\t 314 \t\t '), 314), > - (unicode(' 1x'), ValueError), > - (unicode(' 1 '), 1), > - (unicode(' 1\02 '), ValueError), > - (unicode(''), ValueError), > - (unicode(' '), ValueError), > - (unicode(' \t\t '), ValueError), > + (str('0'), 0), > + (str('1'), 1), > + (str('9'), 9), > + (str('10'), 10), > + (str('99'), 99), > + (str('100'), 100), > + (str('314'), 314), > + (str(' 314'), 314), > + (str(b'\u0663\u0661\u0664 ','raw-unicode-escape'), 314), > + (str(' \t\t 314 \t\t '), 314), > + (str(' 1x'), ValueError), > + (str(' 1 '), 1), > + (str(' 1\02 '), ValueError), > + (str(''), ValueError), > + (str(' '), ValueError), > + (str(' \t\t '), ValueError), > (unichr(0x200), ValueError), > ] Most of these tests can probably be dropped too. Probably any test that checks have_unicode should be looked at. 
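One case from that list worth keeping in some form is the non-ASCII digit one, since it still exercises real behavior on plain str (illustrative check):

```python
# int() accepts any Unicode decimal digits (category Nd), so the
# raw-unicode-escape detour is no longer needed.
assert int('\u0663\u0661\u0664') == 314   # Arabic-Indic digits for 314
assert int(' \t\t 314 \t\t ') == 314      # surrounding whitespace is fine

try:
    int(' 1x')
    raised = False
except ValueError:
    raised = True
assert raised
```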
> Modified: python/branches/py3k-struni/Lib/test/test_cfgparser.py > ============================================================================== > --- python/branches/py3k-struni/Lib/test/test_cfgparser.py (original) > +++ python/branches/py3k-struni/Lib/test/test_cfgparser.py Wed May 2 21:09:54 2007 > @@ -248,12 +248,12 @@ > cf.set("sect", "option2", "splat") > cf.set("sect", "option2", mystr("splat")) > try: > - unicode > + str > except NameError: > pass > else: > - cf.set("sect", "option1", unicode("splat")) > - cf.set("sect", "option2", unicode("splat")) > + cf.set("sect", "option1", str("splat")) > + cf.set("sect", "option2", str("splat")) The try:except: and the str() calls are unnecessary. > Modified: python/branches/py3k-struni/Lib/test/test_charmapcodec.py > ============================================================================== > --- python/branches/py3k-struni/Lib/test/test_charmapcodec.py (original) > +++ python/branches/py3k-struni/Lib/test/test_charmapcodec.py Wed May 2 21:09:54 2007 > @@ -27,27 +27,27 @@ > > class CharmapCodecTest(unittest.TestCase): > def test_constructorx(self): > - self.assertEquals(unicode('abc', codecname), u'abc') > - self.assertEquals(unicode('xdef', codecname), u'abcdef') > - self.assertEquals(unicode('defx', codecname), u'defabc') > - self.assertEquals(unicode('dxf', codecname), u'dabcf') > - self.assertEquals(unicode('dxfx', codecname), u'dabcfabc') > + self.assertEquals(str('abc', codecname), 'abc') > + self.assertEquals(str('xdef', codecname), 'abcdef') > + self.assertEquals(str('defx', codecname), 'defabc') > + self.assertEquals(str('dxf', codecname), 'dabcf') > + self.assertEquals(str('dxfx', codecname), 'dabcfabc') > > def test_encodex(self): > - self.assertEquals(u'abc'.encode(codecname), 'abc') > - self.assertEquals(u'xdef'.encode(codecname), 'abcdef') > - self.assertEquals(u'defx'.encode(codecname), 'defabc') > - self.assertEquals(u'dxf'.encode(codecname), 'dabcf') > - 
self.assertEquals(u'dxfx'.encode(codecname), 'dabcfabc') > + self.assertEquals('abc'.encode(codecname), 'abc') > + self.assertEquals('xdef'.encode(codecname), 'abcdef') > + self.assertEquals('defx'.encode(codecname), 'defabc') > + self.assertEquals('dxf'.encode(codecname), 'dabcf') > + self.assertEquals('dxfx'.encode(codecname), 'dabcfabc') > > def test_constructory(self): > - self.assertEquals(unicode('ydef', codecname), u'def') > - self.assertEquals(unicode('defy', codecname), u'def') > - self.assertEquals(unicode('dyf', codecname), u'df') > - self.assertEquals(unicode('dyfy', codecname), u'df') > + self.assertEquals(str('ydef', codecname), 'def') > + self.assertEquals(str('defy', codecname), 'def') > + self.assertEquals(str('dyf', codecname), 'df') > + self.assertEquals(str('dyfy', codecname), 'df') These should probably be b'...' constants. > Modified: python/branches/py3k-struni/Lib/test/test_complex.py > ============================================================================== > --- python/branches/py3k-struni/Lib/test/test_complex.py (original) > +++ python/branches/py3k-struni/Lib/test/test_complex.py Wed May 2 21:09:54 2007 > @@ -227,7 +227,7 @@ > > self.assertEqual(complex(" 3.14+J "), 3.14+1j) > if test_support.have_unicode: > - self.assertEqual(complex(unicode(" 3.14+J ")), 3.14+1j) > + self.assertEqual(complex(str(" 3.14+J ")), 3.14+1j) > > # SF bug 543840: complex(string) accepts strings with \0 > # Fixed in 2.3. > @@ -251,8 +251,8 @@ > self.assertRaises(ValueError, complex, "1+(2j)") > self.assertRaises(ValueError, complex, "(1+2j)123") > if test_support.have_unicode: > - self.assertRaises(ValueError, complex, unicode("1"*500)) > - self.assertRaises(ValueError, complex, unicode("x")) > + self.assertRaises(ValueError, complex, str("1"*500)) > + self.assertRaises(ValueError, complex, str("x")) The str() calls are unnecessary. 
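On the b'...' suggestion above: once encode() returns bytes, the expected values have to be bytes constants or the comparisons simply go false. Sketch with the ascii codec standing in for the test codec:

```python
# encode() produces bytes in py3k; str and bytes never compare equal.
encoded = 'abc'.encode('ascii')
assert isinstance(encoded, bytes)
assert encoded == b'abc'
assert encoded != 'abc'
```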
> Modified: python/branches/py3k-struni/Lib/test/test_contains.py > ============================================================================== > --- python/branches/py3k-struni/Lib/test/test_contains.py (original) > +++ python/branches/py3k-struni/Lib/test/test_contains.py Wed May 2 21:09:54 2007 > @@ -59,31 +59,31 @@ > > # Test char in Unicode > > - check('c' in unicode('abc'), "'c' not in u'abc'") > - check('d' not in unicode('abc'), "'d' in u'abc'") > + check('c' in str('abc'), "'c' not in u'abc'") > + check('d' not in str('abc'), "'d' in u'abc'") > > - check('' in unicode(''), "'' not in u''") > - check(unicode('') in '', "u'' not in ''") > - check(unicode('') in unicode(''), "u'' not in u''") > - check('' in unicode('abc'), "'' not in u'abc'") > - check(unicode('') in 'abc', "u'' not in 'abc'") > - check(unicode('') in unicode('abc'), "u'' not in u'abc'") > + check('' in str(''), "'' not in u''") > + check(str('') in '', "u'' not in ''") > + check(str('') in str(''), "u'' not in u''") > + check('' in str('abc'), "'' not in u'abc'") > + check(str('') in 'abc', "u'' not in 'abc'") > + check(str('') in str('abc'), "u'' not in u'abc'") > > try: > - None in unicode('abc') > + None in str('abc') > check(0, "None in u'abc' did not raise error") > except TypeError: > pass > > # Test Unicode char in Unicode > > - check(unicode('c') in unicode('abc'), "u'c' not in u'abc'") > - check(unicode('d') not in unicode('abc'), "u'd' in u'abc'") > + check(str('c') in str('abc'), "u'c' not in u'abc'") > + check(str('d') not in str('abc'), "u'd' in u'abc'") The str() calls are unnecessary. > # Test Unicode char in string > > - check(unicode('c') in 'abc', "u'c' not in 'abc'") > - check(unicode('d') not in 'abc', "u'd' in 'abc'") > + check(str('c') in 'abc', "u'c' not in 'abc'") > + check(str('d') not in 'abc', "u'd' in 'abc'") This is testing the same as above. 
> Modified: python/branches/py3k-struni/Lib/test/test_descr.py > ============================================================================== > --- python/branches/py3k-struni/Lib/test/test_descr.py (original) > +++ python/branches/py3k-struni/Lib/test/test_descr.py Wed May 2 21:09:54 2007 > @@ -264,7 +264,7 @@ > del junk > > # Just make sure these don't blow up! > - for arg in 2, 2, 2j, 2e0, [2], "2", u"2", (2,), {2:2}, type, test_dir: > + for arg in 2, 2, 2j, 2e0, [2], "2", "2", (2,), {2:2}, type, test_dir: This tests "2" twice. > dir(arg) > > # Test dir on custom classes. Since these have object as a > @@ -1100,25 +1100,25 @@ > > # Test unicode slot names > try: > - unicode > + str > except NameError: > pass The try:except: is unnecessary. > else: > # Test a single unicode string is not expanded as a sequence. > class C(object): > - __slots__ = unicode("abc") > + __slots__ = str("abc") The str() call is unnecessary. > c = C() > c.abc = 5 > vereq(c.abc, 5) > > # _unicode_to_string used to modify slots in certain circumstances > - slots = (unicode("foo"), unicode("bar")) > + slots = (str("foo"), str("bar")) The str() calls are unnecessary. > class C(object): > __slots__ = slots > x = C() > x.foo = 5 > vereq(x.foo, 5) > - veris(type(slots[0]), unicode) > + veris(type(slots[0]), str) > # this used to leak references > try: > class C(object): > @@ -2301,64 +2301,64 @@ > [...]
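For what it's worth, the single-string __slots__ behavior that hunk guards carries over to plain str unchanged (quick check):

```python
# A bare string in __slots__ names exactly one slot; it is not
# expanded per-character the way a sequence of names would be.
class C(object):
    __slots__ = "abc"

c = C()
c.abc = 5
assert c.abc == 5

try:
    c.a = 1          # no slot 'a' exists
    raised = False
except AttributeError:
    raised = True
assert raised
```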
> class sublist(list): > pass > @@ -2437,12 +2437,12 @@ > vereq(int(x=3), 3) > vereq(complex(imag=42, real=666), complex(666, 42)) > vereq(str(object=500), '500') > - vereq(unicode(string='abc', errors='strict'), u'abc') > + vereq(str(string='abc', errors='strict'), 'abc') > vereq(tuple(sequence=range(3)), (0, 1, 2)) > vereq(list(sequence=(0, 1, 2)), range(3)) > # note: as of Python 2.3, dict() no longer has an "items" keyword arg > > - for constructor in (int, float, int, complex, str, unicode, > + for constructor in (int, float, int, complex, str, str, > tuple, list, file): > try: > constructor(bogus_keyword_arg=1) > @@ -2719,13 +2719,13 @@ > class H(object): > __slots__ = ["b", "a"] > try: > - unicode > + str The try:except: is unnecessary. > except NameError: > class I(object): > __slots__ = ["a", "b"] > else: > class I(object): > - __slots__ = [unicode("a"), unicode("b")] > + __slots__ = [str("a"), str("b")] > class J(object): > __slots__ = ["c", "b"] > class K(object): > @@ -3124,9 +3124,9 @@ > > # It's not clear that unicode will continue to support the character > # buffer interface, and this test will fail if that's taken away. 
> - class MyUni(unicode): > + class MyUni(str): > pass > - base = u'abc' > + base = 'abc' > m = MyUni(base) > vereq(binascii.b2a_hex(m), binascii.b2a_hex(base)) > Modified: python/branches/py3k-struni/Lib/test/test_file.py > ============================================================================== > --- python/branches/py3k-struni/Lib/test/test_file.py (original) > +++ python/branches/py3k-struni/Lib/test/test_file.py Wed May 2 21:09:54 2007 > @@ -145,7 +145,7 @@ > > def testUnicodeOpen(self): > # verify repr works for unicode too > - f = open(unicode(TESTFN), "w") > + f = open(str(TESTFN), "w") > self.assert_(repr(f).startswith(" Modified: python/branches/py3k-struni/Lib/test/test_format.py > ============================================================================== > --- python/branches/py3k-struni/Lib/test/test_format.py (original) > +++ python/branches/py3k-struni/Lib/test/test_format.py Wed May 2 21:09:54 2007 > @@ -35,7 +35,7 @@ > def testboth(formatstr, *args): > testformat(formatstr, *args) > if have_unicode: > - testformat(unicode(formatstr), *args) > + testformat(str(formatstr), *args) This is the same test twice. > Modified: python/branches/py3k-struni/Lib/test/test_iter.py > ============================================================================== > --- python/branches/py3k-struni/Lib/test/test_iter.py (original) > +++ python/branches/py3k-struni/Lib/test/test_iter.py Wed May 2 21:09:54 2007 > @@ -216,9 +216,9 @@ > # Test a Unicode string > if have_unicode: > def test_iter_unicode(self): > - self.check_for_loop(iter(unicode("abcde")), > - [unicode("a"), unicode("b"), unicode("c"), > - unicode("d"), unicode("e")]) > + self.check_for_loop(iter(str("abcde")), > + [str("a"), str("b"), str("c"), > + str("d"), str("e")]) The str() calls are unnecessary. 
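Right -- iterating a plain str already yields length-1 str objects, which is all the unicode() spellings were checking:

```python
# Iteration over str gives one-character str objects.
assert list(iter("abcde")) == ["a", "b", "c", "d", "e"]
assert all(isinstance(ch, str) and len(ch) == 1 for ch in "abcde")
```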
> # Test a directory > def test_iter_dict(self): > @@ -518,7 +518,7 @@ > i = self.i > self.i = i+1 > if i == 2: > - return unicode("fooled you!") > + return str("fooled you!") The str() call is unnecessary. > return next(self.it) > > f = open(TESTFN, "w") > @@ -535,7 +535,7 @@ > # and pass that on to unicode.join(). > try: > got = " - ".join(OhPhooey(f)) > - self.assertEqual(got, unicode("a\n - b\n - fooled you! - c\n")) > + self.assertEqual(got, str("a\n - b\n - fooled you! - c\n")) The str() call is unnecessary. > Modified: python/branches/py3k-struni/Lib/test/test_pep352.py > ============================================================================== > --- python/branches/py3k-struni/Lib/test/test_pep352.py (original) > +++ python/branches/py3k-struni/Lib/test/test_pep352.py Wed May 2 21:09:54 2007 > @@ -90,7 +90,7 @@ > arg = "spam" > exc = Exception(arg) > results = ([len(exc.args), 1], [exc.args[0], arg], [exc.message, arg], > - [str(exc), str(arg)], [unicode(exc), unicode(arg)], > + [str(exc), str(arg)], [str(exc), str(arg)], > [repr(exc), exc.__class__.__name__ + repr(exc.args)]) > self.interface_test_driver(results) > > @@ -101,7 +101,7 @@ > exc = Exception(*args) > results = ([len(exc.args), arg_count], [exc.args, args], > [exc.message, ''], [str(exc), str(args)], > - [unicode(exc), unicode(args)], > + [str(exc), str(args)], > [repr(exc), exc.__class__.__name__ + repr(exc.args)]) > self.interface_test_driver(results) > > @@ -109,7 +109,7 @@ > # Make sure that with no args that interface is correct > exc = Exception() > results = ([len(exc.args), 0], [exc.args, tuple()], [exc.message, ''], > - [str(exc), ''], [unicode(exc), u''], > + [str(exc), ''], [str(exc), ''], > [repr(exc), exc.__class__.__name__ + '()']) > self.interface_test_driver(results) Seems like here the same test is done twice too. 
> Modified: python/branches/py3k-struni/Lib/test/test_pprint.py > ============================================================================== > --- python/branches/py3k-struni/Lib/test/test_pprint.py (original) > +++ python/branches/py3k-struni/Lib/test/test_pprint.py Wed May 2 21:09:54 2007 > @@ -3,7 +3,7 @@ > import unittest > > try: > - uni = unicode > + uni = str > except NameError: > def uni(x): > return x This can be simplified to uni = str (or use str everywhere). > Modified: python/branches/py3k-struni/Lib/test/test_re.py > ============================================================================== > --- python/branches/py3k-struni/Lib/test/test_re.py (original) > +++ python/branches/py3k-struni/Lib/test/test_re.py Wed May 2 21:09:54 2007 > @@ -324,12 +324,12 @@ > [...] > def test_stack_overflow(self): > @@ -561,10 +561,10 @@ > def test_bug_764548(self): > # bug 764548, re.compile() barfs on str/unicode subclasses > try: > - unicode > + str > except NameError: > return # no problem if we have no unicode The try:except: can be removed. > - class my_unicode(unicode): pass > + class my_unicode(str): pass > pat = re.compile(my_unicode("abc")) > self.assertEqual(pat.match("xyz"), None) > > @@ -575,7 +575,7 @@ > > def test_bug_926075(self): > try: > - unicode > + str > except NameError: > return # no problem if we have no unicode > self.assert_(re.compile('bug_926075') is not The try:except: can be removed. > @@ -583,7 +583,7 @@ > > def test_bug_931848(self): > try: > - unicode > + str > except NameError: > pass > pattern = eval('u"[\u002E\u3002\uFF0E\uFF61]"') The try:except: can be removed.
> Modified: python/branches/py3k-struni/Lib/test/test_set.py > ============================================================================== > --- python/branches/py3k-struni/Lib/test/test_set.py (original) > +++ python/branches/py3k-struni/Lib/test/test_set.py Wed May 2 21:09:54 2007 > @@ -72,7 +72,7 @@ > self.assertEqual(type(u), self.thetype) > self.assertRaises(PassThru, self.s.union, check_pass_thru()) > self.assertRaises(TypeError, self.s.union, [[]]) > - for C in set, frozenset, dict.fromkeys, str, unicode, list, tuple: > + for C in set, frozenset, dict.fromkeys, str, str, list, tuple: This tests str twice. (This happens several times in test_set.py.) > self.assertEqual(self.thetype('abcba').union(C('cdc')), set('abcd')) > self.assertEqual(self.thetype('abcba').union(C('efgfe')), set('abcefg')) > self.assertEqual(self.thetype('abcba').union(C('ccb')), set('abc')) > [...] > Modified: python/branches/py3k-struni/Lib/test/test_str.py > ============================================================================== > --- python/branches/py3k-struni/Lib/test/test_str.py (original) > +++ python/branches/py3k-struni/Lib/test/test_str.py Wed May 2 21:09:54 2007 > @@ -31,7 +31,7 @@ > # Make sure __str__() behaves properly > class Foo0: > def __unicode__(self): What happens with __unicode__ after unification? > Modified: python/branches/py3k-struni/Lib/test/test_support.py > ============================================================================== > --- python/branches/py3k-struni/Lib/test/test_support.py (original) > +++ python/branches/py3k-struni/Lib/test/test_support.py Wed May 2 21:09:54 2007 > @@ -131,7 +131,7 @@ > return (x > y) - (x < y) > > try: > - unicode > + str > have_unicode = True > except NameError: > have_unicode = False Can this be dropped? 
> @@ -151,13 +151,13 @@ > # Assuming sys.getfilesystemencoding()!=sys.getdefaultencoding() > # TESTFN_UNICODE is a filename that can be encoded using the > # file system encoding, but *not* with the default (ascii) encoding > - if isinstance('', unicode): > + if isinstance('', str): > # python -U > # XXX perhaps unicode() should accept Unicode strings? > TESTFN_UNICODE = "@test-\xe0\xf2" > else: > # 2 latin characters. > - TESTFN_UNICODE = unicode("@test-\xe0\xf2", "latin-1") > + TESTFN_UNICODE = str("@test-\xe0\xf2", "latin-1") > TESTFN_ENCODING = sys.getfilesystemencoding() > # TESTFN_UNICODE_UNENCODEABLE is a filename that should *not* be > # able to be encoded by *either* the default or filesystem encoding. > > Modified: python/branches/py3k-struni/Lib/test/test_unicode.py > ============================================================================== > --- python/branches/py3k-struni/Lib/test/test_unicode.py (original) > +++ python/branches/py3k-struni/Lib/test/test_unicode.py Wed May 2 21:09:54 2007 This should probably be dropped/merged into test_str. > Modified: python/branches/py3k-struni/Lib/test/test_xmlrpc.py > ============================================================================== > --- python/branches/py3k-struni/Lib/test/test_xmlrpc.py (original) > +++ python/branches/py3k-struni/Lib/test/test_xmlrpc.py Wed May 2 21:09:54 2007 > @@ -5,7 +5,7 @@ > from test import test_support > > try: > - unicode > + str > except NameError: > have_unicode = False The try:except: can be dropped. 
> Modified: python/branches/py3k-struni/Lib/textwrap.py > ============================================================================== > --- python/branches/py3k-struni/Lib/textwrap.py (original) > +++ python/branches/py3k-struni/Lib/textwrap.py Wed May 2 21:09:54 2007 > @@ -70,7 +70,7 @@ > whitespace_trans = string.maketrans(_whitespace, ' ' * len(_whitespace)) > > unicode_whitespace_trans = {} > - uspace = ord(u' ') > + uspace = ord(' ') > for x in map(ord, _whitespace): > unicode_whitespace_trans[x] = uspace > > @@ -127,7 +127,7 @@ > if self.replace_whitespace: > if isinstance(text, str): > text = text.translate(self.whitespace_trans) > - elif isinstance(text, unicode): > + elif isinstance(text, str): This checks for str twice. > Modified: python/branches/py3k-struni/Lib/types.py > ============================================================================== > --- python/branches/py3k-struni/Lib/types.py (original) > +++ python/branches/py3k-struni/Lib/types.py Wed May 2 21:09:54 2007 > @@ -28,7 +28,7 @@ > # types.StringTypes", you should use "isinstance(x, basestring)". But > # we keep around for compatibility with Python 2.2. > try: > - UnicodeType = unicode > + UnicodeType = str > StringTypes = (StringType, UnicodeType) > except NameError: > StringTypes = (StringType,) Can we drop this? > Modified: python/branches/py3k-struni/Lib/urllib.py > ============================================================================== > --- python/branches/py3k-struni/Lib/urllib.py (original) > +++ python/branches/py3k-struni/Lib/urllib.py Wed May 2 21:09:54 2007 > @@ -984,13 +984,13 @@ > # quote('abc def') -> 'abc%20def') > > try: > - unicode > + str > except NameError: > def _is_unicode(x): > return 0 > else: > def _is_unicode(x): > - return isinstance(x, unicode) > + return isinstance(x, str) Can _is_unicode simply return True? 
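Probably not simply True -- callers can still hand it bytes (or anything else), so keeping the isinstance() check seems safer; something like:

```python
def _is_unicode(x):
    # With str/unicode unified this collapses to a plain isinstance
    # check; a constant True would wrongly claim bytes as unicode.
    return isinstance(x, str)

assert _is_unicode('abc') is True
assert _is_unicode(b'abc') is False
assert _is_unicode(42) is False
```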
> Modified: python/branches/py3k-struni/Lib/xml/dom/minicompat.py > ============================================================================== > --- python/branches/py3k-struni/Lib/xml/dom/minicompat.py (original) > +++ python/branches/py3k-struni/Lib/xml/dom/minicompat.py Wed May 2 21:09:54 2007 > @@ -41,11 +41,11 @@ > import xml.dom > > try: > - unicode > + str > except NameError: > StringTypes = type(''), > else: > - StringTypes = type(''), type(unicode('')) > + StringTypes = type(''), type(str('')) This amounts to StringTypes = str > class NodeList(list): > > Modified: python/branches/py3k-struni/Lib/xmlrpclib.py > ============================================================================== > --- python/branches/py3k-struni/Lib/xmlrpclib.py (original) > +++ python/branches/py3k-struni/Lib/xmlrpclib.py Wed May 2 21:09:54 2007 > @@ -144,9 +144,9 @@ > # Internal stuff > > try: > - unicode > + str > except NameError: > - unicode = None # unicode support not available > + str = None # unicode support not available The try:except: can be dropped and all subsequent "if str:" tests too. From percivall at gmail.com Thu May 3 12:26:56 2007 From: percivall at gmail.com (Simon Percivall) Date: Thu, 3 May 2007 12:26:56 +0200 Subject: [Python-3000] PEP 3132: Extended Iterable Unpacking In-Reply-To: References: Message-ID: On 2 maj 2007, at 20.08, Guido van Rossum wrote: > [Georg] >>>>>>> a, *b, c = range(5) >>>>>>> a >>>> 0 >>>>>>> c >>>> 4 >>>>>>> b >>>> [1, 2, 3] > > > That sounds messy; only allowing *a at the end seems a bit more > manageable. But I'll hold off until I can shoot holes in your > implementation. ;-) As the patch works right now, any iterator will be exhausted, but if the proposal is constrained to only allowing the *name at the end, wouldn't a more useful behavior be to not exhaust the iterator, making it similar to: > it = iter(range(10)) > a = next(it) > b = it or would this be too surprising? 
//Simon From g.brandl at gmx.net Thu May 3 13:46:10 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Thu, 03 May 2007 13:46:10 +0200 Subject: [Python-3000] PEP 3132: Extended Iterable Unpacking In-Reply-To: References: Message-ID: Simon Percivall schrieb: > On 2 maj 2007, at 20.08, Guido van Rossum wrote: >> [Georg] >>>>>>>> a, *b, c = range(5) >>>>>>>> a >>>>> 0 >>>>>>>> c >>>>> 4 >>>>>>>> b >>>>> [1, 2, 3] >> >> >> That sounds messy; only allowing *a at the end seems a bit more >> manageable. But I'll hold off until I can shoot holes in your >> implementation. ;-) > > As the patch works right now, any iterator will be exhausted, > but if the proposal is constrained to only allowing the *name at > the end, wouldn't a more useful behavior be to not exhaust the > iterator, making it similar to: > > > it = iter(range(10)) > > a = next(it) > > b = it > > or would this be too surprising? IMO yes. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From skip at pobox.com Thu May 3 12:35:09 2007 From: skip at pobox.com (skip at pobox.com) Date: Thu, 3 May 2007 05:35:09 -0500 Subject: [Python-3000] [Python-Dev] Implicit String Concatenation and Octal Literals Was: PEP 30XZ: Simplified Parsing In-Reply-To: <000401c78d4c$796bfe60$f301a8c0@RaymondLaptop1> References: <20070502210339.BHU28881@ms09.lnh.mail.rcn.net> <17977.16058.847429.905398@montanaro.dyndns.org> <000401c78d4c$796bfe60$f301a8c0@RaymondLaptop1> Message-ID: <17977.47837.397664.190390@montanaro.dyndns.org> Raymond> Another way to look at it is to ask whether we would consider Raymond> adding implicit string concatenation if we didn't already have Raymond> it. As I recall it was a "relatively recent" addition. Maybe 2.0 or 2.1? 
It certainly hasn't been there from the beginning. Skip From benji at benjiyork.com Thu May 3 15:01:54 2007 From: benji at benjiyork.com (Benji York) Date: Thu, 03 May 2007 09:01:54 -0400 Subject: [Python-3000] [Python-Dev] PEP 30XZ: Simplified Parsing In-Reply-To: <46397BB2.4060404@ronadam.com> References: <02b401c78d15$f04b6110$090a0a0a@enfoldsystems.local> <46397BB2.4060404@ronadam.com> Message-ID: <4639DD42.3020307@benjiyork.com> Ron Adam wrote: > The following inconsistency still bothers me, but I suppose it's an edge > case that doesn't cause problems. > > >>> print r"hello world\" > File "", line 1 > print r"hello world\" > ^ > SyntaxError: EOL while scanning single-quoted string > In the first case, it's treated as a continuation character even though > it's not at the end of a physical line. So it gives an error. No, that is unrelated to line continuation. The \" is an escape sequence, therefore there is no double-quote to end the string literal. -- Benji York http://benjiyork.com From g.brandl at gmx.net Thu May 3 15:50:07 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Thu, 03 May 2007 15:50:07 +0200 Subject: [Python-3000] Escaping in raw strings (was Re: [Python-Dev] PEP 30XZ: Simplified Parsing) Message-ID: Benji York schrieb: > Ron Adam wrote: >> The following inconsistency still bothers me, but I suppose it's an edge >> case that doesn't cause problems. >> >> >>> print r"hello world\" >> File "", line 1 >> print r"hello world\" >> ^ >> SyntaxError: EOL while scanning single-quoted string > >> In the first case, it's treated as a continuation character even though >> it's not at the end of a physical line. So it gives an error. > > No, that is unrelated to line continuation. The \" is an escape > sequence, therefore there is no double-quote to end the string literal. But IMHO this is really something that can and ought to be fixed. I would let a raw string end at the first matching quote and not have any escaping available. 
That's no loss of functionality since there is no way to put a single " into a r"" string today. You can do r"\"", but it doesn't have the effect of just escaping the closing quote, so it's pretty useless. Is that something that can be agreed upon without a PEP? Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From rrr at ronadam.com Thu May 3 15:55:13 2007 From: rrr at ronadam.com (Ron Adam) Date: Thu, 03 May 2007 08:55:13 -0500 Subject: [Python-3000] [Python-Dev] PEP 30XZ: Simplified Parsing In-Reply-To: <4639DD42.3020307@benjiyork.com> References: <02b401c78d15$f04b6110$090a0a0a@enfoldsystems.local> <46397BB2.4060404@ronadam.com> <4639DD42.3020307@benjiyork.com> Message-ID: <4639E9C1.4010109@ronadam.com> Benji York wrote: > Ron Adam wrote: >> The following inconsistency still bothers me, but I suppose it's an edge >> case that doesn't cause problems. >> >> >>> print r"hello world\" >> File "", line 1 >> print r"hello world\" >> ^ >> SyntaxError: EOL while scanning single-quoted string > >> In the first case, it's treated as a continuation character even though >> it's not at the end of a physical line. So it gives an error. > > No, that is unrelated to line continuation. The \" is an escape > sequence, therefore there is no double-quote to end the string literal. Are you sure? >>> print r'\"' \" It's just a '\' here. These are raw strings if you didn't notice. Cheers, Ron From fdrake at acm.org Thu May 3 15:58:53 2007 From: fdrake at acm.org (Fred L. Drake, Jr.) 
Date: Thu, 3 May 2007 09:58:53 -0400 Subject: [Python-3000] Escaping in raw strings (was Re: [Python-Dev] PEP 30XZ: Simplified Parsing) In-Reply-To: References: Message-ID: <200705030958.53558.fdrake@acm.org> On Thursday 03 May 2007, Georg Brandl wrote: > Is that something that can be agreed upon without a PEP? I expect this to be at least somewhat controversial, so a PEP is warranted. I'd like to see it fixed, though. -Fred -- Fred L. Drake, Jr. 
From skip at pobox.com Thu May 3 15:11:01 2007 From: skip at pobox.com (skip at pobox.com) Date: Thu, 3 May 2007 08:11:01 -0500 Subject: [Python-3000] [Python-Dev] Implicit String Concatenation and Octal Literals Was: PEP 30XZ: Simplified Parsing In-Reply-To: <17977.47837.397664.190390@montanaro.dyndns.org> References: <20070502210339.BHU28881@ms09.lnh.mail.rcn.net> <17977.16058.847429.905398@montanaro.dyndns.org> <000401c78d4c$796bfe60$f301a8c0@RaymondLaptop1> <17977.47837.397664.190390@montanaro.dyndns.org> Message-ID: <17977.57189.849175.981712@montanaro.dyndns.org> >>>>> "skip" == skip writes: Raymond> Another way to look at it is to ask whether we would consider Raymond> adding implicit string concatenation if we didn't already have Raymond> it. skip> As I recall it was a "relatively recent" addition. Maybe 2.0 or skip> 2.1? It certainly hasn't been there from the beginning. Misc/HISTORY suggests this feature was added in 1.0.2 (May 1994). Apologies for my bad memory. Skip From jimjjewett at gmail.com Thu May 3 16:16:53 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Thu, 3 May 2007 10:16:53 -0400 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: <463980EB.1070102@canterbury.ac.nz> References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <5.1.1.6.0.20070501123026.0524eac8@sparrow.telecommunity.com> <5.1.1.6.0.20070501235655.02a57fd8@sparrow.telecommunity.com> <5.1.1.6.0.20070502113122.04e41350@sparrow.telecommunity.com> <5.1.1.6.0.20070502211440.02a386d8@sparrow.telecommunity.com> <463980EB.1070102@canterbury.ac.nz> Message-ID: On 5/3/07, Greg Ewing wrote: > I don't doubt that things like @before and @after are > handy. But being handy isn't enough for something to > get into the Python core. I hadn't thought of @before and @after as truly core; I had assumed they were decorators that would be available in a genfunc module. 
I'll agree that the actual timing of the super-call is often not essential, and having time-words confuses that. On the other hand, they do give you (1) The function being added as an overload doesn't have to know anything about the framework, or even that another method may ever be called at all; so long as the super-call is at one end, the registration function can take care of this. (2) The explicit version of next_method corresponds to super, but is uglier in practice, because there isn't inheritance involved. My strawman would boil down to... def foo():... next_method = GenFunc.dispatch(*args, after=__this_function__) Note that the overriding function foo would need to have both a reference to itself (as opposed to its name, which will often be bound to something else) and to the generic function from which it is being called (and it might be called from several such functions). Arranging this during the registration seems like an awful lot of work to avoid @after -jJ From turnbull at sk.tsukuba.ac.jp Thu May 3 16:40:03 2007 From: turnbull at sk.tsukuba.ac.jp (Stephen J. Turnbull) Date: Thu, 03 May 2007 23:40:03 +0900 Subject: [Python-3000] [Python-Dev] PEP 30XZ: Simplified Parsing In-Reply-To: <179D5383-88F0-4246-B355-5A817B9F7EBE@python.org> References: <4638B151.6020901@voidspace.org.uk> <5.1.1.6.0.20070502144742.02bc1908@sparrow.telecommunity.com> <179D5383-88F0-4246-B355-5A817B9F7EBE@python.org> Message-ID: <87hcquezss.fsf@uwakimon.sk.tsukuba.ac.jp> Barry Warsaw writes: > The problem is that > > _("some string" > " and more of it") > > is not the same as > > _("some string" + > " and more of it") Are you worried about translators? The gettext functions themselves will just see the result of the operation. The extraction tools like xgettext do fail, however.
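A Python-side extractor can handle both spellings by folding constants in the AST rather than eval'ing the file; a sketch using the modern ast module (which postdates this thread's compiler package) — the `_` name and the folding rule here are illustrative, not pygettext's actual code:

```python
import ast

def _fold(node):
    # Python's parser already merges adjacent literals ("a" "b") into
    # one Constant; we only need to fold '+' trees of string constants.
    if isinstance(node, ast.Constant) and isinstance(node.value, str):
        return node.value
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
        left, right = _fold(node.left), _fold(node.right)
        if left is not None and right is not None:
            return left + right
    return None

def extract(source):
    """Collect the folded first argument of every _() call."""
    messages = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == '_'
                and node.args):
            folded = _fold(node.args[0])
            if folded is not None:
                messages.append(folded)
    return messages

src = '''
x = _("some string"
      " and more of it")
y = _("some string" +
      " and more of it")
'''
assert extract(src) == ["some string and more of it"] * 2
```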
Translating the above to

# The problem is that
gettext("some string"
        " and more of it")
# is not the same as
gettext("some string" +
        " and more of it")

and invoking "xgettext --force-po --language=Python test.py" gives

# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR THE PACKAGE'S COPYRIGHT HOLDER
# This file is distributed under the same license as the PACKAGE package.
# FIRST AUTHOR , YEAR.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2007-05-03 23:32+0900\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME \n"
"Language-Team: LANGUAGE \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=CHARSET\n"
"Content-Transfer-Encoding: 8bit\n"

#: test.py:3
msgid "some string and more of it"
msgstr ""

#: test.py:8
msgid "some string"
msgstr ""

BTW, it doesn't work for the C equivalent, either. > You would either have to teach pygettext and maybe gettext about > this construct, or you'd have to use something different. Teaching Python-based extraction tools about it isn't hard, just make sure that you slurp in the whole argument, and eval it. If what you get isn't a string, throw an exception. xgettext will be harder, since it apparently does not do it, nor does it even know enough to error or warn on syntax it doesn't handle within gettext()'s argument. From barry at python.org Thu May 3 17:34:58 2007 From: barry at python.org (Barry Warsaw) Date: Thu, 3 May 2007 11:34:58 -0400 Subject: [Python-3000] [Python-Dev] PEP 30XZ: Simplified Parsing In-Reply-To: <87hcquezss.fsf@uwakimon.sk.tsukuba.ac.jp> References: <4638B151.6020901@voidspace.org.uk> <5.1.1.6.0.20070502144742.02bc1908@sparrow.telecommunity.com> <179D5383-88F0-4246-B355-5A817B9F7EBE@python.org> <87hcquezss.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <2CF5A0DA-509D-4A3D-96A6-30D601572E3E@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On May 3, 2007, at 10:40 AM, Stephen J.
Turnbull wrote: > Barry Warsaw writes: > >> The problem is that >> >> _("some string" >> " and more of it") >> >> is not the same as >> >> _("some string" + >> " and more of it") > > Are you worried about translators? The gettext functions themselves > will just see the result of the operation. The extraction tools like > xgettext do fail, however. Yep, sorry, it is the extraction tools I'm worried about. > Teaching Python-based extraction tools about it isn't hard, just make > sure that you slurp in the whole argument, and eval it. If what you > get isn't a string, throw an exception. xgettext will be harder, > since apparently does not do it, nor does it even know enough to error > or warn on syntax it doesn't handle within gettext()'s argument. IMO, this is a problem. We can make the Python extraction tool work, but we should still be very careful about breaking 3rd party tools like xgettext, since other projects may be using such tools. - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (Darwin) iQCVAwUBRjoBI3EjvBPtnXfVAQLg0AP/Y1ncqie1NgzRFzuZpnZapMs/+oo+5BCK 1MYqsJwucnDJnOqrUcU34Vq3SB7X7VsSDv3TuoTNnheinX6senorIFQKRAj4abKT f2Y63t6BT97mSOAITFZvVSj0YSG+zkD/HMGeDj4dOJFLj1tYxgKpVprlhMbELzG1 AIKe+wsYjcs= =+oFV -----END PGP SIGNATURE----- From pje at telecommunity.com Thu May 3 18:05:32 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Thu, 03 May 2007 12:05:32 -0400 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. 
In-Reply-To: References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <5.1.1.6.0.20070501123026.0524eac8@sparrow.telecommunity.com> <5.1.1.6.0.20070501235655.02a57fd8@sparrow.telecommunity.com> <5.1.1.6.0.20070502113122.04e41350@sparrow.telecommunity.com> <5.1.1.6.0.20070502211440.02a386d8@sparrow.telecommunity.com> <463980EB.1070102@canterbury.ac.nz> Message-ID: <20070503160351.1FF833A4070@sparrow.telecommunity.com> At 10:16 AM 5/3/2007 -0400, Jim Jewett wrote: >On 5/3/07, Greg Ewing wrote: > > > I don't doubt that things like @before and @after are > > handy. But being handy isn't enough for something to > > get into the Python core. > >I hadn't thought of @before and @after as truly core; I had assumed >they were decorators that would be available in a genfunc module. Everything in the PEP is imported from an "overloading" module. I'm not crazy enough to try proposing any built-ins at this point. >(2) The explicit version of next_method corresponds to super, but is >uglier in practice, becaues their isn't inheritance involved. My >strawman would boil down to... > > def foo():... > next_method = GenFunc.dispatch(*args, after=__this_function__) Keep in mind that the same function can be re-registered under multiple rules, so a reference to the function is insufficient to specify where to chain from. Also, your proposal appears to be *re-dispatching* the arguments. My implementation doesn't redispatch anything; it creates a chain of method objects, which each know their next method. These chains are created and cached whenever a new combination of methods is required. In RuleDispatch, the chains are actually linked as bound method objects, so that a function's next_method is bound as if it were the "self" of that function. Thus, calling the next method takes advantage of Python's "bound method" optimizations. 
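Phillip's chain-of-methods idea can be illustrated with a toy sketch. This is not PEP 3124's or RuleDispatch's implementation — the registration rule and all names below are invented for the example — but it shows the shape: methods are linked into a chain once, each receiving its next_method as an ordinary first argument, so calling the next method never re-dispatches on the arguments:

```python
from functools import partial

class GenericFunction:
    def __init__(self, default):
        self.methods = [default]          # least specific last

    def register(self, func):
        # Toy rule: the most recently registered method is most specific.
        self.methods.insert(0, func)
        return func

    def build_chain(self):
        # Link the methods once; a real implementation would cache this
        # per combination of argument types.
        nxt = None
        for func in reversed(self.methods):
            nxt = partial(func, nxt)
        return nxt

    def __call__(self, *args):
        return self.build_chain()(*args)

def base(next_method, x):
    return "base(%s)" % x

describe = GenericFunction(base)

@describe.register
def louder(next_method, x):
    # next_method is just the next link in the chain, pre-bound.
    return "loud " + next_method(x)

assert describe(3) == "loud base(3)"
```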
>Note that the overriding function foo would need to have both a >reference to itself (as opposed to its name, which will often be bound >to somthing else) and to the generic function from which it is being >called (and it might be called from several such functions). >Arranging this during the registration seems like an awaful lots of >work to avoid @after Yep, it's a whole lot simpler just to provide the next_method as an extra argument. From steven.bethard at gmail.com Thu May 3 18:08:54 2007 From: steven.bethard at gmail.com (Steven Bethard) Date: Thu, 3 May 2007 10:08:54 -0600 Subject: [Python-3000] PEP 3132: Extended Iterable Unpacking In-Reply-To: References: Message-ID: On 5/3/07, Simon Percivall wrote: > On 2 maj 2007, at 20.08, Guido van Rossum wrote: > > [Georg] > >>>>>>> a, *b, c = range(5) > >>>>>>> a > >>>> 0 > >>>>>>> c > >>>> 4 > >>>>>>> b > >>>> [1, 2, 3] > > > > > > That sounds messy; only allowing *a at the end seems a bit more > > manageable. But I'll hold off until I can shoot holes in your > > implementation. ;-) > > As the patch works right now, any iterator will be exhausted, > but if the proposal is constrained to only allowing the *name at > the end, wouldn't a more useful behavior be to not exhaust the > iterator, making it similar to: > > > it = iter(range(10)) > > a = next(it) > > b = it > > or would this be too surprising? In argument lists, *args exhausts iterators, converting them to tuples. I think it would be confusing if *args in tuple unpacking didn't do the same thing. This brings up the question of why the patch produces lists, not tuples. What's the reasoning behind that? STeVe -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. 
--- Bucky Katt, Get Fuzzy From g.brandl at gmx.net Thu May 3 18:12:44 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Thu, 03 May 2007 18:12:44 +0200 Subject: [Python-3000] PEP 3132: Extended Iterable Unpacking In-Reply-To: References: Message-ID: Steven Bethard schrieb: > On 5/3/07, Simon Percivall wrote: >> On 2 maj 2007, at 20.08, Guido van Rossum wrote: >> > [Georg] >> >>>>>>> a, *b, c = range(5) >> >>>>>>> a >> >>>> 0 >> >>>>>>> c >> >>>> 4 >> >>>>>>> b >> >>>> [1, 2, 3] >> > >> > >> > That sounds messy; only allowing *a at the end seems a bit more >> > manageable. But I'll hold off until I can shoot holes in your >> > implementation. ;-) >> >> As the patch works right now, any iterator will be exhausted, >> but if the proposal is constrained to only allowing the *name at >> the end, wouldn't a more useful behavior be to not exhaust the >> iterator, making it similar to: >> >> > it = iter(range(10)) >> > a = next(it) >> > b = it >> >> or would this be too surprising? > > In argument lists, *args exhausts iterators, converting them to > tuples. I think it would be confusing if *args in tuple unpacking > didn't do the same thing. > > This brings up the question of why the patch produces lists, not > tuples. What's the reasoning behind that? IMO, it's likely that you would like to further process the resulting sequence, including modifying it. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. 
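For reference, the semantics that ultimately shipped in Python 3.0 answer both questions raised here: the starred target is always bound to a list, and iterators are exhausted just as *args does in a function call:

```python
# PEP 3132 as implemented: *name may appear anywhere in the target list
# and always receives a list.
a, *b, c = range(5)
assert (a, b, c) == (0, [1, 2, 3], 4)

first, *rest = 'abcdef'
assert first == 'a'
assert rest == ['b', 'c', 'd', 'e', 'f']   # a list, not the string 'bcdef'

# Like *args in a call, unpacking exhausts an iterator:
it = iter(range(5))
head, *tail = it
assert (head, tail) == (0, [1, 2, 3, 4])
assert list(it) == []                      # nothing left to consume
```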
From mark.m.mcmahon at gmail.com Thu May 3 17:59:42 2007 From: mark.m.mcmahon at gmail.com (Mark Mc Mahon) Date: Thu, 3 May 2007 11:59:42 -0400 Subject: [Python-3000] PEP: Supporting Non-ASCII Identifiers Message-ID: <71b6302c0705030859x1aec71cena2ae950255043fd1@mail.gmail.com> Hi, One item that I haven't seen mentioned in support of this is that there is code that uses getattr for accessing things that might be accessed in other ways. For example the Attribute access Dictionaries (http://mail.python.org/pipermail/python-list/2007-March/429137.html), if one of the keys has a non-ASCII character then it will not be accessible through attribute access. (you could say the same for punctuation - but I think they are not the same thing). In pywinauto I try to let people use attribute access for accessing dialogs and controls of Windows applications e.g. your_app.DialogTitle.ControlCaption.Click() This works great for English - but for other languages people have to use item access your_app[u'DialogTitle'][u'ControlCaption'].Click() Anyway, just wanted to raise that option too for consideration. Thanks for the wonderful language, Mark From steven.bethard at gmail.com Thu May 3 18:24:46 2007 From: steven.bethard at gmail.com (Steven Bethard) Date: Thu, 3 May 2007 10:24:46 -0600 Subject: [Python-3000] PEP 3132: Extended Iterable Unpacking In-Reply-To: References: Message-ID: On 5/3/07, Georg Brandl wrote: > Steven Bethard schrieb: > > On 5/3/07, Simon Percivall wrote: > >> On 2 maj 2007, at 20.08, Guido van Rossum wrote: > >> > [Georg] > >> >>>>>>> a, *b, c = range(5) > >> >>>>>>> a > >> >>>> 0 > >> >>>>>>> c > >> >>>> 4 > >> >>>>>>> b > >> >>>> [1, 2, 3] [snip] > > In argument lists, *args exhausts iterators, converting them to > > tuples. I think it would be confusing if *args in tuple unpacking > > didn't do the same thing. > > > > This brings up the question of why the patch produces lists, not > > tuples. What's the reasoning behind that?
> > IMO, it's likely that you would like to further process the resulting > sequence, including modifying it. Well if that's what you're aiming at, then I'd expect it to be more useful to have the unpacking generate not lists, but the same type you started with, e.g. if I started with a string, I probably want to continue using strings:: >>> first, *rest = 'abcdef' >>> assert first == 'a', rest == 'bcdef' By that same logic, if I started with iterators, I probably want to continue using iterators, e.g.:: >>> f = open(...) >>> first_line, *remaining_lines = f So I guess it seems pretty arbitrary to me to assume that a list is what people want to be using. And if we're going to be arbitrary, I don't see why we shouldn't be arbitrary in the same way as function arguments so that we only need one explanation. STeVe -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy From jimjjewett at gmail.com Thu May 3 18:44:18 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Thu, 3 May 2007 12:44:18 -0400 Subject: [Python-3000] PEP 3120 (Was: PEP Parade) In-Reply-To: <46398CE8.2060206@v.loewis.de> References: <46398CE8.2060206@v.loewis.de> Message-ID: On 5/3/07, "Martin v. Löwis" wrote: > Untangling the parser from stdio - sure. I also think it would > be desirable to read the whole source into a buffer, rather than > applying a line-by-line input. That might be a bigger change, > making the tokenizer a multi-stage algorithm: > 1. read input into a buffer > 2. determine source encoding (looking at a BOM, else a > declaration within the first two lines, else default > to UTF-8) > 3. if the source encoding is not UTF-8, pass it through > a codec (decode to string, encode to UTF-8). Otherwise, > check that all bytes are really well-formed UTF-8. > 4. start parsing So people could hook into their own "codec" that, say, replaced native language keywords with standard python keywords?
Part of me says that should be an import hook instead of pretending to be a codec... -jJ From jseutter at gmail.com Thu May 3 19:17:27 2007 From: jseutter at gmail.com (Jerry Seutter) Date: Thu, 3 May 2007 11:17:27 -0600 Subject: [Python-3000] PEP-3125 -- remove backslash continuation In-Reply-To: <003601c78cbf$0cacec90$2606c5b0$@org> References: <003601c78cbf$0cacec90$2606c5b0$@org> Message-ID: <2c8d48d70705031017l10254449q509f7e4fd06c0442@mail.gmail.com> On 5/2/07, Andrew Koenig wrote: > > Looking at PEP-3125, I see that one of the rejected alternatives is to > allow > any unfinished expression to indicate a line continuation. > > I would like to suggest a modification to that alternative that has worked > successfully in another programming language, namely Stu Feldman's > EFL. EFL > is a language intended for numerical programming; it compiles into Fortran > with the interesting property that the resulting Fortran code is intended > to > be human-readable and maintainable by people who do not happen to have > access to the EFL compiler. > > Anyway, the (only) continuation rule in EFL is that if the last token in a > line is one that lexically cannot be the last token in a statement, then > the > next line is considered a continuation of the current line. > > Python currently has a rule that if parentheses are unbalanced, a newline > does not end the statement. If we were to translate the EFL rule to > Python, > it would be something like this: > > The whitespace that follows an operator or open bracket or > parenthesis > can include newline characters. > > Note that if this suggestion were implemented, it would presumably be at a > very low lexical level--even before the decision is made to turn a newline > followed by spaces into an INDENT or DEDENT token. I think that this > property solves the difficulty-of-parsing problem. Indeed, I think that > this suggestion would be easier to implement than the current > unbalanced-parentheses rule. 
> > Would this change alter where errors are reported by the parser? Is my x = x + # Oops. ... some other code ... going to have an error reported 15 lines below where the actual typo was made? Jerry -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-3000/attachments/20070503/2a3579bb/attachment.htm From barry at python.org Thu May 3 19:52:11 2007 From: barry at python.org (Barry Warsaw) Date: Thu, 3 May 2007 13:52:11 -0400 Subject: [Python-3000] [Python-Dev] PEP 30XZ: Simplified Parsing In-Reply-To: <878xc5g8qj.fsf@uwakimon.sk.tsukuba.ac.jp> References: <4638B151.6020901@voidspace.org.uk> <5.1.1.6.0.20070502144742.02bc1908@sparrow.telecommunity.com> <179D5383-88F0-4246-B355-5A817B9F7EBE@python.org> <87hcquezss.fsf@uwakimon.sk.tsukuba.ac.jp> <2CF5A0DA-509D-4A3D-96A6-30D601572E3E@python.org> <878xc5g8qj.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <1C94BBE1-F569-4F59-85E0-B585B9D21D1A@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On May 3, 2007, at 12:41 PM, Stephen J. Turnbull wrote: > Barry Warsaw writes: > >> IMO, this is a problem. We can make the Python extraction tool work, >> but we should still be very careful about breaking 3rd party tools >> like xgettext, since other projects may be using such tools. > > But > > _("some string" + > " and more of it") > > is already legal Python, and xgettext is already broken for it. Yep, but the idiom that *gettext accepts is used far more often. If that's outlawed then the tools /have/ to be taught the alternative. > Arguably, xgettext's implementation of -L Python should be > > execve ("pygettext", argv, environ); > > Ouch. 
:) - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (Darwin) iQCVAwUBRjohUXEjvBPtnXfVAQLHhAQAmKNyjbPpIMIlz7zObvb09wdw7jyC2bBa 2w+rDilRgxicUXWqH/L6AeHHl3HiVOO+tELU6upTxOWBMlJG8xcY70rde/32I0gb Wm0ylLlvDU/bAlSMyUscs77BVt82UQsBEqXyQ2+PRfQj7aOkpqgT8P3dwCYrtPaH L4W4JzvoK1M= =9pgu -----END PGP SIGNATURE----- From guido at python.org Thu May 3 20:30:19 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 3 May 2007 11:30:19 -0700 Subject: [Python-3000] Escaping in raw strings (was Re: [Python-Dev] PEP 30XZ: Simplified Parsing) In-Reply-To: <200705030958.53558.fdrake@acm.org> References: <200705030958.53558.fdrake@acm.org> Message-ID: On 5/3/07, Fred L. Drake, Jr. wrote: > On Thursday 03 May 2007, Georg Brandl wrote: > > Is that something that can be agreed upon without a PEP? > > I expect this to be at least somewhat controversial, so a PEP is warranted. > I'd like to see it fixed, though. It's too late for a new PEP. It certainly is controversial; how would you write a regexp that matches a single or double quote using r"..." or r'...'? -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Thu May 3 20:35:50 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 3 May 2007 11:35:50 -0700 Subject: [Python-3000] [Python-Dev] PEP 30XZ: Simplified Parsing In-Reply-To: References: <02b401c78d15$f04b6110$090a0a0a@enfoldsystems.local> <46397BB2.4060404@ronadam.com> <4639DD42.3020307@benjiyork.com> <4639E9C1.4010109@ronadam.com> Message-ID: On 5/3/07, Georg Brandl wrote: > > These are raw strings if you didn't notice. > > It's all in the implementation. The tokenizer takes it as an escape sequence > -- it doesn't specialcase raw strings -- the AST builder (parsestr() in ast.c) > doesn't. FWIW, it wasn't designed this way so as to be easy to implement. 
It was designed this way because the overwhelming use case is regular expressions, where one needs to be able to escape single and double quotes -- the re module unescapes \" and \' when it encounters them. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From g.brandl at gmx.net Thu May 3 20:40:18 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Thu, 03 May 2007 20:40:18 +0200 Subject: [Python-3000] Escaping in raw strings (was Re: [Python-Dev] PEP 30XZ: Simplified Parsing) In-Reply-To: References: <200705030958.53558.fdrake@acm.org> Message-ID: Guido van Rossum schrieb: > On 5/3/07, Fred L. Drake, Jr. wrote: >> On Thursday 03 May 2007, Georg Brandl wrote: >> > Is that something that can be agreed upon without a PEP? >> >> I expect this to be at least somewhat controversial, so a PEP is warranted. >> I'd like to see it fixed, though. > > It's too late for a new PEP. It wouldn't be too late for a 2.6 PEP, would it? However, I'm not going to champion this. > It certainly is controversial; how would you write a regexp that > matches a single or double quote using r"..." or r'...'? You'd have to concatenate two string literals... Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From martin at v.loewis.de Thu May 3 23:09:16 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 03 May 2007 23:09:16 +0200 Subject: [Python-3000] PEP 3120 (Was: PEP Parade) In-Reply-To: References: <46398CE8.2060206@v.loewis.de> Message-ID: <463A4F7C.8090406@v.loewis.de> >> 1. read input into a buffer >> 2. determine source encoding (looking at a BOM, else a >> declaration within the first two lines, else default >> to UTF-8) >> 3. 
if the source encoding is not UTF-8, pass it through >> a codec (decode to string, encode to UTF-8). Otherwise, >> check that all bytes are really well-formed UTF-8. >> 4. start parsing > > So people could hook into their own "codec" that, say, replaced native > language keywords with standard python keywords? No, so that PEP 263 remains implemented. Martin From greg.ewing at canterbury.ac.nz Fri May 4 06:08:59 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 04 May 2007 16:08:59 +1200 Subject: [Python-3000] PEP: Eliminate __del__ In-Reply-To: References: <001d01c78bc2$b9367a60$f001a8c0@RaymondLaptop1> <5.1.1.6.0.20070501120218.02d31250@sparrow.telecommunity.com> Message-ID: <463AB1DB.5010308@canterbury.ac.nz> Giovanni Bajo wrote: > On 01/05/2007 18.09, Phillip J. Eby wrote: > > That means that if 'self' in your example above is collected, then > > the weakref no longer exists, so the closedown won't be called. > > Yes, but as far as I understand it, the GC does special care to ensure that > the callback of a weakref that is *not* part of a cyclic trash being collected > is always called. It has nothing to do with cyclic GC. The point is that if the refcount of a weak reference drops to zero before that of the object being weakly referenced, the weak reference object itself is deallocated and its callback is *not* called. So having the resource-using object hold the weak ref to the resource doesn't work -- it has to be kept in some kind of separate registry. 
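Greg's point is easy to demonstrate; the sketch below relies on CPython's immediate reference counting for the timing of the collections:

```python
import weakref

class Resource:
    pass

called = []

# If the weak reference dies before its referent, the callback is lost:
r = Resource()
wr = weakref.ref(r, lambda ref: called.append("dropped wr first"))
del wr            # the weakref object itself is deallocated...
del r             # ...so collecting r triggers no callback
assert called == []

# Keeping the weakref alive in a separate registry preserves the callback:
r = Resource()
registry = [weakref.ref(r, lambda ref: called.append("registry"))]
del r             # refcount hits zero; the registered callback fires
assert called == ["registry"]
```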
-- Greg From greg.ewing at canterbury.ac.nz Fri May 4 06:13:21 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 04 May 2007 16:13:21 +1200 Subject: [Python-3000] PEP 3132: Extended Iterable Unpacking In-Reply-To: References: Message-ID: <463AB2E1.2030408@canterbury.ac.nz> Simon Percivall wrote: > if the proposal is constrained to only allowing the *name at > the end, wouldn't a more useful behavior be to not exhaust the > iterator, making it similar to: > > > it = iter(range(10)) > > a = next(it) > > b = it > > or would this be too surprising? It would surprise the heck out of me when I started with something that wasn't an iterator and ended up with b being something that I could only iterate and couldn't index. -- Greg From greg.ewing at canterbury.ac.nz Fri May 4 06:26:24 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 04 May 2007 16:26:24 +1200 Subject: [Python-3000] PEP 3132: Extended Iterable Unpacking In-Reply-To: References: Message-ID: <463AB5F0.7020407@canterbury.ac.nz> Steven Bethard wrote: > This brings up the question of why the patch produces lists, not > tuples. What's the reasoning behind that? When dealing with an iterator, you don't know the length in advance, so the only way to get a tuple would be to produce a list first and then create a tuple from it. -- Greg From guido at python.org Fri May 4 06:34:58 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 3 May 2007 21:34:58 -0700 Subject: [Python-3000] PEP: Eliminate __del__ In-Reply-To: <463AB1DB.5010308@canterbury.ac.nz> References: <001d01c78bc2$b9367a60$f001a8c0@RaymondLaptop1> <5.1.1.6.0.20070501120218.02d31250@sparrow.telecommunity.com> <463AB1DB.5010308@canterbury.ac.nz> Message-ID: In all the threads about this PEP I still haven't seen a single example of how to write a finalizer. Let's take a specific example of a file object (this occurs in io.py in the p3yk branch). When a write buffer is GC'ed it must be flushed. 
The current way of writing this is simple:

class BufferedWriter:
    def __init__(self, raw):
        self.raw = raw
        self.buffer = b""

    def write(self, data):
        self.buffer += data
        if len(self.buffer) >= 8192:
            self.flush()

    def flush(self):
        self.raw.write(self.buffer)
        self.buffer = b""

    def __del__(self):
        self.flush()

How would I write this without using __del__(), e.g. using weak references? P.S. Don't bother arguing that the caller should use try/finally or whatever. That's not the point. Assuming we have a class like this where it has been decided that some method must be called upon destruction, how do we arrange for that call to happen? -- --Guido van Rossum (home page: http://www.python.org/~guido/) From python at rcn.com Fri May 4 07:12:19 2007 From: python at rcn.com (Raymond Hettinger) Date: Thu, 3 May 2007 22:12:19 -0700 Subject: [Python-3000] PEP: Eliminate __del__ References: <001d01c78bc2$b9367a60$f001a8c0@RaymondLaptop1><5.1.1.6.0.20070501120218.02d31250@sparrow.telecommunity.com> <463AB1DB.5010308@canterbury.ac.nz> Message-ID: <011601c78e0a$cadd0cd0$f001a8c0@RaymondLaptop1> From: "Greg Ewing" > It has nothing to do with cyclic GC. The point is that > if the refcount of a weak reference drops to zero before > that of the object being weakly referenced, the weak > reference object itself is deallocated and its callback > is *not* called. So having the resource-using object > hold the weak ref to the resource doesn't work -- it > has to be kept in some kind of separate registry. I'll write up an idiomatic approach and include it in the PEP this weekend. Raymond From turnbull at sk.tsukuba.ac.jp Thu May 3 18:41:40 2007 From: turnbull at sk.tsukuba.ac.jp (Stephen J.
Turnbull) Date: Fri, 04 May 2007 01:41:40 +0900 Subject: [Python-3000] [Python-Dev] PEP 30XZ: Simplified Parsing In-Reply-To: <2CF5A0DA-509D-4A3D-96A6-30D601572E3E@python.org> References: <4638B151.6020901@voidspace.org.uk> <5.1.1.6.0.20070502144742.02bc1908@sparrow.telecommunity.com> <179D5383-88F0-4246-B355-5A817B9F7EBE@python.org> <87hcquezss.fsf@uwakimon.sk.tsukuba.ac.jp> <2CF5A0DA-509D-4A3D-96A6-30D601572E3E@python.org> Message-ID: <878xc5g8qj.fsf@uwakimon.sk.tsukuba.ac.jp> Barry Warsaw writes: > IMO, this is a problem. We can make the Python extraction tool work, > but we should still be very careful about breaking 3rd party tools > like xgettext, since other projects may be using such tools. But _("some string" + " and more of it") is already legal Python, and xgettext is already broken for it. Arguably, xgettext's implementation of -L Python should be execve ("pygettext", argv, environ); From ms at cerenity.org Thu May 3 18:06:58 2007 From: ms at cerenity.org (Michael Sparks) Date: Thu, 3 May 2007 17:06:58 +0100 Subject: [Python-3000] [Python-Dev] PEP 30XZ: Simplified Parsing In-Reply-To: <87hcquezss.fsf@uwakimon.sk.tsukuba.ac.jp> References: <179D5383-88F0-4246-B355-5A817B9F7EBE@python.org> <87hcquezss.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <200705031706.59685.ms@cerenity.org> On Thursday 03 May 2007 15:40, Stephen J. Turnbull wrote: > Teaching Python-based extraction tools about it isn't hard, just make > sure that you slurp in the whole argument, and eval it. We generate our component documentation based on going through the AST generated by compiler.ast, finding doc strings (and other strings in other known/expected locations), and then formatting using docutils. Eval'ing the file isn't always going to work due to imports relying on libraries that may need to be installed. 
(This is especially the case with Kamaelia because we tend to wrap libraries for usage as components in a convenient way) We've also specifically moved away from importing the file or eval'ing things because of this issue. It makes it easier to have docs built on a random machine with not too much installed on it. You could special case "12345" + "67890" as a compile-time constructor and jiggle things such that by the time it came out of the parser it looked like "1234567890", but I don't see what that has to gain over the current form. (which doesn't look like an expression) I also think that's a rather nasty version. On the flip side if we're eval'ing an expression to get a docstring, there would be great temptation to extend that to be a doc-object - eg using dictionaries, etc as well for more specific docs. Is that wise? I don't know :) Michael. -- Kamaelia project lead http://kamaelia.sourceforge.net/Home From stephen at xemacs.org Thu May 3 19:54:54 2007 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 04 May 2007 02:54:54 +0900 Subject: [Python-3000] [Python-Dev] PEP 30XZ: Simplified Parsing In-Reply-To: <200705031706.59685.ms@cerenity.org> References: <179D5383-88F0-4246-B355-5A817B9F7EBE@python.org> <87hcquezss.fsf@uwakimon.sk.tsukuba.ac.jp> <200705031706.59685.ms@cerenity.org> Message-ID: <87y7k5eqs1.fsf@uwakimon.sk.tsukuba.ac.jp> Michael Sparks writes: > We generate our component documentation based on going through the AST > generated by compiler.ast, finding doc strings (and other strings in > other known/expected locations), and then formatting using docutils. Are you talking about I18N and gettext? If so, I'm really lost .... > You could special case "12345" + "67890" as a compile-time constructor and > jiggle things such that by the time it came out of the parser it looked like > "1234567890", but I don't see what that has to gain over the current form.
I'm not arguing it's a gain, simply that it's a case that *should* be handled by extractors of translatable strings anyway, and if it were, there would not be an I18N issue in this PEP. It *should* be handled because this is just constant folding. Any half-witted compiler does it, and programmers expect their compilers to do it. pygettext and xgettext are (very special) compilers. I don't see why that expectation should be violated just because the constants in question are translatable strings. I recognize that for xgettext implementing that in C for languages as disparate as Lisp, Python, and Perl (all of which have string concatenation operators) is hard, and to the extent that xgettext is recommended by 9 out of 10 translators, we need to worry about how long it's going to take for xgettext to get fixed (because it *is* broken in this respect, at least for Python). From percivall at gmail.com Fri May 4 13:05:45 2007 From: percivall at gmail.com (Simon Percivall) Date: Fri, 4 May 2007 13:05:45 +0200 Subject: [Python-3000] PEP 3132: Extended Iterable Unpacking In-Reply-To: <463AB2E1.2030408@canterbury.ac.nz> References: <463AB2E1.2030408@canterbury.ac.nz> Message-ID: On 4 maj 2007, at 06.13, Greg Ewing wrote: > Simon Percivall wrote: >> if the proposal is constrained to only allowing the *name at >> the end, wouldn't a more useful behavior be to not exhaust the >> iterator, making it similar to: >> > it = iter(range(10)) >> > a = next(it) >> > b = it >> or would this be too surprising? > > It would surprise the heck out of me when I started > with something that wasn't an iterator and ended > up with b being something that I could only iterate > and couldn't index. Yes, that would be surprising. This was more in the way of returning the type that was given: if you start with a list you end up with a list in "b", if you start with an iterator you end up with an iterator. 
This would enable stuff like using this with itertools.count and other iterators that represent infinite sequences.

Also, I'm not intending to argue this, but exhausting the iterator is not exactly like *args in argument lists, because the iterator isn't the name being starred. It's more like the formal parameter of a function, when the receiver of the iterator _is_ starred, but the iterator is not. The iterator isn't automatically exhausted in those cases.

//Simon

From mike_mp at zzzcomputing.com  Fri May  4 16:21:59 2007
From: mike_mp at zzzcomputing.com (Michael Bayer)
Date: Fri, 4 May 2007 10:21:59 -0400
Subject: [Python-3000] PEP: Eliminate __del__
In-Reply-To: <011601c78e0a$cadd0cd0$f001a8c0@RaymondLaptop1>
References: <001d01c78bc2$b9367a60$f001a8c0@RaymondLaptop1><5.1.1.6.0.20070501120218.02d31250@sparrow.telecommunity.com>
	<463AB1DB.5010308@canterbury.ac.nz>
	<011601c78e0a$cadd0cd0$f001a8c0@RaymondLaptop1>
Message-ID:

On May 4, 2007, at 1:12 AM, Raymond Hettinger wrote:

> From: "Greg Ewing"
>> It has nothing to do with cyclic GC. The point is that
>> if the refcount of a weak reference drops to zero before
>> that of the object being weakly referenced, the weak
>> reference object itself is deallocated and its callback
>> is *not* called. So having the resource-using object
>> hold the weak ref to the resource doesn't work -- it
>> has to be kept in some kind of separate registry.
>
> I'll write-up an idiomatic approach and include it in the PEP this
> weekend.
>

why not encapsulate the "proper" weakref-based approach in an easy-to-use method such as "__close__()" ?  that way nobody has to guess how to follow this pattern.
From python at rcn.com  Fri May  4 17:22:45 2007
From: python at rcn.com (Raymond Hettinger)
Date: Fri, 4 May 2007 08:22:45 -0700
Subject: [Python-3000] PEP: Eliminate __del__
References: <001d01c78bc2$b9367a60$f001a8c0@RaymondLaptop1><5.1.1.6.0.20070501120218.02d31250@sparrow.telecommunity.com>
	<463AB1DB.5010308@canterbury.ac.nz>
	<011601c78e0a$cadd0cd0$f001a8c0@RaymondLaptop1>
Message-ID: <01dd01c78e60$11d8e460$f001a8c0@RaymondLaptop1>

[Michael Bayer]
> why not encapsulate the "proper" weakref-based approach in an easy-to-
> use method such as "__close__()" ?  that way nobody has to guess how
> to follow this pattern.

An encapsulating function should be added to the weakref module so that Guido's example could be written as:

    class BufferedWriter:

        def __init__(self, raw):
            self.raw = raw
            self.buffer = ""
            weakref.cleanup(self, lambda s: s.raw.write(s.buffer))

        def write(self, data):
            self.buffer += data
            if len(self.buffer) >= 8192:
                self.flush()

        def flush(self):
            self.raw.write(self.buffer)
            self.buffer = ""

I've got a first cut at an encapsulating function but am not happy with it yet.  There is almost certainly a better way.

First draft:

    def cleanup(obj, callback, _reg = []):
        class AttrMap(object):
            def __init__(self, map):
                self._map = map
            def __getattr__(self, key):
                return self._map[key]
        def wrapper(wr, mp=AttrMap(obj.__dict__), callback=callback):
            _reg.remove(wr)
            callback(mp)
        _reg.append(ref(obj, wrapper))

Raymond

From steven.bethard at gmail.com  Fri May  4 17:54:40 2007
From: steven.bethard at gmail.com (Steven Bethard)
Date: Fri, 4 May 2007 09:54:40 -0600
Subject: [Python-3000] PEP 3132: Extended Iterable Unpacking
In-Reply-To: <463AB5F0.7020407@canterbury.ac.nz>
References: <463AB5F0.7020407@canterbury.ac.nz>
Message-ID:

On 5/3/07, Greg Ewing wrote:
> Steven Bethard wrote:
>
> > This brings up the question of why the patch produces lists, not
> > tuples.  What's the reasoning behind that?
>
> When dealing with an iterator, you don't know the
> length in advance, so the only way to get a tuple
> would be to produce a list first and then create
> a tuple from it.

Yep.  That was one of the reasons it was suggested that the *args should only appear at the end of the tuple unpacking.

STeVe
-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
--- Bucky Katt, Get Fuzzy

From mike_mp at zzzcomputing.com  Fri May  4 18:45:07 2007
From: mike_mp at zzzcomputing.com (Michael Bayer)
Date: Fri, 4 May 2007 12:45:07 -0400
Subject: [Python-3000] PEP: Eliminate __del__
In-Reply-To: <463B4455.7060100@develer.com>
References: <001d01c78bc2$b9367a60$f001a8c0@RaymondLaptop1><5.1.1.6.0.20070501120218.02d31250@sparrow.telecommunity.com>
	<463AB1DB.5010308@canterbury.ac.nz>
	<011601c78e0a$cadd0cd0$f001a8c0@RaymondLaptop1>
	<463B4455.7060100@develer.com>
Message-ID: <8B815829-8E98-4547-BC99-14E2241C13CB@zzzcomputing.com>

On May 4, 2007, at 10:33 AM, Giovanni Bajo wrote:

> On 5/4/2007 4:21 PM, Michael Bayer wrote:
>
>>>
>> why not encapsulate the "proper" weakref-based approach in an easy-
>> to-use method such as "__close__()" ?  that way nobody has to
>> guess how to follow this pattern.
>
> Because the idea is that the callback of the weakref will *NOT*
> hold a reference to the object being destroyed, but only to the
> resources that need to be deallocated (that is, to the objects
> bound as attributes of the object).

a __close__() method on a class is first bound to the class, not any particular self.  the Python runtime could detect this and create the appropriate callable/weakref scenario behind the scenes; not even binding __close__() to the self in the usual way.  obviously it can't be a pure python solution, it would have to be a specific runtime supported idea (the same way __metaclass__ or any other magic attribute is supported).
I just don't understand why such an important feature would have to be relegated to just a "recipe".  I think that's a product of the notion that "implicit finalizers are bad, use try/finally".  That's not really valid for things like buffers that flush and database/network connections that must be released when they fall out of scope.

From guido at python.org  Fri May  4 19:15:19 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 4 May 2007 10:15:19 -0700
Subject: [Python-3000] PEP: Eliminate __del__
In-Reply-To: <01dd01c78e60$11d8e460$f001a8c0@RaymondLaptop1>
References: <001d01c78bc2$b9367a60$f001a8c0@RaymondLaptop1>
	<5.1.1.6.0.20070501120218.02d31250@sparrow.telecommunity.com>
	<463AB1DB.5010308@canterbury.ac.nz>
	<011601c78e0a$cadd0cd0$f001a8c0@RaymondLaptop1>
	<01dd01c78e60$11d8e460$f001a8c0@RaymondLaptop1>
Message-ID:

On 5/4/07, Raymond Hettinger wrote:
> An encapsulating function should be added to the weakref module
> so that Guido's example could be written as:
>
> class BufferedWriter:
>
>     def __init__(self, raw):
>         self.raw = raw
>         self.buffer = ""
>         weakref.cleanup(self, lambda s: s.raw.write(s.buffer))

Or, instead of a new lambda, just use the unbound method:

    weakref.cleanup(self, self.__class__.flush)

Important: use the dynamic class (self.__class__), not the static class (BufferedWriter).  The distinction matters when BufferedWriter is subclassed and the subclass overrides flush().

Hm, a thought just occurred to me.  Why not arrange for object.__new__ to call [the moral equivalent of] weakref.cleanup(self, self.__class__.__del__), and get rid of the direct call to __del__ from the destructor?  (And the special-casing of objects with __del__ in the GC module, of course.)  Then classes that define __del__ won't have to be changed at all.
(Of course dynamically patching a different __del__ method into the class won't have quite exactly the same semantics, but I don't really care about such a fragile and rare possibility; I care about vanilla use of __del__ methods.) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From steven.bethard at gmail.com Fri May 4 20:02:45 2007 From: steven.bethard at gmail.com (Steven Bethard) Date: Fri, 4 May 2007 12:02:45 -0600 Subject: [Python-3000] PEP: Eliminate __del__ In-Reply-To: References: <001d01c78bc2$b9367a60$f001a8c0@RaymondLaptop1> <5.1.1.6.0.20070501120218.02d31250@sparrow.telecommunity.com> <463AB1DB.5010308@canterbury.ac.nz> <011601c78e0a$cadd0cd0$f001a8c0@RaymondLaptop1> <01dd01c78e60$11d8e460$f001a8c0@RaymondLaptop1> Message-ID: On 5/4/07, Guido van Rossum wrote: > On 5/4/07, Raymond Hettinger wrote: > > An encapsulating function should be added to the weakref module > > so that Guido's example could be written as: > > > > class BufferedWriter: > > > > def __init__(self, raw): > > self.raw = raw > > self.buffer = "" > > weakref.cleanup(self, lambda s: s.raw.write(s.buffer)) > > Or, instead of a new lambda, just use the unbound method: > > weakref.cleanup(self, self.__class__.flush) > > Important: use the dynamic class (self.__class___), not the static > class (BufferedWriter). The distinction matters when BufferedWriter is > subclassed and the subclass overrides flush(). > > Hm, a thought just occurred to me. Why not arrange for object.__new__ > to call [the moral equivalent of] weakref.cleanup(self, > self.__class__.__del__), and get rid of the direct call to __del__ > from the destructor? (And the special-casing of objects with __del__ > in the GC module, of course.) That seems like a good idea, though I'm still a little unclear as to how far the AttrMap should be going to look like a real instance. As it stands, you can only access items from the instance __dict__. 
That means no methods, class attributes, etc.:: >>> import weakref >>> def cleanup(obj, callback, _reg=[]): ... class AttrMap(object): ... def __init__(self, map): ... self._map = map ... def __getattr__(self, key): ... return self._map[key] ... def wrapper(wr, mp=AttrMap(obj.__dict__), callback=callback): ... _reg.remove(wr) ... callback(mp) ... _reg.append(weakref.ref(obj, wrapper)) ... >>> class Object(object): ... # note that we do this in __init__ because in __new__, the ... # object has no references to it yet ... def __init__(self): ... super(Object, self).__init__() ... if hasattr(self.__class__, '__newdel__'): ... # note we use .im_func so that we can later pass ... # any object as the "self" parameter ... cleanup(self, self.__class__.__newdel__.im_func) ... >>> class Foo(Object): ... def flush(self): ... print 'flushing' ... def __newdel__(self): ... print 'deleting' ... self.flush() ... >>> f = Foo() >>> del f deleting Exception exceptions.KeyError: 'flush' in ignored STeVe -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. 
--- Bucky Katt, Get Fuzzy From guido at python.org Fri May 4 20:09:42 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 4 May 2007 11:09:42 -0700 Subject: [Python-3000] PEP: Eliminate __del__ In-Reply-To: References: <001d01c78bc2$b9367a60$f001a8c0@RaymondLaptop1> <5.1.1.6.0.20070501120218.02d31250@sparrow.telecommunity.com> <463AB1DB.5010308@canterbury.ac.nz> <011601c78e0a$cadd0cd0$f001a8c0@RaymondLaptop1> <01dd01c78e60$11d8e460$f001a8c0@RaymondLaptop1> Message-ID: On 5/4/07, Steven Bethard wrote: > On 5/4/07, Guido van Rossum wrote: > > On 5/4/07, Raymond Hettinger wrote: > > > An encapsulating function should be added to the weakref module > > > so that Guido's example could be written as: > > > > > > class BufferedWriter: > > > > > > def __init__(self, raw): > > > self.raw = raw > > > self.buffer = "" > > > weakref.cleanup(self, lambda s: s.raw.write(s.buffer)) > > > > Or, instead of a new lambda, just use the unbound method: > > > > weakref.cleanup(self, self.__class__.flush) > > > > Important: use the dynamic class (self.__class___), not the static > > class (BufferedWriter). The distinction matters when BufferedWriter is > > subclassed and the subclass overrides flush(). > > > > Hm, a thought just occurred to me. Why not arrange for object.__new__ > > to call [the moral equivalent of] weakref.cleanup(self, > > self.__class__.__del__), and get rid of the direct call to __del__ > > from the destructor? (And the special-casing of objects with __del__ > > in the GC module, of course.) > > That seems like a good idea, though I'm still a little unclear as to > how far the AttrMap should be going to look like a real instance. As > it stands, you can only access items from the instance __dict__. That > means no methods, class attributes, etc.:: Oh, you mean 'self' as passed to the callback is not the instance? That kills the whole idea (since the typical __del__ calls self.flush() or self.close()). 
> >>> import weakref > >>> def cleanup(obj, callback, _reg=[]): > ... class AttrMap(object): > ... def __init__(self, map): > ... self._map = map > ... def __getattr__(self, key): > ... return self._map[key] > ... def wrapper(wr, mp=AttrMap(obj.__dict__), callback=callback): > ... _reg.remove(wr) > ... callback(mp) > ... _reg.append(weakref.ref(obj, wrapper)) > ... > >>> class Object(object): > ... # note that we do this in __init__ because in __new__, the > ... # object has no references to it yet > ... def __init__(self): > ... super(Object, self).__init__() > ... if hasattr(self.__class__, '__newdel__'): > ... # note we use .im_func so that we can later pass > ... # any object as the "self" parameter > ... cleanup(self, self.__class__.__newdel__.im_func) > ... > >>> class Foo(Object): > ... def flush(self): > ... print 'flushing' > ... def __newdel__(self): > ... print 'deleting' > ... self.flush() > ... > >>> f = Foo() > >>> del f > deleting > Exception exceptions.KeyError: 'flush' in 0x00F34630> ignored If it really has to be done this way, I think the whole PEP is doomed. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From rhamph at gmail.com Fri May 4 20:35:28 2007 From: rhamph at gmail.com (Adam Olsen) Date: Fri, 4 May 2007 12:35:28 -0600 Subject: [Python-3000] PEP: Eliminate __del__ In-Reply-To: References: <001d01c78bc2$b9367a60$f001a8c0@RaymondLaptop1> <5.1.1.6.0.20070501120218.02d31250@sparrow.telecommunity.com> <463AB1DB.5010308@canterbury.ac.nz> <011601c78e0a$cadd0cd0$f001a8c0@RaymondLaptop1> <01dd01c78e60$11d8e460$f001a8c0@RaymondLaptop1> Message-ID: On 5/4/07, Guido van Rossum wrote: > On 5/4/07, Steven Bethard wrote: > > On 5/4/07, Guido van Rossum wrote: > > > Hm, a thought just occurred to me. Why not arrange for object.__new__ > > > to call [the moral equivalent of] weakref.cleanup(self, > > > self.__class__.__del__), and get rid of the direct call to __del__ > > > from the destructor? 
(And the special-casing of objects with __del__
> > > in the GC module, of course.)
> >
> > That seems like a good idea, though I'm still a little unclear as to
> > how far the AttrMap should be going to look like a real instance.  As
> > it stands, you can only access items from the instance __dict__.  That
> > means no methods, class attributes, etc.::
>
> Oh, you mean 'self' as passed to the callback is not the instance?
> That kills the whole idea (since the typical __del__ calls
> self.flush() or self.close()).
> [..snip example using __dict__..]
>
> If it really has to be done this way, I think the whole PEP is doomed.

Any attempt that keeps the entire contents of __dict__ alive is doomed.  It's likely to contain a cycle back to the original object, and avoiding that is the whole point of jumping through these hoops.

I've got a metaclass that moves explicitly marked attributes and methods into a "core" object, allowing you to write code like this:

    class MyFile(safedel):
        __coreattrs__ = ['_fd']

        def __init__(self, path):
            super(MyFile, self).__init__()
            self._fd = os.open(path, ...)

        @coremethod
        def __safedel__(core):
            core.close()

        @coremethod
        def close(core):
            # This method is written to be idempotent
            if core._fd is not None:
                os.close(core._fd)
                core._fd = None

I've submitted it to the python cookbook, but I don't know how long it'll take to get posted; it's a little on the long side at 163 lines.

The biggest limitation is you can't easily use super() in core methods, although the proposed changes to super() would probably fix this.

-- 
Adam Olsen, aka Rhamphoryncus

From python at rcn.com  Fri May  4 20:37:59 2007
From: python at rcn.com (Raymond Hettinger)
Date: Fri, 4 May 2007 14:37:59 -0400 (EDT)
Subject: [Python-3000] PEP: Eliminate __del__
Message-ID: <20070504143759.BIA64211@ms09.lnh.mail.rcn.net>

> If it really has to be done this way, I think the whole PEP is doomed.
This thread is getting way ahead of me and starting to self-destruct before I've had a chance to put together a concrete proposal and scan existing code for use cases.

Can I please press the button for a few days until I can offer a useful starting point.

So far, it is clear that some of the everyday use-cases can be handled trivially, but there are some use cases that are not going to yield without much more thought.

Raymond

From steve at holdenweb.com  Fri May  4 20:51:00 2007
From: steve at holdenweb.com (Steve Holden)
Date: Fri, 04 May 2007 14:51:00 -0400
Subject: [Python-3000] PEP 30XZ: Simplified Parsing
In-Reply-To: <4638B151.6020901@voidspace.org.uk>
References: <4638B151.6020901@voidspace.org.uk>
Message-ID:

Michael Foord wrote:
> Jim Jewett wrote:
>> PEP: 30xz
>> Title: Simplified Parsing
>> Version: $Revision$
>> Last-Modified: $Date$
>> Author: Jim J. Jewett
>> Status: Draft
>> Type: Standards Track
>> Content-Type: text/plain
>> Created: 29-Apr-2007
>> Post-History: 29-Apr-2007
>>
>>
>> Abstract
>>
>> Python initially inherited its parsing from C.  While this has
>> been generally useful, there are some remnants which have been
>> less useful for python, and should be eliminated.
>>
>> + Implicit String concatenation
>>
>> + Line continuation with "\"
>>
>> + 034 as an octal number (== decimal 28).  Note that this is
>> listed only for completeness; the decision to raise an
>> Exception for leading zeros has already been made in the
>> context of PEP XXX, about adding a binary literal.
>>
>>
>> Rationale for Removing Implicit String Concatenation
>>
>> Implicit String concatenation can lead to confusing, or even
>> silent, errors. [1]
>>
>> def f(arg1, arg2=None): pass
>>
>> f("abc" "def") # forgot the comma, no warning ...
>> # silently becomes f("abcdef", None)
>>
>>
> Implicit string concatenation is massively useful for creating long
> strings in a readable way though:
>
> call_something("first part\n"
>                "second line\n"
>                "third line\n")
>
> I find it an elegant way of building strings and would be sad to see it
> go.  Adding trailing '+' signs is ugly.
>

Currently at least possible, though doubtless some people won't like the left-hand alignment, is

    call_something("""\
    first part
    second part
    third part
    """)

Alas if the proposal to remove the continuation backslash goes through this may not remain available to us.

I realise that the arrival of Py3 means all these are up for grabs, but don't think any of them are really warty enough to require removal.  I take the point that octal constants are counter-intuitive and wouldn't be too disappointed by their removal.  I still think Icon had the right answer there in allowing an explicit decimal radix in constants, so 16 as a binary constant would be 10000r2, or 10r16.  IIRC it still allowed 0x10 as well (though Tim may shoot me down there).

regards
Steve
-- 
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC/Ltd           http://www.holdenweb.com
Skype: holdenweb      http://del.icio.us/steve.holden
------------------ Asciimercial ---------------------
Get on the web: Blog, lens and tag your way to fame!!
holdenweb.blogspot.com        squidoo.com/pythonology
tagged items:         del.icio.us/steve.holden/python
All these services currently offer free registration!
-------------- Thank You for Reading ----------------

From jimjjewett at gmail.com  Fri May  4 21:09:46 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Fri, 4 May 2007 15:09:46 -0400
Subject: [Python-3000] updated PEP3125, Remove Backslash Continuation
Message-ID:

Major rewrite.

The inside-a-string continuation is separated from the general continuation.

The alternatives section is expanded to also list Andrew Koenig's improved inside-expressions variant, since that is a real contender.
If anyone feels I haven't acknowledged their concerns, please tell me.

--------------

PEP: 3125
Title: Remove Backslash Continuation
Version: $Revision$
Last-Modified: $Date$
Author: Jim J. Jewett
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 29-Apr-2007
Post-History: 29-Apr-2007, 30-Apr-2007, 04-May-2007


Abstract
========

Python initially inherited its parsing from C.  While this has
been generally useful, there are some remnants which have been
less useful for python, and should be eliminated.

This PEP proposes elimination of terminal ``\`` as a marker for
line continuation.


Motivation
==========

One goal for Python 3000 should be to simplify the language by
removing unnecessary or duplicated features.  There are currently
several ways to indicate that a logical line is continued on the
following physical line.  The other continuation methods are
easily explained as a logical consequence of the semantics they
provide; ``\`` is simply an escape character that needs to be
memorized.


Existing Line Continuation Methods
==================================


Parenthetical Expression - ([{}])
---------------------------------

Open a parenthetical expression.  It doesn't matter whether people
view the "line" as continuing; they do immediately recognize that
the expression needs to be closed before the statement can end.

Examples using each of (), [], and {}::

    def fn(long_argname1,
           long_argname2):
        settings = {"background": "random noise"
                    "volume": "barely audible"}
        restrictions = ["Warranty void if used",
                        "Notice must be received by yesterday"
                        "Not responsible for sales pitch"]

Note that it is always possible to parenthesize an expression,
but it can seem odd to parenthesize an expression that needs
them only for the line break::

    assert val>4, (
        "val is too small")


Triple-Quoted Strings
---------------------

Open a triple-quoted string; again, people recognize that the
string needs to finish before the next statement starts.
::

    banner_message = """
    Satisfaction Guaranteed,
    or DOUBLE YOUR MONEY BACK!!!
    some minor restrictions apply"""


Terminal ``\`` in the general case
----------------------------------

A terminal ``\`` indicates that the logical line is continued
on the following physical line (after whitespace).  There are
no particular semantics associated with this.  This form is
never required, although it may look better (particularly for
people with a C language background) in some cases::

    >>> assert val>4, \
            "val is too small"

Also note that the ``\`` must be the final character in the line.
If your editor navigation can add whitespace to the end of a line,
that invisible change will alter the semantics of the program.
Fortunately, the typical result is only a syntax error, rather
than a runtime bug::

    >>> assert val>4, \ 
            "val is too small"
    SyntaxError: unexpected character after line continuation character

This PEP proposes to eliminate this redundant and potentially
confusing alternative.


Terminal ``\`` within a string
------------------------------

A terminal ``\`` within a single-quoted string, at the end of the
line.  This is arguably a special case of the terminal ``\``, but
it is a special case that may be worth keeping.

    >>> "abd\
     def"
    'abd def'

+ Many of the objections to removing ``\`` termination were
  really just objections to removing it within literal strings;
  several people clarified that they want to keep this
  literal-string usage, but don't mind losing the general case.

+ The use of ``\`` for an escape character within strings is
  well known.

- But note that this particular usage is odd, because the escaped
  character (the newline) is invisible, and the special treatment
  is to delete the character.  That said, the ``\`` of
  ``\(newline)`` is still an escape which changes the meaning of
  the following character.


Alternate Proposals
===================

Several people have suggested alternative ways of marking the line
end.
Most of these were rejected for not actually simplifying things.

The one exception was to let any unfinished expression signify a
line continuation, possibly in conjunction with increased
indentation.

This is attractive because it is a generalization of the rule for
parentheses.

The initial objections to this were:

- The amount of whitespace may be contentious; expression
  continuation should not be confused with opening a new suite.

- The "expression continuation" markers are not as clearly marked
  in Python as the grouping punctuation "(), [], {}" marks are::

      # Plus needs another operand, so the line continues
      "abc" +
      "def"

      # String ends an expression, so the line does not
      # continue.  The next line is a syntax error because
      # unary plus does not apply to strings.
      "abc"
      + "def"

- Guido objected for technical reasons.  [#dedent]_  The most
  obvious implementation would require allowing INDENT or DEDENT
  tokens anywhere, or at least in a widely expanded (and
  ill-defined) set of locations.  While this is a concern only for
  the internal parsing mechanism (rather than for users), it would
  be a major new source of complexity.

Andrew Koenig then pointed out [#lexical]_ a better implementation
strategy, and said that it had worked quite well in other
languages.  [#snocone]_  The improved suggestion boiled down to::

    The whitespace that follows an (operator or) open bracket or
    parenthesis can include newline characters.

    It would be implemented at a very low lexical level -- even
    before the decision is made to turn a newline followed by
    spaces into an INDENT or DEDENT token.

There is still some concern that it could mask bugs, as in this
example [#guidobughide]_::

    # Used to be y+1, the 1 got dropped.  Syntax Error (today)
    # would become nonsense.
    x = y+
    f(x)

Requiring that the continuation be indented more than the initial
line would add both safety and complexity.


Open Issues
===========

+ Should ``\``-continuation be removed even inside strings?
+ Should the continuation markers be expanded from just ([{}])
  to include lines ending with an operator?

+ As a safety measure, should the continuation line be required
  to be more indented than the initial line?


References
==========

.. [#dedent] (email subject) PEP 30XZ: Simplified Parsing, van Rossum
   http://mail.python.org/pipermail/python-3000/2007-April/007063.html

.. [#lexical] (email subject) PEP-3125 -- remove backslash continuation,
   Koenig
   http://mail.python.org/pipermail/python-3000/2007-May/007237.html

.. [#snocone] The Snocone Programming Language, Koenig
   http://www.snobol4.com/report.htm

.. [#guidobughide] (email subject) PEP-3125 -- remove backslash
   continuation, van Rossum
   http://mail.python.org/pipermail/python-3000/2007-May/007244.html


Copyright
=========

This document has been placed in the public domain.

Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End:

From steven.bethard at gmail.com  Fri May  4 21:30:15 2007
From: steven.bethard at gmail.com (Steven Bethard)
Date: Fri, 4 May 2007 13:30:15 -0600
Subject: [Python-3000] [Python-Dev] updated PEP3125, Remove Backslash Continuation
In-Reply-To:
References:
Message-ID:

[cc -python-dev]

On 5/4/07, Jim Jewett wrote:
> Open Issues
> ===========
>
> + Should ``\``-continuation be removed even inside strings?

I'm a strong -1 on this PEP if ``\``-continuation is removed from inside triple-quoted strings.  I'd hate to have to go from writing::

    >>> textwrap.dedent('''\
    ... foo
    ... bar
    ... ''')
    'foo\nbar\n'

to writing::

    >>> textwrap.dedent('''
    ... foo
    ... bar
    ... '''[1:])
    'foo\nbar\n'

or maybe::

    >>> textwrap.dedent('''
    ... foo
    ... bar
    ... '''.lstrip('\n'))
    'foo\nbar\n'

> + Should the continuation markers be expanded from just ([{}])
> to include lines ending with an operator?

I think the only way to answer this is to have someone actually implement it, so that we can evaluate the complexity of the implementation.
If someone can produce a patch, we can talk about this.

> + As a safety measure, should the continuation line be required
> to be more indented than the initial line?

Again, let's see a patch and we can talk about it.

STeVe
-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
--- Bucky Katt, Get Fuzzy

From mike.klaas at gmail.com  Fri May  4 22:45:00 2007
From: mike.klaas at gmail.com (Mike Klaas)
Date: Fri, 4 May 2007 13:45:00 -0700
Subject: [Python-3000] [Python-Dev] PEP 30XZ: Simplified Parsing
In-Reply-To:
References: <4638B151.6020901@voidspace.org.uk>
Message-ID: <3d2ce8cb0705041345m5b5d2b30oe11b0d392e7324cd@mail.gmail.com>

On 5/4/07, Baptiste Carvello wrote:
> maybe we could have a "dedent" literal that would remove the first newline and
> all indentation so that you can just write:
>
> call_something( d'''
> first part
> second line
> third line
> ''' )

Surely

    from textwrap import dedent as d

is close enough?

-Mike

From baptiste13 at altern.org  Fri May  4 22:47:07 2007
From: baptiste13 at altern.org (Baptiste Carvello)
Date: Fri, 04 May 2007 22:47:07 +0200
Subject: [Python-3000] PEP: Supporting Non-ASCII Identifiers
In-Reply-To: <46371BD2.7050303@v.loewis.de>
References: <46371BD2.7050303@v.loewis.de>
Message-ID:

Martin v. Löwis a écrit :
> PEP: 31xx
> Title: Supporting Non-ASCII Identifiers
> Version: $Revision$
> Last-Modified: $Date$
> Author: Martin v. Löwis
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 1-May-2007
> Python-Version: 3.0
> Post-History:
>
> Abstract
> ========
>
> This PEP suggests to support Non-ASCII letters (such as accented
> characters, Cyrillic, Greek, Kanji, etc.) in Python identifiers.
>

If this is to ever happen, it should be only accessible through a command-line option to python.  That way we make sure people are aware that they are making their code incompatible with the larger world.
Cheers,
Baptiste

From nevillegrech at gmail.com  Fri May  4 22:51:19 2007
From: nevillegrech at gmail.com (Neville Grech Neville Grech)
Date: Fri, 4 May 2007 22:51:19 +0200
Subject: [Python-3000] [Python-Dev] updated PEP3125, Remove Backslash Continuation
In-Reply-To:
References:
Message-ID:

This PEP is much more reasonable.

Should ``\``-continuation be removed even inside strings?

-1  Backslash continuation in strings is used a lot.. especially in strings that must not start with a newline but are written in the following format for clarity:

    '''\
    first line
    second line\
    '''

Should the continuation markers be expanded from just ([{}]) to include lines ending with an operator?

-1  I think that the following is much more clear:

    a=(3 +
       2 +
       4)
    f(x)

than:

    a= 3+
       2+
       4
    f(x)

On 5/4/07, Steven Bethard wrote:
>
> [cc -python-dev]
>
> On 5/4/07, Jim Jewett wrote:
> > Open Issues
> > ===========
> >
> > + Should ``\``-continuation be removed even inside strings?
>
> I'm a strong -1 on this PEP if ``\``-continuation is removed from
> inside triple-quoted strings.  I'd hate to have to go from writing::
>
> >>> textwrap.dedent('''\
> ... foo
> ... bar
> ... ''')
> 'foo\nbar\n'
>
> to writing::
>
> >>> textwrap.dedent('''
> ... foo
> ... bar
> ... '''[1:])
> 'foo\nbar\n'
>
> or maybe::
>
> >>> textwrap.dedent('''
> ... foo
> ... bar
> ... '''.lstrip('\n'))
> 'foo\nbar\n'
>
> > + Should the continuation markers be expanded from just ([{}])
> > to include lines ending with an operator?
>
> I think the only way to answer this is to have someone actually
> implement it, so that we can evaluate the complexity of the
> implementation.  If someone can produce a patch, we can talk about
> this.
>
> > + As a safety measure, should the continuation line be required
> > to be more indented than the initial line?
>
> Again, let's see a patch and we can talk about it.
>
>
> STeVe
> --
> I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
> tiny blip on the distant coast of sanity.
> --- Bucky Katt, Get Fuzzy > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: > http://mail.python.org/mailman/options/python-3000/nevillegrech%40gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-3000/attachments/20070504/7326cc0e/attachment.htm From martin at v.loewis.de Sat May 5 01:00:27 2007 From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 05 May 2007 01:00:27 +0200 Subject: [Python-3000] PEP: Supporting Non-ASCII Identifiers In-Reply-To: References: <46371BD2.7050303@v.loewis.de> Message-ID: <463BBB0B.40703@v.loewis.de> > If this is to ever happen, it should be only accessible through a command-line > option to python. That way we make sure people are aware that they are making > their code incompatible with the larger world. In what way will the source code be incompatible with the larger world? Martin From guido at python.org Sat May 5 01:10:05 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 4 May 2007 16:10:05 -0700 Subject: [Python-3000] Can someone please make py3k* checkins go to the python-3000-checkins mailing list? Message-ID: I don't know how the filters for checkin emails are set up, but this seems wrong: mail related to the p3yk branch goes to python-3000-checkins, but mail related to the py3k-unistr branch goes to python-checkins. There are a bunch of branches of relevance to py3k now; these should all go to the python-3000-checkins list. I suggest to filter on branches that start with either py3k or with p3yk. 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From greg.ewing at canterbury.ac.nz Sat May 5 03:23:39 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 05 May 2007 13:23:39 +1200 Subject: [Python-3000] PEP 3132: Extended Iterable Unpacking In-Reply-To: References: <463AB2E1.2030408@canterbury.ac.nz> Message-ID: <463BDC9B.2030500@canterbury.ac.nz> Simon Percivall wrote: > This was more in the way of returning the type that was given: > if you start with a list you end up with a list in "b", if you > start with an iterator you end up with an iterator. I don't think that returning the type given is a goal that should be attempted, because it can only ever work for a fixed set of known types. Given an arbitrary sequence type, there is no way of knowing how to create a new instance of it with specified contents. -- Greg From daniel at stutzbachenterprises.com Sat May 5 03:44:03 2007 From: daniel at stutzbachenterprises.com (Daniel Stutzbach) Date: Fri, 4 May 2007 20:44:03 -0500 Subject: [Python-3000] PEP 3132: Extended Iterable Unpacking In-Reply-To: <463BDC9B.2030500@canterbury.ac.nz> References: <463AB2E1.2030408@canterbury.ac.nz> <463BDC9B.2030500@canterbury.ac.nz> Message-ID: On 5/4/07, Greg Ewing wrote: > I don't think that returning the type given is a goal > that should be attempted, because it can only ever work > for a fixed set of known types. Given an arbitrary > sequence type, there is no way of knowing how to > create a new instance of it with specified contents. For objects that support the sequence protocol, how about specifying that: a, *b = container_object must be equivalent to: a, b = container_object[0], container_object[1:] That way, b is assigned whatever container_object's getslice method returns. A list will return a list, a tuple will return a tuple, and widgets (or BLists...) can return whatever makes sense for them. -- Daniel Stutzbach, Ph.D. 
President, Stutzbach Enterprises LLC From greg.ewing at canterbury.ac.nz Sat May 5 04:07:00 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 05 May 2007 14:07:00 +1200 Subject: [Python-3000] PEP: Eliminate __del__ In-Reply-To: <20070504143759.BIA64211@ms09.lnh.mail.rcn.net> References: <20070504143759.BIA64211@ms09.lnh.mail.rcn.net> Message-ID: <463BE6C4.5060309@canterbury.ac.nz> Raymond Hettinger wrote: > Can I please press the button for a few days until I can offer > a useful starting point. Before you go any further, the important thing to take from the thread so far is that you mustn't keep the whole contents of the object's __dict__ alive via the callback. -- Greg From foom at fuhm.net Sat May 5 06:22:57 2007 From: foom at fuhm.net (James Y Knight) Date: Sat, 5 May 2007 00:22:57 -0400 Subject: [Python-3000] PEP: Supporting Non-ASCII Identifiers In-Reply-To: References: <46371BD2.7050303@v.loewis.de> Message-ID: On May 4, 2007, at 4:47 PM, Baptiste Carvello wrote: > If this is to ever happen, it should be only accessible through a > command-line > option to python. That way we make sure people are aware that they > are making > their code incompatible with the larger world. That's ridiculous. Without your special option, the code would run perfectly well on pythons world-wide. Requiring a special option is a surefire way to *ensure* compatibility issues, of course... 
James From ncoghlan at gmail.com Sat May 5 12:12:36 2007 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 05 May 2007 20:12:36 +1000 Subject: [Python-3000] PEP: Eliminate __del__ In-Reply-To: <8B815829-8E98-4547-BC99-14E2241C13CB@zzzcomputing.com> References: <001d01c78bc2$b9367a60$f001a8c0@RaymondLaptop1><5.1.1.6.0.20070501120218.02d31250@sparrow.telecommunity.com> <463AB1DB.5010308@canterbury.ac.nz> <011601c78e0a$cadd0cd0$f001a8c0@RaymondLaptop1> <463B4455.7060100@develer.com> <8B815829-8E98-4547-BC99-14E2241C13CB@zzzcomputing.com> Message-ID: <463C5894.8020601@gmail.com> Michael Bayer wrote: > i just dont understand why such an important feature would have to be > relegated to just a "recipe". i think thats a product of the notion > that "implicit finalizers are bad, use try/finally". thats not > really valid for things like buffers that flush and database/network > connections that must be released when they fall out of scope. Implicit finalizers are typically bad because they don't provide any kind of guarantee as to when they're going to be executed - all they promise is "eventually, maybe". If the gc is paused or disabled for some reason, the answer is quite possibly never (and with current __del__ semantics, the answer in CPython may be never even when full gc is running). It's just a quirk of CPython that "eventually" normally translates to "when the variable goes out of scope" for objects that don't participate in cycles. Accordingly, anything which requires explicit finalization (such as flushing a buffer, or releasing a database connection) needs to migrate towards using the context management protocol and the with statement to ensure things are cleaned up properly regardless of the GC semantics that currently happen to be in force. Implicit finalization still has a place though, and it is currently supported in a far more definite fashion by using a weakref callback and leaving __del__ undefined.
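The migration Nick describes — from implicit finalization to the context management protocol — can be sketched generically; `Connection` here is a made-up stand-in for any resource that must flush or release on exit:

```python
class Connection:
    """A stand-in for any resource that needs explicit finalization."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Cleanup happens here, at block exit, regardless of which GC
        # strategy the interpreter happens to use.
        self.close()
        return False  # never swallow exceptions

with Connection() as conn:
    pass  # use the resource
```

Unlike __del__ or a weakref callback, the cleanup point is part of the program text, so it behaves identically on CPython, Jython and IronPython.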
The downside is that the current weakref module leaves you with some extra work to do before you can easily use it for finalization. The reason for initially pursuing a recipe approach for weakref based finalisation is that it allows time to determine whether or not there are better recipes than whatever is proposed in the PEP before casting it in the form of fixed language syntax. Adding syntactic sugar for a recipe is child's play compared to trying to get rid of syntax (or change its semantics) after discovering it is broken in some fashion. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From rasky at develer.com Sat May 5 13:21:59 2007 From: rasky at develer.com (Giovanni Bajo) Date: Sat, 05 May 2007 13:21:59 +0200 Subject: [Python-3000] PEP: Eliminate __del__ In-Reply-To: References: <001d01c78bc2$b9367a60$f001a8c0@RaymondLaptop1> <5.1.1.6.0.20070501120218.02d31250@sparrow.telecommunity.com> <463AB1DB.5010308@canterbury.ac.nz> <011601c78e0a$cadd0cd0$f001a8c0@RaymondLaptop1> <01dd01c78e60$11d8e460$f001a8c0@RaymondLaptop1> Message-ID: On 04/05/2007 20.35, Adam Olsen wrote: > Any attempt that keeps the entire contents of __dict__ alive is > doomed. It's likely to contain a cycle back to the original object, > and avoiding that is the whole point of jumping through these hoops. Uh? If __dict__ contains a cycle back to the original object, then the object is part of a cycle already, with or without getting an additional reference to the __dict__ within the finalization callback. And if there's no cycle, you're not creating one by just referencing __dict__. 
-- Giovanni Bajo From rasky at develer.com Sat May 5 13:39:37 2007 From: rasky at develer.com (Giovanni Bajo) Date: Sat, 05 May 2007 13:39:37 +0200 Subject: [Python-3000] PEP to change how the main module is delineated In-Reply-To: References: Message-ID: On 24/04/2007 0.05, Guido van Rossum wrote: >> This PEP is to change the ``if __name__ == "__main__": ...`` idiom to >> ``if __name__ == sys.main: ...`` so that you at least have a chance >> to execute module in a package that use relative imports. >> >> Ran this PEP past python-ideas. Stopped the discussion there when too >> many new ideas were being proposed. =) I have listed all of them in >> the Rejected Ideas section, although if overwhelming support for one >> comes forward the PEP can shift to one of them. > > I'm -1 on this and on any other proposed twiddlings of the __main__ > machinery. The only use case seems to be running scripts that happen > to be living inside a module's directory, which I've always seen as an > antipattern. To make me change my mind you'd have to convince me that > it isn't. Sometimes, beginners get confused because of this. They start with a single module, it grows and grows, until they split it into another module. But if the two modules then import each other, there is an asymmetry because one is internally renamed to __main__. For instance:

==== a.py ====
class A:
    pass

if __name__ == "__main__":
    a = A()
    print A.__name__
    print a.__class__
    import b
    b.run(a)
===============
==== b.py ====
from a import A

def run(a):
    print A
    print a.__class__
    assert isinstance(a, A) # FAIL!
==============

$ python a.py
A
__main__.A
a.A
__main__.A
Traceback (most recent call last):
  File "a.py", line 9, in ?
    b.run(a)
  File "E:\work\b.py", line 6, in run
    assert isinstance(a, A) # FAIL!
AssertionError

I think this behaviour confuses many beginners, and it is unnatural for experts too. I've got bitten a few times in the past.
I still believe that it would be much easier to just support an explicit __main__ function:

==== a.py ====
class A:
    pass

def __main__():
    a = A()
    print A.__name__
    print a.__class__
    import b
    b.run(a)
===============

which is easier to read for beginners and lets the main module keep its original name, thus not causing these weird side-effects. -- Giovanni Bajo From exarkun at divmod.com Sat May 5 18:27:08 2007 From: exarkun at divmod.com (Jean-Paul Calderone) Date: Sat, 5 May 2007 12:27:08 -0400 Subject: [Python-3000] PEP: Eliminate __del__ In-Reply-To: Message-ID: <20070505162708.19381.10763891.divmod.quotient.8722@ohm> On Sat, 05 May 2007 13:21:59 +0200, Giovanni Bajo wrote: >On 04/05/2007 20.35, Adam Olsen wrote: > >> Any attempt that keeps the entire contents of __dict__ alive is >> doomed. It's likely to contain a cycle back to the original object, >> and avoiding that is the whole point of jumping through these hoops. > >Uh? If __dict__ contains a cycle back to the original object, then the object >is part of a cycle already, with or without getting an additional reference to >the __dict__ within the finalization callback. If the __dict__ contains a cycle back to the original object, then if you keep the __dict__ alive in the weakref callback (which is what you are doing if the weakref callback references the __dict__ - it does not weakly reference it), then you will keep the original object alive and the weakref callback will never run, because the original object will live forever.
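Jean-Paul's point can be reproduced in a few lines; `registry` stands in for whatever global structure holds the finalizers (the name is made up for this sketch):

```python
import gc
import weakref

class Holder:
    pass

# A hypothetical global finalizer registry.
registry = []

h = Holder()
h.me = h  # the instance's __dict__ now refers back to the instance

# Broken finalizer: the callback captures the whole __dict__ via a
# default argument; the __dict__ in turn references the object itself,
# so the object stays strongly reachable through the registry.
registry.append(weakref.ref(h, lambda r, d=h.__dict__: d.clear()))

del h
gc.collect()

# The weakref is still live: the callback never ran.
still_alive = registry[0]() is not None
```

Because the chain registry → weakref → callback → __dict__ → object is rooted in a live global, the weakly referenced object can never become unreachable, and the finalizer never fires.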
Contrariwise, if the weakref callback has only a reference to the particular objects which it needs, then it doesn't matter if there is a cycle through some _other_ objects which are in the __dict__, since the weakref callback will not keep the cycle alive: eventually the cyclic gc will clean up the cycle (but leave the objects referenced by the weakref callback alone, since the weakref callback is keeping them alive), then the weakref callback will run since it was a weakref to the original object which has now been collected, the weakref callback will be able to use the specific references it has to do some cleanup, and then most likely both the weakref object, the weakref callback object, and whatever specific objects it held references to will be dropped and eventually collected (though this is not necessarily the case, since the weakref callback could choose to keep the specific objects it referenced (not the object weakly referenced in the first place) alive by putting them into some other living container). Jean-Paul From mike_mp at zzzcomputing.com Sat May 5 21:46:51 2007 From: mike_mp at zzzcomputing.com (Michael Bayer) Date: Sat, 5 May 2007 15:46:51 -0400 Subject: [Python-3000] PEP: Eliminate __del__ In-Reply-To: <463C5894.8020601@gmail.com> References: <001d01c78bc2$b9367a60$f001a8c0@RaymondLaptop1><5.1.1.6.0.20070501120218.02d31250@sparrow.telecommunity.com> <463AB1DB.5010308@canterbury.ac.nz> <011601c78e0a$cadd0cd0$f001a8c0@RaymondLaptop1> <463B4455.7060100@develer.com> <8B815829-8E98-4547-BC99-14E2241C13CB@zzzcomputing.com> <463C5894.8020601@gmail.com> Message-ID: On May 5, 2007, at 6:12 AM, Nick Coghlan wrote: > > The reason for initially pursuing a recipe approach for weakref > based finalisation is that it allows time to determine whether or > not there are better recipes than whatever is proposed in the PEP > before casting it in the form of fixed language syntax. 
Adding > syntactic sugar for a recipe is child's play compared to trying to > get rid of syntax (or change its semantics) after discovering it is > broken in some fashion. > if the recipe is just an interim step towards developing something that "just works", then we agree. obviously explicit finalization is preferable and relying upon cpython's "immediate" GC of non-cycled objects is a bad trap to fall into (particularly if you then run the same code using Jython for example)...but in a garbage collected language, the "loose ends" still need some way to clean themselves up even if its deferred. From tomerfiliba at gmail.com Sat May 5 15:29:47 2007 From: tomerfiliba at gmail.com (tomer filiba) Date: Sat, 5 May 2007 15:29:47 +0200 Subject: [Python-3000] the future of the GIL Message-ID: <1d85506f0705050629k35ebdf6aj285e8f10489d21d5@mail.gmail.com> hi all i have to admit i've been neglecting the list in the past few months, and i don't know whether the issue i want to bring up has been settled already. as you all may have noticed, multicore processors are becoming more and more common in all kinds of machines, from desktops to servers, and will surely become more prevalent with time, as all major CPU vendors plan to ship 8-core processors by mid-2008. back in the day of uniprocessor machines, having the GIL really made life simpler and the sacrifice was negligible. however, running a threaded python script over an 8-core machine, where you can utilize at most 12.5% of the horsepower, seems like too large a sacrifice to me. the only way to overcome this with cpython is to Kill The GIL (TM), and since it's a very big architectural change, it ought to happen soon. pushing it further than version 3.0 means all library authors would have to adapt their code twice (once to make it compatible with 3.0, and then again to make it thread safe). i see all hell has broken loose here, PEP-wise speaking, but i really hope there's still time to consider killing the GIL at last. 
-tomer From steven.bethard at gmail.com Sun May 6 01:15:44 2007 From: steven.bethard at gmail.com (Steven Bethard) Date: Sat, 5 May 2007 17:15:44 -0600 Subject: [Python-3000] the future of the GIL In-Reply-To: <1d85506f0705050629k35ebdf6aj285e8f10489d21d5@mail.gmail.com> References: <1d85506f0705050629k35ebdf6aj285e8f10489d21d5@mail.gmail.com> Message-ID: On 5/5/07, tomer filiba wrote: > the only way to overcome this with cpython is to Kill The GIL (TM), > and since it's a very big architectural change, it ought to happen > soon. pushing it further than version 3.0 means all library authors > would have to adapt their code twice (once to make it compatible > with 3.0, and then again to make it thread safe). > > i see all hell has broken loose here, PEP-wise speaking, but i really > hope there's still time to consider killing the GIL at last. You've missed the deadline for Python 3000 PEPs. (It was April 30th.) This discussion is also probably more appropriate for python-ideas until someone has something resembling an implementation ready... STeVe -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy From greg.ewing at canterbury.ac.nz Sun May 6 01:14:30 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 06 May 2007 11:14:30 +1200 Subject: [Python-3000] PEP: Eliminate __del__ In-Reply-To: References: <001d01c78bc2$b9367a60$f001a8c0@RaymondLaptop1> <5.1.1.6.0.20070501120218.02d31250@sparrow.telecommunity.com> <463AB1DB.5010308@canterbury.ac.nz> <011601c78e0a$cadd0cd0$f001a8c0@RaymondLaptop1> <01dd01c78e60$11d8e460$f001a8c0@RaymondLaptop1> Message-ID: <463D0FD6.4030100@canterbury.ac.nz> Giovanni Bajo wrote: > Uh? If __dict__ contains a cycle back to the original object, then the object > is part of a cycle already, with or without getting an additional reference to > the __dict__ within the finalization callback. 
Yes, but storing a finalizer in a global registry that references the __dict__ makes it an *immortal* cycle, because the GC won't see it as an isolated cycle that's not referenced from outside. > And if there's no cycle, you're not creating one by just > referencing __dict__. It's not creation of the cycle that's the issue, it's keeping it alive forever once it's created. -- Greg From greg.ewing at canterbury.ac.nz Sun May 6 01:46:38 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 06 May 2007 11:46:38 +1200 Subject: [Python-3000] PEP: Eliminate __del__ In-Reply-To: <463B02EB.6060006@develer.com> References: <001d01c78bc2$b9367a60$f001a8c0@RaymondLaptop1> <5.1.1.6.0.20070501120218.02d31250@sparrow.telecommunity.com> <463AB1DB.5010308@canterbury.ac.nz> <463B02EB.6060006@develer.com> Message-ID: <463D175E.1000201@canterbury.ac.nz> Giovanni Bajo wrote:

> class Holder:
>     def __init__(self):
>         self.resource = ....
>         self.__wr = weakref(self.resource, ....)
>
> So, are you
> saying that it's possible that the weakreference refcount goes to zero
> *before* Holder's refcount?

No, but depending on the order in which the dict contents gets decrefed when Holder is deallocated, the __wr attribute may get deallocated before the resource attribute. If that happens, the callback is never called. I have run the following code with Python 2.3, 2.4 and 2.5 and it does not print "Cleaning up":

from weakref import ref

class Resource:
    pass

def cleanup(x):
    print "Cleaning up"

class Holder:
    def __init__(self):
        self.resource = Resource()
        self.weakref = ref(self.resource, cleanup)

h = Holder()
del h

> Are you saying that the fact that it works for me in real-world code is
> just out of luck and might randomically break?

If this is really what you're doing, then yes, it will randomly break. I may have misunderstood exactly what it is you're doing, however.
-- Greg From talin at acm.org Sun May 6 02:57:51 2007 From: talin at acm.org (Talin) Date: Sat, 05 May 2007 17:57:51 -0700 Subject: [Python-3000] the future of the GIL In-Reply-To: <1d85506f0705050629k35ebdf6aj285e8f10489d21d5@mail.gmail.com> References: <1d85506f0705050629k35ebdf6aj285e8f10489d21d5@mail.gmail.com> Message-ID: <463D280F.6070101@acm.org> tomer filiba wrote: > the only way to overcome this with cpython is to Kill The GIL (TM), > and since it's a very big architectural change, it ought to happen > soon. pushing it further than version 3.0 means all library authors > would have to adapt their code twice (once to make it compatible > with 3.0, and then again to make it thread safe). > > i see all hell has broken loose here, PEP-wise speaking, but i really > hope there's still time to consider killing the GIL at last. I've brought up this issue as well, but the consensus seems to be that this is just too hard to even consider for 3.0. Note that Jython and IronPython don't have the same restrictions in this regard as CPython. Both VMs are able to run in multiprocessing environments. (I don't know whether Jython/IronPython even have a GIL.) My suggested approach to making CPython concurrent is to first tackle the problem of garbage collection in a multiprocessing environment. Once that is done, the next piece would be to address the issues of thread safety of the interpreter's internal data structures. At one point, I started working on a generic, concurrent garbage collector that would be useful for a variety of interpreted languages such as Python, but I haven't had time to work on it lately. It's similar to the Boehm collector, except that it's designed for "cooperative" languages in which the collector knows about the structure of objects. When I last worked on it, I had gotten the "young generation" collection working, and I had just finished implementing the global heap, and was in the process of writing unit tests for it.
I hadn't started on old-generation collection or cross-generation reference tracking. -- Talin From jcarlson at uci.edu Sun May 6 03:29:31 2007 From: jcarlson at uci.edu (Josiah Carlson) Date: Sat, 05 May 2007 18:29:31 -0700 Subject: [Python-3000] the future of the GIL In-Reply-To: <1d85506f0705050629k35ebdf6aj285e8f10489d21d5@mail.gmail.com> References: <1d85506f0705050629k35ebdf6aj285e8f10489d21d5@mail.gmail.com> Message-ID: <20070505181324.649A.JCARLSON@uci.edu> "tomer filiba" wrote: > the only way to overcome this with cpython is to Kill The GIL (TM), > and since it's a very big architectural change, it ought to happen > soon. pushing it further than version 3.0 means all library authors > would have to adapt their code twice (once to make it compatible > with 3.0, and then again to make it thread safe). There are many solutions to handling the scaling of Python on multicore processors, only one of which is killing the GIL. Another is Greg Ewing's ideas offered in the "Ideas towards GIL removal" thread in the python-ideas list. My personal favorite, because it doesn't require a complete re-design of the CPython runtime, is better abstractions. I was skeptical at first, but in reading the documentation, installing, testing, and monkeying around with the processing package by Richard Oudkerk, I do think that it has the proper level of abstraction. Like thread programming it has its quirks, but it seems that one should be able to apply much of their experience with threads to the processing module (as long as they rely on explicitly shared objects for communication). If you are used to using threads, give the processing package a try. You may be as pleasantly surprised as I was. Note that it would take some more work to get it to work with passing sockets to another process, but that has been done before (I have code that others have written if anyone is curious). 
- Josiah From martin at v.loewis.de Sun May 6 09:47:12 2007 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Sun, 06 May 2007 09:47:12 +0200 Subject: [Python-3000] PEP 3112 Message-ID: <463D8800.1010906@v.loewis.de> I just read PEP 3112, and I believe it contains a flaw/underspecification. It says # Each shortstringchar or longstringchar must be a character between 1 # and 127 inclusive, regardless of any encoding declaration [2] in the # source file. What does that mean? In particular, what is "a character between 1 and 127"? Assuming this refers to ordinal values in some encoding: what encoding? It's particularly puzzling that it says "regardless of any encoding declaration of the source file". I fear (but hope that I'm wrong) that this was meant to mean "use the bytes as they are stored on disk in the source file". If so: is the attached file valid Python? In case your editor can't render it: it reads #! -*- coding: iso-2022-jp -*- a = b"?????" But if you look at the file with a hex editor, you see it contains only bytes between 1 and 127. I would hope that this code is indeed ill-formed (i.e. that the byte representation on disk is irrelevant, and only the Unicode ordinals of the source characters matter) If so, can the specification please be updated to clarify that 1. in Grammar changes: Each shortstringchar or longstringchar must be a character whose Unicode ordinal value is between 1 and 127 inclusive. 2. in Semantics: The bytes in the new object are obtained as if encoding a string literal with "iso-8859-1" Regards, Martin -------------- next part -------------- A non-text attachment was scrubbed... 
Name: a.py Type: text/x-python Size: 55 bytes Desc: not available Url : http://mail.python.org/pipermail/python-3000/attachments/20070506/c0269ce4/attachment.py From martin at v.loewis.de Sun May 6 10:20:02 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 06 May 2007 10:20:02 +0200 Subject: [Python-3000] PEP: Eliminate __del__ In-Reply-To: <20070504143759.BIA64211@ms09.lnh.mail.rcn.net> References: <20070504143759.BIA64211@ms09.lnh.mail.rcn.net> Message-ID: <463D8FB2.8050900@v.loewis.de> > Can I please press the button for a few days until I can offer a useful starting point. Socially, this is the point of the PEP process in the first place: the PEP author is supposed to collect community feedback in the PEP, and address it as necessary. People won't stop discussing if the PEP author is away, but eventually, discussion will die off, and restart when a new version of the PEP is published. Of course, at that time, people will have their bias when the next version of the PEP comes, and you can do nothing about that. Procedurally, there is a problem that this still isn't an officially-posted PEP, even though it's already several days past the deadline. OTOH, it's listed in the PEP parade. Still, I would like to see a posted PEP rather sooner than later. Defending the deadline will be necessary in the future, and that will become more difficult (on grounds of fairness) if some PEPs get accepted that had their first appearance on python.org/peps/ way after the deadline. Regards, Martin From talin at acm.org Sun May 6 10:34:14 2007 From: talin at acm.org (Talin) Date: Sun, 06 May 2007 01:34:14 -0700 Subject: [Python-3000] PEP: Eliminate __del__ In-Reply-To: <463D8FB2.8050900@v.loewis.de> References: <20070504143759.BIA64211@ms09.lnh.mail.rcn.net> <463D8FB2.8050900@v.loewis.de> Message-ID: <463D9306.1040109@acm.org> Martin v. 
Löwis wrote: > Procedurally, there is a problem that this still isn't an > officially-posted PEP, even though it's already several days > past the deadline. OTOH, it's listed in the PEP parade. Still, > I would like to see a posted PEP rather sooner than later. > Defending the deadline will be necessary in the future, and > that will become more difficult (on grounds of fairness) if > some PEPs get accepted that had their first appearance on > python.org/peps/ way after the deadline. My vote would be to allow those people who have "reserved a spot" for a PEP before the deadline to proceed, even if they didn't have an actual PEP in hand by that date. So in other words, the rule at this point should be "no new *topics* for 3.0". I would also say that a real PEP should follow within a few weeks, and if not then I'd say go ahead and disqualify the PEP - i.e. you lose your "reserved" spot if you don't come up with an actual document within a reasonable time frame. -- Talin From hasan.diwan at gmail.com Sun May 6 10:44:54 2007 From: hasan.diwan at gmail.com (Hasan Diwan) Date: Sun, 6 May 2007 01:44:54 -0700 Subject: [Python-3000] PEP: Eliminate __del__ In-Reply-To: <001d01c78bc2$b9367a60$f001a8c0@RaymondLaptop1> References: <001d01c78bc2$b9367a60$f001a8c0@RaymondLaptop1> Message-ID: <2cda2fc90705060144p1d621fb0w835c5b32ba999d65@mail.gmail.com> On 01/05/07, Raymond Hettinger wrote: > > PEP: Eliminating __del__ +1 -- Cheers, Hasan Diwan -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20070506/5c01aede/attachment.html From greg.ewing at canterbury.ac.nz Sun May 6 11:00:45 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 06 May 2007 21:00:45 +1200 Subject: [Python-3000] the future of the GIL In-Reply-To: <20070505181324.649A.JCARLSON@uci.edu> References: <1d85506f0705050629k35ebdf6aj285e8f10489d21d5@mail.gmail.com> <20070505181324.649A.JCARLSON@uci.edu> Message-ID: <463D993D.2020107@canterbury.ac.nz> Josiah Carlson wrote: > There are many solutions to handling the scaling of Python on multicore > processors, only one of which is killing the GIL. Another is Greg > Ewing's ideas offered in the "Ideas towards GIL removal" thread in the > python-ideas list. Yeah, except I think only one of those would actually work (the "permanent objects" idea). The "thread-local refcount" idea seems to have at least one fatal flaw. I'm now more interested in the IBM "Recycler" idea that was mentioned. If I get a spare moment, I might have a go at implementing a "Repycler" by means of suitable redefinitions of Py_INCREF and Py_DECREF. -- Greg From baptiste13 at altern.org Sat May 5 13:07:23 2007 From: baptiste13 at altern.org (Baptiste Carvello) Date: Sat, 05 May 2007 13:07:23 +0200 Subject: [Python-3000] PEP: Supporting Non-ASCII Identifiers In-Reply-To: <463BBB0B.40703@v.loewis.de> References: <46371BD2.7050303@v.loewis.de> <463BBB0B.40703@v.loewis.de> Message-ID: <463C656B.9070200@altern.org> Martin v. Löwis wrote: >> If this is to ever happen, it should be only accessible through a command-line >> option to python. That way we make sure people are aware that they are making >> their code incompatible with the larger world. > > In what way will the source code be incompatible with the larger world? > > Martin I mean incompatible from a maintenance point of view.
Imagine your employer buys some Chinese company (or some Chinese company decides to open source its software), and you end up maintaining code where identifiers are each one Chinese character... Maybe this can be solved easily with a proper IDE, though. As a user of open source software, I would also hate to open the source file in search of a bug, only to find out I can't even recognise the identifiers from one another. I'm sure big projects will have guidelines, but in my field (physics), a lot of code is written by people with little programming background. For this reason, I think using this feature should be a conscious decision at the project level, and not just one developer finding out the "cool new feature" and starting to use it in his code without much thinking about the consequences. Cheers, Baptiste P.S.: I do believe this feature is nice in some cases, for example when teaching programming to children. From tomerfiliba at gmail.com Sun May 6 19:03:43 2007 From: tomerfiliba at gmail.com (tomer filiba) Date: Sun, 6 May 2007 19:03:43 +0200 Subject: [Python-3000] comments Message-ID: <1d85506f0705061003y51d62ddcm7a0ea91221a613c6@mail.gmail.com> i finished reading almost all of the new peps, so to prevent cluttering i'll post all my comments in a single message.

3130 (Access to Current Module/Class/Function)
------------------------------------------------
why make them keywords? they could as well be builtin functions, like globals() and locals(). i.e., getmodule(), getclass(), and getfunction(). these functions will just climb up the stack frames until they find what you're asking for. also -- the class object is constructed only AFTER the code of the class has finished executing, meaning getclass() or __thisclass__ will not work at class level. so the class mechanism needs to be changed as well.
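The builtins tomer suggests can be partly sketched with frame inspection. getmodule() and getfunction() below use his proposed names, but the implementation is my guess, and the getfunction() lookup is only a heuristic (it fails for nested or renamed functions); getclass() is omitted for exactly the reason the message gives — the class object does not exist yet while its body runs:

```python
import sys

def getmodule():
    """Return the module object of the calling frame."""
    caller = sys._getframe(1)
    return sys.modules[caller.f_globals["__name__"]]

def getfunction():
    """Return the function object of the calling frame.

    Heuristic sketch: match the frame's code object against the
    caller's globals.
    """
    frame = sys._getframe(1)
    for obj in list(frame.f_globals.values()):
        if getattr(obj, "__code__", None) is frame.f_code:
            return obj
    return None

def example():
    return getfunction()
```

So getmodule() at least needs no new syntax; the hard cases, as Jim's reply at the end of this thread notes, are the ones that cannot be resolved by walking frames at run time.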
3119 (Introducing Abstract Base Classes)
3141 (A Type Hierarchy for Numbers)
------------------------------------------------
these two are very closely related, so i'll treat them as one. first, being able to override isinstance and issubclass is a great addition. it would make proxies much more transparent, and i could remove half of the black magic code from my RPC lib. other than that -- it would be horrible -- as i'll now explain. first, about the algebraic structures, such as fields, rings and what not -- not all of us are mathematicians. when john doe wants to write HisLeetNumber, i doubt he'll be able to understand all of the subtle differences. adding two numbers does not require one to take Algebra 101. second -- i have stressed that before but i hope this time it may sound more convincing -- a type hierarchy is a foolish concept that is not strong enough to convey *all* of the constraints one may want to express, while being very rigid and *static*. PJE's proposal seems the only suitable one, imho. sure, duck typing by itself is not powerful enough to allow constraints and adaptation -- but a new type hierarchy is not gonna solve these issues. to start with, it's python after all, not some statically compiled type-checking language. i can still derive from Set and change the signature of a method (def __len__(self, foobar)), and break everything -- even though isinstance would approve. this may happen because of a "malicious" coder or just by a blunt user. so i hope this settles the case for "type safety". if you want to be static, use java. what you DO want is a way to distinguish between objects that "look similar" to others. for instance, sequences and mappings both "look similar" -- having __getitem__, __len__, __contains__, and so on. what you want is a way to say "this class is a sequence".
you can do that by inheriting from some BaseSequence, but sooner or later, you'll end up with classes that derive from 10 different bases, and maintaining such a class hierarchy will become very time consuming and bug-prone. besides, imagine that someone wrote his own sequence class, which does not inherit from BaseSequence, but is otherwise a perfectly good sequence. still -- you will not be able to use it, as isinstance checks will fail. manually patching the MRO is impossible, and so you have to start finding workarounds. the solution, imo, would be in the form of contracts. the contract is just a class that defines the interface, and documents how it should behave. for instance, whether __add__ is commutative, etc. by itself it may sound just like abstract base classes, but the difference is your class won't inherit from them -- rather it would state it *conforms* with them. class MappingContract: implements(ContainerContract) def __getitem__(self, key): """if key is not found, raises KeyError""" def get(self, key, default = None): """returns self[key] if this does not raise KeyError, and the default value if it does""" def __contains__(self, key): """tests if the key exists in the mapping""" class LeetDict: implements(MappingContract) implements(SomeOtherContract) def ... ld = LeetDict() isimplementing(ld, MappingContract) # True isimplementing(ld, ContainerContract) # True isimplementing(ld, SequenceContract) # False this way, you'll never have conflicting base classes/metaclasses, and still be able to express any functionality that you'll ever want. again, with ABCs, classes would grow very large inheritance trees, that at some point are bound to conflict/collide. moreover, contracts are more "declarative". LeetDict declares it complies with some contract, rather than forcing it to have statically inherited that contract as an ABC.
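for what it's worth, the implements()/isimplementing() pair can be prototyped in a handful of lines -- a hypothetical sketch (none of these names exist in the stdlib; the frame trick is the same one zope.interface used for its old implements() call, and it is CPython-specific):

```python
import sys

def implements(contract):
    # called inside a class body: record the claim in the namespace
    # of the class currently being defined
    sys._getframe(1).f_locals.setdefault('__contracts__', set()).add(contract)

def _declared(cls):
    # every contract the class claims, following the contracts'
    # own implements() declarations transitively
    seen, todo = set(), set(getattr(cls, '__contracts__', ()))
    while todo:
        c = todo.pop()
        if c not in seen:
            seen.add(c)
            todo |= set(getattr(c, '__contracts__', ()))
    return seen

def isimplementing(obj, contract):
    return contract in _declared(type(obj))
```

with the email's example, isimplementing(LeetDict(), ContainerContract) then comes out True because MappingContract itself declares implements(ContainerContract) -- no inheritance involved.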
we can, if the need arises, patch a third-party class by declaring it complies with some contract, which is in fact unknown to the third-party class. this approach is also more extensible than ABCs: * a metaclass/class decorator can be used to check at class creation time that all of the contracts are satisfied, etc. * the contract may be any object. even just a big string that describes the contract textually * but it may also be possible to describe complex requirements expressively with decorators; for example: class Number: @commutative @associative def __add__(self, other): "returns self + other" allowing you to specify individual "properties" to each member of the contract, so you don't have to know about fields and rings just to implement an associative operation. still, the contracts approach has no trouble tackling the suggested Fields/Rings/Monads classification, should one desire to. 3129, 3127, 3177 ------------------------------------------------ as for 3129 (Class Decorators) and 3127 (Integer Literal Support and Syntax) -- it's about time we have these. and btw, the status of pep 3117 ought to be changed to 'accepted'... it would have more impact that way :) -tomer From jimjjewett at gmail.com Sun May 6 19:33:52 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Sun, 6 May 2007 13:33:52 -0400 Subject: [Python-3000] comments In-Reply-To: <1d85506f0705061003y51d62ddcm7a0ea91221a613c6@mail.gmail.com> References: <1d85506f0705061003y51d62ddcm7a0ea91221a613c6@mail.gmail.com> Message-ID: On 5/6/07, tomer filiba wrote: > 3130 (Access to Current Module/Class/Function) > ------------------------------------------------ > why make them keywords? they could as well be builtin functions, > like globals() and locals(). i.e., getmodule(), getclass(), and > getfunction(). these functions will just climb up the stack frames > until they find what you're asking for. Because I couldn't figure out how to do it after compile-time. 
> also -- the class object is constructed only AFTER the code > of the class has finished executing, meaning getclass() > or __thisclass__ will not work at class level. Correct, but it would work within methods of the class. Functions also don't exist while still being defined, and modules aren't fully usable while being defined. -jJ From rasky at develer.com Sun May 6 21:10:40 2007 From: rasky at develer.com (Giovanni Bajo) Date: Sun, 06 May 2007 21:10:40 +0200 Subject: [Python-3000] the future of the GIL In-Reply-To: <1d85506f0705050629k35ebdf6aj285e8f10489d21d5@mail.gmail.com> References: <1d85506f0705050629k35ebdf6aj285e8f10489d21d5@mail.gmail.com> Message-ID: On 05/05/2007 15.29, tomer filiba wrote: > however, running a threaded python script over an 8-core > machine, where you can utilize at most 12.5% of the horsepower, > seems like too large a sacrifice to me. You seem to believe that the only way to parallelize your programs is to use threads. IMHO, threads is just the most common and absolutely the worst, under many points of views. -- Giovanni Bajo From talin at acm.org Sun May 6 23:19:01 2007 From: talin at acm.org (Talin) Date: Sun, 06 May 2007 14:19:01 -0700 Subject: [Python-3000] the future of the GIL In-Reply-To: References: <1d85506f0705050629k35ebdf6aj285e8f10489d21d5@mail.gmail.com> Message-ID: <463E4645.5000503@acm.org> Giovanni Bajo wrote: > On 05/05/2007 15.29, tomer filiba wrote: > >> however, running a threaded python script over an 8-core >> machine, where you can utilize at most 12.5% of the horsepower, >> seems like too large a sacrifice to me. > > You seem to believe that the only way to parallelize your programs is to use > threads. IMHO, threads is just the most common and absolutely the worst, under > many points of views. I think it's a case of wanting the most general mechanism for doing parallel computation. 
Any algorithm that can be efficiently parallelized using processes can also be done with threads (assuming that the infrastructure for threading is there), but the converse is not true. -- Talin From brett at python.org Sun May 6 23:50:09 2007 From: brett at python.org (Brett Cannon) Date: Sun, 6 May 2007 14:50:09 -0700 Subject: [Python-3000] Dealing with timestamp issues for rebuiling AST using Parser/asdl_c.py Message-ID: I am sending this email to make sure people are aware of a possible build problem they might come up against that is unique to Python 3.0 and how to deal with it. I decided to do a ``make distclean`` and rebuild my p3yk checkout. But I came across the error of:: File "./Parser/asdl_c.py", line 744 print(auto_gen_msg, file=f) Oops. Turns out the Makefile executes 'python' which is 2.4.3 on my machine; joys of bootstrapping the build process with Python. After touching Include/Python-ast.h and Parser/Python-ast.h I got p3yk to build. But to make sure I had the newest auto-generated files I touched Parser/asdl.py and got the same error. Oops again. So, I took my clean build of Py3K in my checkout and basically did what the Makefile wanted to do, just with the proper Python version:: ./python.exe Parser/asdl_c.py -h Include Parser/Python.asdl ./python.exe Parser/asdl_c.py -c Python Parser/Python.asdl This all came about because I am reviewing Tony Lownds' patch for PEP 3113 which touches both the grammar and the AST. I had to run the above statements in the separate checkout I have for this patch using my pristine copy of the p3yk branch. Then, after a ``make clean`` the thing built properly. Hopefully this won't be a problem with source distributions of Python or else there might be a flurry of emails and such about this error when people really start trying to use Python 3.0. -Brett -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.python.org/pipermail/python-3000/attachments/20070506/10a05316/attachment.htm From greg.ewing at canterbury.ac.nz Mon May 7 03:40:10 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 07 May 2007 13:40:10 +1200 Subject: [Python-3000] PEP: Eliminate __del__ In-Reply-To: <463E29A4.5010003@develer.com> References: <001d01c78bc2$b9367a60$f001a8c0@RaymondLaptop1> <5.1.1.6.0.20070501120218.02d31250@sparrow.telecommunity.com> <463AB1DB.5010308@canterbury.ac.nz> <463B02EB.6060006@develer.com> <463D175E.1000201@canterbury.ac.nz> <463E29A4.5010003@develer.com> Message-ID: <463E837A.3010905@canterbury.ac.nz> Giovanni Bajo wrote: > What I really meant was: > > self.__wr = weakref.ref(self, ...) Okay, that looks better. But I'm not sure what will happen if the holder becomes part of a cycle. If the GC picks the holder as the object to clear to break the cycle, then the weakref will be deallocated before the holder, and the callback won't be called. So it doesn't seem to be an improvement over __del__. -- Greg From jcarlson at uci.edu Mon May 7 07:36:35 2007 From: jcarlson at uci.edu (Josiah Carlson) Date: Sun, 06 May 2007 22:36:35 -0700 Subject: [Python-3000] the future of the GIL In-Reply-To: <463E4645.5000503@acm.org> References: <463E4645.5000503@acm.org> Message-ID: <20070506222840.25B2.JCARLSON@uci.edu> Talin wrote: > Giovanni Bajo wrote: > > On 05/05/2007 15.29, tomer filiba wrote: > > > >> however, running a threaded python script over an 8-core > >> machine, where you can utilize at most 12.5% of the horsepower, > >> seems like too large a sacrifice to me. > > > > You seem to believe that the only way to parallelize your programs is to use > > threads. IMHO, threads is just the most common and absolutely the worst, under > > many points of views. > > I think it's a case of wanting the most general mechanism for doing > parallel computation. 
Any algorithm that can be efficiently parallelized > using processes can also be done with threads (assuming that the > infrastructure for threading is there), but the converse is not true. The proposals to remove the GIL have been under the assumption that shared memory processing using multiple threads is desired. They also presume that there will be some sort of locking mechanism on a per-object basis so that objects won't be clobbered. By going multi-process rather than multi-threaded, one generally removes shared memory from the equation. Note that this has the same effect as using queues with threads, which is generally seen as the only way of making threads "easy". If one *needs* shared memory, we can certainly create an mmap-based shared memory subsystem with fine-grained object locking, or emulate it via a server process as the processing package has done. Seriously, give the processing package a try. It's much faster than one would expect. - Josiah From martin at v.loewis.de Mon May 7 07:35:41 2007 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Mon, 07 May 2007 07:35:41 +0200 Subject: [Python-3000] Dealing with timestamp issues for rebuiling AST using Parser/asdl_c.py In-Reply-To: References: Message-ID: <463EBAAD.6070102@v.loewis.de> > File "./Parser/asdl_c.py", line 744 > print(auto_gen_msg, file=f) I think asdl_c.py should be formulated in a way that is compatible with 2.x. It already uses f.write in many places; the few remaining ones should be updated.
Regards, Martin From tdelaney at avaya.com Mon May 7 00:34:52 2007 From: tdelaney at avaya.com (Delaney, Timothy (Tim)) Date: Mon, 7 May 2007 08:34:52 +1000 Subject: [Python-3000] [Python-Dev] Pre-pre PEP for 'super' keyword Message-ID: <2773CAC687FD5F4689F526998C7E4E5FF1ED82@au3010avexu1.global.avaya.com> Steve Holden wrote: > Tim Delaney wrote: >> BTW, one of my test cases involves multiple super calls in the same >> method - there is a *very* large performance improvement by >> instantiating it once. >> > And how does speed deteriorate for methods with no uses of super at > all (which will, I suspect, be in the majority)? Zero - in those cases, no super instance is instantiated. There is a small one-time cost when the class is constructed in the reference implementation (due to the need to parse the bytecode to determine if 'super' is used) but in the final implementation that information will be gathered during compilation. Tim Delaney From nnorwitz at gmail.com Mon May 7 08:10:39 2007 From: nnorwitz at gmail.com (Neal Norwitz) Date: Sun, 6 May 2007 23:10:39 -0700 Subject: [Python-3000] Dealing with timestamp issues for rebuiling AST using Parser/asdl_c.py In-Reply-To: <463EBAAD.6070102@v.loewis.de> References: <463EBAAD.6070102@v.loewis.de> Message-ID: On 5/6/07, "Martin v. Löwis" wrote: > > File "./Parser/asdl_c.py", line 744 > > print(auto_gen_msg, file=f) > > I think asdl_c.py should be formulated in a way > that is compatible with 2.x. It already uses > f.write in many places; the few remaining ones > should be updated. This is the case since about 6 minutes before you sent your message. :-) Date: Mon May 7 07:29:18 2007 New Revision: 55162 Modified: python/branches/p3yk/Parser/asdl.py python/branches/p3yk/Parser/asdl_c.py python/branches/p3yk/Parser/spark.py Log: Get asdl code gen working with Python 2.3.
Should continue to work with 3.0 n From nnorwitz at gmail.com Mon May 7 09:05:14 2007 From: nnorwitz at gmail.com (Neal Norwitz) Date: Mon, 7 May 2007 00:05:14 -0700 Subject: [Python-3000] Can someone please make py3k* checkins go to the python-3000-checkins mailing list? In-Reply-To: References: Message-ID: On 5/4/07, Guido van Rossum wrote: > I don't know how the filters for checkin emails are set up, but this > seems wrong: mail related to the p3yk branch goes to > python-3000-checkins, but mail related to the py3k-unistr branch goes > to python-checkins. There are a bunch of branches of relevance to py3k > now; these should all go to the python-3000-checkins list. I suggest > to filter on branches that start with either py3k or with p3yk. I've done that (more or less). Here is the regex. Please (re)name your branches appropriately. ^python/branches/(p3yk/|py3k).* n From nnorwitz at gmail.com Mon May 7 10:04:22 2007 From: nnorwitz at gmail.com (Neal Norwitz) Date: Mon, 7 May 2007 01:04:22 -0700 Subject: [Python-3000] failing tests Message-ID: There are 3* failing tests: test_compiler test_doctest test_transformer * plus a few more when running on a 64-bit platform These failures occurred before and after xrange checkin. Do other people see these failures? Any ideas when they started? The doctest failures are due to no space at the end of the line (print behavior change). Not sure what to do about that now that we prevent blanks at the end of lines from being checked in. :-) n From tomerfiliba at gmail.com Mon May 7 13:08:04 2007 From: tomerfiliba at gmail.com (tomer filiba) Date: Mon, 7 May 2007 13:08:04 +0200 Subject: [Python-3000] new io (pep 3116) Message-ID: <1d85506f0705070408h6b21540ej54888e97ad6854dc@mail.gmail.com> my original idea about the new i/o foundation was more elaborate than the pep, but i have to admit the pep is more feasible and compact. 
some comments though: writeline ----------------------------- TextIOBase should grow a writeline() method, to be symmetrical with readline(). the reason is simple -- the newline char is configurable in the constructor, so it's not necessarily "\n". so instead of adding the configurable newline char manually, the user should call writeline() which would append the appropriate newline automatically. sockets ----------------------------- iirc, SocketIO is a layer that wraps an underlying socket object. that's a good distinction -- to separate the underlying socket from the RawIO interface -- but don't forget socket objects, by themselves, need a cleanup too. for instance, there's no point in UDP sockets having listen(), or send() or getpeername() -- with UDP you only ever use sendto and recvfrom. on the other hand, TCP sockets make no use of sendto(). and even with TCP sockets, listeners never use send() or recv(), while connected sockets never use listen() or connect(). moreover, the current socket interface simply mimics the BSD interface. setsockopt, getsockopt, et al, are very unpythonic by nature -- they ought to be exposed as properties or methods of the socket. all in all, the current socket model is very low level with no high level design. some time ago i was working on a sketch for a new socket module (called sock2) which had a clear distinction between connected sockets, listener sockets and datagram sockets. each protocol was implemented as a subclass of one of these base classes, and exposed only the relevant methods. socket options were added as properties and methods, and a new DNS module was added for dns-related queries. you can see it here -- http://sebulba.wikispaces.com/project+sock2 i know it's late already, but i can write a PEP over the weekend, or if someone else wants to carry on with the idea, that's fine with me.
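the writeline() suggestion is tiny to sketch as a wrapper -- hypothetical code (LineWriter and its newline parameter are invented for illustration; the actual TextIOBase of PEP 3116 has no such method):

```python
import io

class LineWriter:
    """pairs readline() with a writeline() that appends the
    stream's configured newline, whatever it was set to"""
    def __init__(self, stream, newline='\n'):
        self.stream = stream
        self.newline = newline

    def writeline(self, text):
        # the caller never has to spell out the newline char
        self.stream.write(text + self.newline)

buf = io.StringIO()
out = LineWriter(buf, newline='\r\n')
out.writeline('first')
out.writeline('second')
# buf.getvalue() is now 'first\r\nsecond\r\n'
```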
non-blocking IO ----------------------------- the pep says "In order to put an object in non-blocking mode, the user must extract the fileno and do it by hand." but i think it would only lead to trouble. as the entire IO library is being rethought from the ground up, non-blocking IO should be taken into account. non-blocking IO depends greatly on the platform -- and this is exactly why a cross-platform language should standardize that as part of the new IO layer. saying "let's keep it for later" would only require more work at some later stage. it's true that SyncIO and AsyncIO don't mingle well with the same interfaces. that's why i think they should be two distinct classes. the class hierarchy should be something like: class RawIO: def fileno() def close() class SyncIO(RawIO): def read(count) def write(data) class AsyncIO(RawIO): def read(count, timeout) def write(data, timeout) def bgread(count, callback) def bgwrite(data, callback) or something similar. there's no point in adding both sync and async operations to the RawIO level -- it just won't work together. we need to keep the two distinct. buffering should only support SyncIO -- i also don't see much point in having buffered async IO. it's mostly used for sockets and devices, which are most likely to work with binary data structures rather than text, and if you *require* non-blocking mode, buffering will only get in your way. if you really want a buffered AsyncIO stream, you could write a compatibility layer that makes the underlying AsyncIO object appear synchronous. records ----------------------------- another addition to the PEP that seems useful to me would be a RecordIOBase/Wrapper. records are fixed-length binary data structures, defined as format strings of the struct-module.
class RecordIOWrapper: def __init__(self, buffer, format) def read(self) -> tuple of fields def write(self, *fields) another cool feature i can think of is "multiplexing", or working with the same underlying stream in different ways by having multiple wrappers over it. for example, to implement a type-length-value stream, which is very common in communication protocols, one could do something like class MultiplexedIO: def __init__(self, *streams): self.streams = itertools.cycle(streams) def read(self, *args): """read from the next stream each time it's called""" return self.streams.next().read(*args) sock = BufferedRW(SocketIO(...)) tlrec = Record(sock, "!BL") tlv = MultiplexedIO(tlrec, sock) type, length = tlv.read() value = tlv.read(length) you can also build higher-level state machines with that -- for instance, if the type was "int", the next call to read() would decode the value as an integer, and so on. you could write parsers right on top of the IO layer. just an idea. i'm not sure if that's proper design or just a silly idea, but we'll leave that to the programmer. -tomer From tomerfiliba at gmail.com Mon May 7 13:21:39 2007 From: tomerfiliba at gmail.com (tomer filiba) Date: Mon, 7 May 2007 13:21:39 +0200 Subject: [Python-3000] the future of the GIL In-Reply-To: <463D280F.6070101@acm.org> References: <1d85506f0705050629k35ebdf6aj285e8f10489d21d5@mail.gmail.com> <463D280F.6070101@acm.org> Message-ID: <1d85506f0705070421n1d083f47ge0bcfca2a27af8f9@mail.gmail.com> [Talin] > Note that Jython and IronPython don't have the same restrictions in this > regard as CPython. Both VMs are able to run in multiprocessing > environments. (I don't know whether or not Jython/IronPython even have a > GIL or not.) they don't. they rely on jvm/clr for GC, and probably per-thread locking when they touch global data. [Giovanni Bajo] > You seem to believe that the only way to parallelize your programs is to use
IMHO, threads is just the most common and absolutely the worst, under > many points of views. not at all. personally i hate threads, but there are many place where you can use them properly to distribute workload -- without mutual dependencies or shard state. this makes them essentially like light-weight processes, using background workers and queues, etc., only without the overhead of multiple processes. there could be a stdlib threading module that would provide you with all kinds of queues, schedulers, locks, and decorators, so you wouldn't have to manually lock things every time. -tomer From daniel at stutzbachenterprises.com Mon May 7 15:32:06 2007 From: daniel at stutzbachenterprises.com (Daniel Stutzbach) Date: Mon, 7 May 2007 08:32:06 -0500 Subject: [Python-3000] new io (pep 3116) In-Reply-To: <1d85506f0705070408h6b21540ej54888e97ad6854dc@mail.gmail.com> References: <1d85506f0705070408h6b21540ej54888e97ad6854dc@mail.gmail.com> Message-ID: On 5/7/07, tomer filiba wrote: > for instance, there's no point in UDP sockets having listen(), or send() > or getpeername() -- with UDP you only ever use sendto and recvfrom. > on the other hand, Actually, you can connect() UDP sockets, and then you can use send(), recv(), and getpeername(). > TCP sockets make no use of sendto(). and even with > TCP sockets, listeners never use send() or recv(), while connected > sockets never use listen() or connect(). Agreed. -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises LLC From guido at python.org Mon May 7 16:28:10 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 7 May 2007 07:28:10 -0700 Subject: [Python-3000] failing tests In-Reply-To: References: Message-ID: Thanks for checking in xrange!!!!! Woot! test_compiler and test_transformer are waiting for someone to clean up the compiler package (I forget what it doesn't support, perhapes only nonlocal needs to be added.) Looks like you diagnosed the doctest failure correctly. 
This is probably because, when print changed into print(), lines ending in spaces are generated in some cases: # Py 2 code, writes "42\n" print 42, print # Py3k automatically translated, writes "42 \n" print(42, end=" ") print() I'm afraid we'll have to track down the places where this affects the doctest and fix them. (Fixing the doctest is possible too, though less elegant: just add \n\ to the end of the line.) --Guido On 5/7/07, Neal Norwitz wrote: > There are 3* failing tests: > test_compiler test_doctest test_transformer > * plus a few more when running on a 64-bit platform > > These failures occurred before and after xrange checkin. > > Do other people see these failures? Any ideas when they started? > > The doctest failures are due to no space at the end of the line (print > behavior change). Not sure what to do about that now that we prevent > blanks at the end of lines from being checked in. :-) > > n > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From jimjjewett at gmail.com Mon May 7 17:47:00 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Mon, 7 May 2007 11:47:00 -0400 Subject: [Python-3000] updated PEP3126: Remove Implicit String Concatenation Message-ID: Rewritten -- please tell me if there are any concerns I have missed. And of course, please tell me if you have a suggestion for the open issue -- how to better support external internationalization tools, or at least xgettext in particular. -jJ ----------------------------------- PEP: 3126 Title: Remove Implicit String Concatenation Version: $Revision$ Last-Modified: $Date$ Author: Jim J. Jewett , Raymond D. 
Hettinger Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 29-Apr-2007 Post-History: 29-Apr-2007, 30-Apr-2007, 07-May-2007 Abstract ======== Python inherited many of its parsing rules from C. While this has been generally useful, there are some individual rules which are less useful for python, and should be eliminated. This PEP proposes to eliminate implicit string concatenation based only on the adjacency of literals. Instead of:: "abc" "def" == "abcdef" authors will need to be explicit, and either add the strings:: "abc" + "def" == "abcdef" or join them:: "".join(["abc", "def"]) == "abcdef" Motivation ========== One goal for Python 3000 should be to simplify the language by removing unnecessary features. Implicit string concatenation should be dropped in favor of existing techniques. This will simplify the grammar and simplify a user's mental picture of Python. The latter is important for letting the language "fit in your head". A large group of current users do not even know about implicit concatenation. Of those who do know about it, a large portion never use it or habitually avoid it. Of those who both know about it and use it, very few could state with confidence the implicit operator precedence and under what circumstances it is computed when the definition is compiled versus when it is run. History or Future ----------------- Many Python parsing rules are intentionally compatible with C. This is a useful default, but Special Cases need to be justified based on their utility in Python. We should no longer assume that python programmers will also be familiar with C, so compatibility between languages should be treated as a tie-breaker, rather than a justification. In C, implicit concatenation is the only way to join strings without using a (run-time) function call to store into a variable. In Python, the strings can be joined (and still recognized as immutable) using more standard Python idioms, such as ``+`` or ``"".join``.
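The run-time cost argument can be checked concretely: CPython folds an addition of adjacent string literals into a single constant at compile time, so the explicit ``+`` spelling is as cheap as implicit concatenation (an illustrative check on current CPython, not part of the PEP text):

```python
code = compile('"abc" + "def"', '<example>', 'eval')
# the constant folder has already replaced the addition with one literal
assert "abcdef" in code.co_consts
assert eval(code) == "abcdef"
```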
Problem ------- Implicit String concatenation leads to tuples and lists which are shorter than they appear; this in turn can lead to confusing, or even silent, errors. For example, given a function which accepts several parameters, but offers a default value for some of them:: def f(fmt, *args): print fmt % args This looks like a valid call, but isn't:: >>> f("User %s got a message %s", "Bob" "Time for dinner") Traceback (most recent call last): File "", line 2, in "Bob" File "", line 2, in f print fmt % args TypeError: not enough arguments for format string Calls to this function can silently do the wrong thing:: def g(arg1, arg2=None): ... # silently transformed into the possibly very different # g("arg1 on this linearg2 on this line", None) g("arg1 on this line" "arg2 on this line") To quote Jason Orendorff [#Orendorff]_ Oh. I just realized this happens a lot out here. Where I work, we use scons, and each SConscript has a long list of filenames:: sourceFiles = [ 'foo.c' 'bar.c', #...many lines omitted... 'q1000x.c'] It's a common mistake to leave off a comma, and then scons complains that it can't find 'foo.cbar.c'. This is pretty bewildering behavior even if you *are* a Python programmer, and not everyone here is. Solution ======== In Python, strings are objects and they support the __add__ operator, so it is possible to write:: "abc" + "def" Because these are literals, this addition can still be optimized away by the compiler; the CPython compiler already does so. [#rcn-constantfold]_ Other existing alternatives include multiline (triple-quoted) strings, and the join method:: """This string extends across multiple lines, but you may want to use something like textwrap.dedent to clear out the leading spaces and/or reformat. 
""" >>> "".join(["empty", "string", "joiner"]) == "emptystringjoiner" True >>> " ".join(["space", "string", "joiner"]) == "space string joiner" True >>> "\n".join(["multiple", "lines"]) == "multiple\nlines" == ( """multiple lines""") True Concerns ======== Operator Precedence ------------------- Guido indicated [#rcn-constantfold]_ that this change should be handled by a PEP, because there were a few edge cases with other string operators, such as the %. (Assuming that str % stays -- it may be eliminated in favor of PEP 3101 -- Advanced String Formatting. [#PEP3101]_ [#elimpercent]_) The resolution is to use parentheses to enforce precedence -- the same solution that can be used today:: # Clearest, works today, continues to work, optimization is # already possible. ("abc %s def" + "ghi") % var # Already works today; precedence makes the optimization more # difficult to recognize, but does not change the semantics. "abc" + "def %s ghi" % var as opposed to:: # Already fails because modulus (%) is higher precedence than # addition (+) ("abc %s def" + "ghi" % var) # Works today only because adjacency is higher precedence than # modulus. This will no longer be available. "abc %s" "def" % var # So the 2-to-3 translator can automatically replace it with the # (already valid): ("abc %s" + "def") % var Long Commands ------------- ...
build up (what I consider to be) readable SQL queries [#skipSQL]_:: rows = self.executesql("select cities.city, state, country" " from cities, venues, events, addresses" " where cities.city like %s" " and events.active = 1" " and venues.address = addresses.id" " and addresses.city = cities.id" " and events.venue = venues.id", (city,)) Alternatives again include triple-quoted strings, ``+``, and ``.join``:: query="""select cities.city, state, country from cities, venues, events, addresses where cities.city like %s and events.active = 1 and venues.address = addresses.id and addresses.city = cities.id and events.venue = venues.id""" query=( "select cities.city, state, country" + " from cities, venues, events, addresses" + " where cities.city like %s" + " and events.active = 1" + " and venues.address = addresses.id" + " and addresses.city = cities.id" + " and events.venue = venues.id" ) query="\n".join(["select cities.city, state, country", " from cities, venues, events, addresses", " where cities.city like %s", " and events.active = 1", " and venues.address = addresses.id", " and addresses.city = cities.id", " and events.venue = venues.id"]) # And yes, you *could* inline any of the above querystrings # the same way the original was inlined. rows = self.executesql(query, (city,)) Regular Expressions ------------------- Complex regular expressions are sometimes stated in terms of several implicitly concatenated strings with each regex component on a different line and followed by a comment. The plus operator can be inserted here but it does make the regex harder to read. One alternative is to use the re.VERBOSE option.
Another alternative is to build up the regex with a series of += lines:: # Existing idiom which relies on implicit concatenation r = ('a{20}' # Twenty A's 'b{5}' # Followed by Five B's ) # Mechanical replacement r = ('a{20}' +# Twenty A's 'b{5}' # Followed by Five B's ) # already works today r = '''a{20} # Twenty A's b{5} # Followed by Five B's ''' # Compiled with the re.VERBOSE flag # already works today r = 'a{20}' # Twenty A's r += 'b{5}' # Followed by Five B's Internationalization -------------------- Some internationalization tools -- notably xgettext -- have already been special-cased for implicit concatenation, but not for Python's explicit concatenation. [#barryi8]_ These tools will fail to extract the (already legal):: _("some string" + " and more of it") but often have a special case for:: _("some string" " and more of it") It should also be possible to just use an overly long line (xgettext limits messages to 2048 characters [#xgettext2048]_, which is less than Python's enforced limit) or triple-quoted strings, but these solutions sacrifice some readability in the code:: # Lines over a certain length are unpleasant. _("some string and more of it") # Changing whitespace is not ideal. _("""Some string and more of it""") _("""Some string and more of it""") _("Some string \ and more of it") I do not see a good short-term resolution for this. Transition ========== The proposed new constructs are already legal in current Python, and can be used immediately. The 2 to 3 translator can be made to mechanically change:: "str1" "str2" ("line1" #comment "line2") into:: ("str1" + "str2") ("line1" +#comments "line2") If users want to use one of the other idioms, they can; as these idioms are all already legal in python 2, the edits can be made to the original source, rather than patching up the translator. Open Issues =========== Is there a better way to support external text extraction tools, or at least ``xgettext`` [#gettext]_ in particular? References ========== ..
[#Orendorff] Implicit String Concatenation, Orendorff
   http://mail.python.org/pipermail/python-ideas/2007-April/000397.html

.. [#rcn-constantfold] Reminder: Py3k PEPs due by April, Hettinger, van Rossum
   http://mail.python.org/pipermail/python-3000/2007-April/006563.html

.. [#PEP3101] PEP 3101, Advanced String Formatting, Talin
   http://www.python.org/peps/pep-3101.html

.. [#elimpercent] ps to question Re: Need help completing ABC pep, van Rossum
   http://mail.python.org/pipermail/python-3000/2007-April/006737.html

.. [#skipSQL] (email Subject) PEP 30XZ: Simplified Parsing, Skip
   http://mail.python.org/pipermail/python-3000/2007-May/007261.html

.. [#barryi8] (email Subject) PEP 30XZ: Simplified Parsing
   http://mail.python.org/pipermail/python-3000/2007-May/007305.html

.. [#gettext] GNU gettext manual
   http://www.gnu.org/software/gettext/

.. [#xgettext2048] Unix man page for xgettext -- Notes section
   http://www.scit.wlv.ac.uk/cgi-bin/mansec?1+xgettext

Copyright
=========

This document has been placed in the public domain.

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:

From antoine.pitrou at wengo.com Mon May 7 17:27:41 2007 From: antoine.pitrou at wengo.com (Antoine Pitrou) Date: Mon, 07 May 2007 17:27:41 +0200 Subject: [Python-3000] PEP: Eliminate __del__ Message-ID: <1178551661.8251.16.camel@antoine-ubuntu> FWIW and in light of the thread on removing __del__ from the language, I just posted Yet Another Recipe for automatic finalization: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/519621 It allows writing a finalizer as a single __finalize__ method, at the cost of explicitly calling an enable_finalizer() method with the list of attributes to keep alive on the "ghost object". Antoine.
From ncoghlan at gmail.com Mon May 7 18:17:57 2007 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 08 May 2007 02:17:57 +1000 Subject: [Python-3000] failing tests In-Reply-To: References: Message-ID: <463F5135.2090007@gmail.com> Guido van Rossum wrote: > Thanks for checking in xrange!!!!! Woot! > > test_compiler and test_transformer are waiting for someone to clean up > the compiler package (I forget what it doesn't support, perhaps only > nonlocal needs to be added.) It's definitely lagging on set comprehensions as well. I'm also pretty sure those two tests broke before nonlocal was added, as they were already broken when I started helping Georg in looking at the setcomp updates. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From collinw at gmail.com Mon May 7 19:08:34 2007 From: collinw at gmail.com (Collin Winter) Date: Mon, 7 May 2007 10:08:34 -0700 Subject: [Python-3000] PEP 3129: Class Decorators Message-ID: <43aa6ff70705071008q6a33e00eq7e5073dba5fa07e@mail.gmail.com> Can I go ahead and mark PEP 3129 as "accepted"? From steven.bethard at gmail.com Mon May 7 19:21:31 2007 From: steven.bethard at gmail.com (Steven Bethard) Date: Mon, 7 May 2007 11:21:31 -0600 Subject: [Python-3000] new io (pep 3116) In-Reply-To: <1d85506f0705070408h6b21540ej54888e97ad6854dc@mail.gmail.com> References: <1d85506f0705070408h6b21540ej54888e97ad6854dc@mail.gmail.com> Message-ID: On 5/7/07, tomer filiba wrote: > some time ago i was working on a sketch for a new socket module > (called sock2) which had a clear distinction between connected sockets, > listener sockets and datagram sockets. each protocol was implemented > as a subclass of one of these base classes, and exposed only the > relevant methods. socket options were added as properties and > methods, and a new DNS module was added for dns-related queries.
> > you can see it here -- http://sebulba.wikispaces.com/project+sock2 > i know it's late already, but i can write a PEP over the weekend, It's not too late for standard library PEPs, only PEPs that change the core language. Since your proposal here would presumably replace the socket module, I assume it counts as a stdlib change. STeVe -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy From guido at python.org Mon May 7 19:33:07 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 7 May 2007 10:33:07 -0700 Subject: [Python-3000] PEP 3132: Extended Iterable Unpacking In-Reply-To: References: <463AB2E1.2030408@canterbury.ac.nz> <463BDC9B.2030500@canterbury.ac.nz> Message-ID: On 5/4/07, Daniel Stutzbach wrote: > On 5/4/07, Greg Ewing wrote: > > I don't think that returning the type given is a goal > > that should be attempted, because it can only ever work > > for a fixed set of known types. Given an arbitrary > > sequence type, there is no way of knowing how to > > create a new instance of it with specified contents. > > For objects that support the sequence protocol, how about specifying that: > > a, *b = container_object > > must be equivalent to: > > a, b = container_object[0], container_object[1:] > > That way, b is assigned whatever container_object's getslice method > returns. A list will return a list, a tuple will return a tuple, and > widgets (or BLists...) can return whatever makes sense for them. And what do you return when it doesn't support the container protocol? Think about the use cases. It seems that *your* use case is some kind of (car, cdr) splitting known from Lisp and from functional languages (Haskell is built out of this idiom it seems from the examples). 
But in Python, if you want to loop over one of those things, you ought to use a for-loop; and if you really want a car/cdr split, explicitly using the syntax you show above (x[0], x[1:]) is fine. The important use case in Python for the proposed semantics is when you have a variable-length record, the first few items of which are interesting, and the rest of which is less so, but not unimportant. (If you wanted to throw the rest away, you'd just write a, b, c = x[:3] instead of a, b, c, *d = x.) It is much more convenient for this use case if the type of d is fixed by the operation, so you can count on its behavior. There's a bug in the design of filter() in Python 2 (which will be fixed in 3.0 by turning it into an iterator BTW): if the input is a tuple, the output is a tuple too, but if the input is a list *or anything else*, the output is a list. That's a totally insane signature, since it means that you can't count on the result being a list, *nor* on it being a tuple -- if you need it to be one or the other, you have to convert it to one, which is a waste of time and space. Please let's not repeat this design bug. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon May 7 19:42:41 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 7 May 2007 10:42:41 -0700 Subject: [Python-3000] [Python-Dev] Byte literals (was Re: [Python-checkins] Changing string constants to byte arrays ( r55119 - in python/branches/py3k-struni/Lib: codecs.py test/test_codecs.py )) In-Reply-To: <20070505124008.648D.JCARLSON@uci.edu> References: <20070505150035.GA16303@panix.com> <200705051334.45120.fdrake@acm.org> <20070505124008.648D.JCARLSON@uci.edu> Message-ID: [+python-3000; replies please remove python-dev] On 5/5/07, Josiah Carlson wrote: > > "Fred L. Drake, Jr." 
wrote: > > > > On Saturday 05 May 2007, Aahz wrote: > > > I'm with MAL and Fred on making literals immutable -- that's safe and > > > lots of newbies will need to use byte literals early in their Python > > > experience if they pick up Python to operate on network data. > > > > Yes; there are lots of places where bytes literals will be used the way str > > literals are today. buffer(b'...') might be good enough, but it seems more > > than a little idiomatic, and doesn't seem particularly readable. > > > > I'm not suggesting that /all/ literals result in constants, but bytes literals > > seem like a case where what's wanted is the value. If b'...' results in a > > new object on every reference, that's a lot of overhead for a network > > protocol implementation, where the data is just going to be written to a > > socket or concatenated with other data. An immutable bytes type would be > > very useful as a dictionary key as well, and more space-efficient than > > tuple(b'...'). > > I was saying the exact same thing last summer. See my discussion with > Martin about parsing/unmarshaling. What I expect will happen with bytes > as dictionary keys is that people will end up subclassing dictionaries > (with varying amounts of success and correctness) to do something like > the following...
>
> class bytesKeys(dict):
>     ...
>     def __setitem__(self, key, value):
>         if isinstance(key, bytes):
>             key = key.decode('latin-1')
>         else:
>             raise KeyError("only bytes can be used as keys")
>         dict.__setitem__(self, key, value)
>     ...
>
> Is it optimal? No. Would it be nice to have immutable bytes? Yes. Do > I think it will really be a problem in parsing/unmarshaling? I don't > know, but the fact that there now exists a reasonable literal syntax b'...' > rather than the previous bytes([1, 2, 3, ...]) means that we are coming > much closer to having what really is about the best way to handle this; > Python 2.x str. I don't know how this will work out yet.
I'm not convinced that having both mutable and immutable bytes is the right thing to do; but I'm also not convinced of the opposite. I am slowly working on the string/unicode unification, and so far, unfortunately, it is quite daunting to get rid of 8-bit strings even at the Python level let alone at the C level. I suggest that the following exercise, to be carried out in the py3k-struni branch, might be helpful: (1) change the socket module to return bytes instead of strings (it already takes bytes, by virtue of the buffer protocol); (2) change its makefile() method so that it uses the new io.py library, in particular the SocketIO wrapper there; (3) fix up the httplib module and perhaps other similar ones. Take copious notes while doing this. Anyone up for this? I will listen! (I'd do it myself but I don't know where I'd find the time). -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon May 7 19:45:40 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 7 May 2007 10:45:40 -0700 Subject: [Python-3000] PEP 3112 In-Reply-To: <463D8800.1010906@v.loewis.de> References: <463D8800.1010906@v.loewis.de> Message-ID: On 5/6/07, "Martin v. Löwis" wrote: > I just read PEP 3112, and I believe it contains a > flaw/underspecification. > > It says > > # Each shortstringchar or longstringchar must be a character between 1 > # and 127 inclusive, regardless of any encoding declaration [2] in the > # source file. > > What does that mean? In particular, what is "a character between 1 and > 127"? > > Assuming this refers to ordinal values in some encoding: what encoding? > It's particularly puzzling that it says "regardless of any encoding > declaration of the source file". > > I fear (but hope that I'm wrong) that this was meant to mean "use the > bytes as they are stored on disk in the source file". If so: is the > attached file valid Python? In case your editor can't render it: it > reads > > #!
-*- coding: iso-2022-jp -*- > a = b"?????" > > But if you look at the file with a hex editor, you see it contains > only bytes between 1 and 127. > > I would hope that this code is indeed ill-formed (i.e. that > the byte representation on disk is irrelevant, and only the > Unicode ordinals of the source characters matter) > > If so, can the specification please be updated to clarify that > 1. in Grammar changes: Each shortstringchar or longstringchar must > be a character whose Unicode ordinal value is between 1 and > 127 inclusive. > 2. in Semantics: The bytes in the new object are obtained as if > encoding a string literal with "iso-8859-1" Sounds like a good fix to me; I agree that bytes literals, like Unicode literals, should not vary depending on the source encoding. In step 2, can't you use "ascii" as the encoding? -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon May 7 19:49:36 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 7 May 2007 10:49:36 -0700 Subject: [Python-3000] comments In-Reply-To: <1d85506f0705061003y51d62ddcm7a0ea91221a613c6@mail.gmail.com> References: <1d85506f0705061003y51d62ddcm7a0ea91221a613c6@mail.gmail.com> Message-ID: On 5/6/07, tomer filiba wrote: > i finished reading almost all of the new peps, so to prevent cluttering > i'll post all my comments in a single message. Please don't do that -- it leads to multiple discussions going on in the same email thread, and that's really hard to keep track of (as I learned after posting my "PEP parade" email). 
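Martin's two clarifications can be illustrated with the b'...' literals that eventually shipped (a sketch in later Python 3 terms, not PEP 3112's own examples):

```python
# Point 1: literal characters are restricted to Unicode ordinals
# 1..127, and on that range ascii and iso-8859-1 agree -- which is
# why Guido's suggestion of "ascii" for step 2 also works.
s = "a = 5"
assert s.encode("ascii") == s.encode("iso-8859-1") == b"a = 5"

# Point 2: iso-8859-1 states the general rule, since it maps Unicode
# ordinals 0..255 one-to-one onto byte values; ordinals above 127
# must appear in the literal as escapes, not as raw characters.
t = "caf\xe9"                       # U+00E9 is outside ascii
assert t.encode("iso-8859-1") == b"caf\xe9"
print(list(b"caf\xe9"))             # [99, 97, 102, 233]
```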
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon May 7 19:58:45 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 7 May 2007 10:58:45 -0700 Subject: [Python-3000] the future of the GIL In-Reply-To: <1d85506f0705050629k35ebdf6aj285e8f10489d21d5@mail.gmail.com> References: <1d85506f0705050629k35ebdf6aj285e8f10489d21d5@mail.gmail.com> Message-ID: On 5/5/07, tomer filiba wrote: > i have to admit i've been neglecting the list in the past few months, > and i don't know whether the issue i want to bring up has been > settled already. It's been settled by default -- nobody submitted a PEP to kill the GIL in time for the April 30 deadline, and I won't accept one now. > as you all may have noticed, multicore processors are becoming > more and more common in all kinds of machines, from desktops > to servers, and will surely become more prevalent with time, > as all major CPU vendors plan to ship 8-core processors > by mid-2008. > > back in the day of uniprocessor machines, having the GIL really > made life simpler and the sacrifice was negligible. > > however, running a threaded python script over an 8-core > machine, where you can utilize at most 12.5% of the horsepower, > seems like too large a sacrifice to me. > > the only way to overcome this with cpython is to Kill The GIL (TM), > and since it's a very big architectural change, it ought to happen > soon. pushing it further than version 3.0 means all library authors > would have to adapt their code twice (once to make it compatible > with 3.0, and then again to make it thread safe). Here's something I wrote recently to someone (a respected researcher) who has a specific design in mind to kill the GIL (rather than an agenda without a plan). 
""" Briefly, the reason why it's so hard to get rid of the GIL is that this Python implementation uses reference counting as its primary GC approach (there's a cycle-traversing GC algorithm bolted on the side, but most objects are reclaimed by refcounting). In Python, everything is an object (even small integers and characters), and many objects are conceptually immutable, allowing free sharing of objects as values between threads. There is no marking of objects as "local to a thread" or "local to a frame" -- that would be a totally alien concept. All objects have a refcount field (a long at the front of the object structure) and this sees a lot of traffic. As C doesn't have an atomic increment nor an atomic decrement-and-test, the INCREF and DECREF macros sprinkled throughout the code (many thousands of them) must be protected by some lock. Around '99 Greg Stein and Mark Hammond tried to get rid of the GIL. They removed most of the global mutable data structures, added explicit locks to the remaining ones and to individual mutable objects, and actually got the whole thing working. Unfortunately even on the system with the fastest locking primitives (Windows at the time) they measured a 2x slow-down on a single CPU due to all the extra locking operations going on. Good luck fixing this! My personal view on it is that it's not worth it. If you want to run Python on a multiprocessor, you're much better off finding a way to break the application off into multiple processes each running a single CPU-bound thread and any number of I/O-bound threads; alternatively, if you cannot resist the urge for multiple CPU-bound threads, you can use one of the Python implementations built on inherently multi-threading frameworks, i.e. Jython or IronPython. But I'd be happy to be proven wrong, if only because this certainly is a recurring heckle whenever I give a talk about Python anywhere. 
""" -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon May 7 20:07:54 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 7 May 2007 11:07:54 -0700 Subject: [Python-3000] updated PEP3126: Remove Implicit String Concatenation In-Reply-To: References: Message-ID: Committed revision 55172. For the record, I'm more and more -1 on this (and on its companion to remove \ line continuation). These seem pretty harmless features that serve a purpose; those of us who don't like them can avoid them. --Guido On 5/7/07, Jim Jewett wrote: > Rewritten -- please tell me if there are any concerns I have missed. > > And of course, please tell me if you have a suggestion for the open > issue -- how to better support external internationalization tools, or > at least xgettext in particular. > > -jJ > > ----------------------------------- > > PEP: 3126 > Title: Remove Implicit String Concatenation > Version: $Revision$ > Last-Modified: $Date$ > Author: Jim J. Jewett , > Raymond D. Hettinger -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon May 7 20:10:30 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 7 May 2007 11:10:30 -0700 Subject: [Python-3000] failing tests In-Reply-To: <463F5135.2090007@gmail.com> References: <463F5135.2090007@gmail.com> Message-ID: On 5/7/07, Nick Coghlan wrote: > Guido van Rossum wrote: > > Thanks for checking in xrange!!!!! Woot! > > > > test_compiler and test_transformer are waiting for someone to clean up > > the compiler package (I forget what it doesn't support, perhapes only > > nonlocal needs to be added.) > > It's definitely lagging on set comprehensions as well. I'm also pretty > sure those two tests broke before nonlocal was added, as they were > already broken when I started helping Georg in looking at the setcomp > updates. I just fixed the doctest failures; but for the compiler package I need help. Would you have the time? 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon May 7 20:12:40 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 7 May 2007 11:12:40 -0700 Subject: [Python-3000] PEP 3129: Class Decorators In-Reply-To: <43aa6ff70705071008q6a33e00eq7e5073dba5fa07e@mail.gmail.com> References: <43aa6ff70705071008q6a33e00eq7e5073dba5fa07e@mail.gmail.com> Message-ID: On 5/7/07, Collin Winter wrote: > Can I go ahead and mark PEP 3129 as "accepted"? Almost. I'm ok with it, but I think that to follow the procedure you ought to post the full text at least once on python-3000, so you can add the date to the "Post-History" header. In the mean time, I think it would be fine to start on the implementation! -- --Guido van Rossum (home page: http://www.python.org/~guido/) From daniel at stutzbachenterprises.com Mon May 7 20:13:48 2007 From: daniel at stutzbachenterprises.com (Daniel Stutzbach) Date: Mon, 7 May 2007 13:13:48 -0500 Subject: [Python-3000] PEP 3132: Extended Iterable Unpacking In-Reply-To: References: <463AB2E1.2030408@canterbury.ac.nz> <463BDC9B.2030500@canterbury.ac.nz> Message-ID: On 5/7/07, Guido van Rossum wrote: > And what do you return when it doesn't support the container protocol? Assign the iterator object with the remaining items to d. > Think about the use cases. It seems that *your* use case is some kind > of (car, cdr) splitting known from Lisp and from functional languages > (Haskell is built out of this idiom it seems from the examples). But > in Python, if you want to loop over one of those things, you ought to > use a for-loop; and if you really want a car/cdr split, explicitly > using the syntax you show above (x[0], x[1:]) is fine. The use case I'm thinking of is this: A container type or an iterable where the first few entries contain one type of information, and the rest of the entries are something that will either be discarded or run through a for-loop.
I encounter this frequently when reading text files where the first few lines are some kind of header with a known format and the rest of the file is data. > The important use case in Python for the proposed semantics is when > you have a variable-length record, the first few items of which are > interesting, and the rest of which is less so, but not unimportant. > (If you wanted to throw the rest away, you'd just write a, b, c = > x[:3] instead of a, b, c, *d = x.) That doesn't work if x is an iterable that doesn't support getslice (such as a file object). > It is much more convenient for this > use case if the type of d is fixed by the operation, so you can count > on its behavior. > There's a bug in the design of filter() in Python 2 (which will be > fixed in 3.0 by turning it into an iterator BTW): if the input is a > tuple, the output is a tuple too, but if the input is a list *or > anything else*, the output is a list. That's a totally insane > signature, since it means that you can't count on the result being a > list, *nor* on it being a tuple -- if you need it to be one or the > other, you have to convert it to one, which is a waste of time and > space. Please let's not repeat this design bug. I agree that's broken, because it carves out a weird exception for tuples. I disagree that it's analogous because I'm not suggesting carving out an exception. I'm suggesting that:

- lists return lists
- tuples return tuples
- XYZ containers return XYZ containers
- non-container iterables return iterators

It's a consistent rule, albeit a different consistent rule than always returning the same type. -- Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises LLC From tomerfiliba at gmail.com Mon May 7 20:28:27 2007 From: tomerfiliba at gmail.com (tomer filiba) Date: Mon, 7 May 2007 20:28:27 +0200 Subject: [Python-3000] the future of the GIL In-Reply-To: References: <1d85506f0705050629k35ebdf6aj285e8f10489d21d5@mail.gmail.com> Message-ID: <1d85506f0705071128geac9062lef8309915f0ab7db@mail.gmail.com> On 5/7/07, Guido van Rossum wrote: > It's been settled by default -- nobody submitted a PEP to kill the GIL > in time for the April 30 deadline, and I won't accept one now. oh, i didn't mean to submit a PEP about that -- i don't have the time or the brainpower to do that. i was just wondering if there were any plans to do that in py3k, or if that's at all desired. but as you said, it's been settled by default. -tomer From collinw at gmail.com Mon May 7 20:29:25 2007 From: collinw at gmail.com (Collin Winter) Date: Mon, 7 May 2007 11:29:25 -0700 Subject: [Python-3000] PEP 3129: Class Decorators Message-ID: <43aa6ff70705071129r662d0627ma8882a2a8ded3b5d@mail.gmail.com> Guido pointed out that this PEP hadn't been sent to the list yet. Abstract ======== This PEP proposes class decorators, an extension to the function and method decorators introduced in PEP 318. Rationale ========= When function decorators were originally debated for inclusion in Python 2.4, class decorators were seen as obscure and unnecessary [#obscure]_ thanks to metaclasses. After several years' experience with the Python 2.4.x series of releases and an increasing familiarity with function decorators and their uses, the BDFL and the community re-evaluated class decorators and recommended their inclusion in Python 3.0 [#approval]_. The motivating use-case was to make certain constructs more easily expressed and less reliant on implementation details of the CPython interpreter. 
While it is possible to express class decorator-like functionality using metaclasses, the results are generally unpleasant and the implementation highly fragile [#motivation]_.  In addition, metaclasses are inherited, whereas class decorators are not, making metaclasses unsuitable for some, single class-specific uses of class decorators.  The fact that large-scale Python projects like Zope were going through these wild contortions to achieve something like class decorators won over the BDFL.

Semantics
=========

The semantics and design goals of class decorators are the same as for function decorators ([#semantics]_, [#goals]_); the only difference is that you're decorating a class instead of a function.  The following two snippets are semantically identical: ::

  class A:
    pass
  A = foo(bar(A))


  @foo
  @bar
  class A:
    pass

For a detailed examination of decorators, please refer to PEP 318.

Implementation
==============

Adapting Python's grammar to support class decorators requires modifying two rules and adding a new rule ::

  funcdef: [decorators] 'def' NAME parameters ['->' test] ':' suite

  compound_stmt: if_stmt | while_stmt | for_stmt | try_stmt | with_stmt
                 | funcdef | classdef

need to be changed to ::

  decorated: decorators (classdef | funcdef)

  funcdef: 'def' NAME parameters ['->' test] ':' suite

  compound_stmt: if_stmt | while_stmt | for_stmt | try_stmt | with_stmt
                 | funcdef | classdef | decorated

Adding ``decorated`` is necessary to avoid an ambiguity in the grammar.  The Python AST and bytecode must be modified accordingly.

A reference implementation [#implementation]_ has been provided by Jack Diederich.

References
==========

.. [#obscure]
   http://www.python.org/dev/peps/pep-0318/#motivation

.. [#approval]
   http://mail.python.org/pipermail/python-dev/2006-March/062942.html

.. [#motivation]
   http://mail.python.org/pipermail/python-dev/2006-March/062888.html

.. [#semantics]
   http://www.python.org/dev/peps/pep-0318/#current-syntax

..
[#goals] http://www.python.org/dev/peps/pep-0318/#design-goals .. [#implementation] http://python.org/sf/1671208 From guido at python.org Mon May 7 21:14:30 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 7 May 2007 12:14:30 -0700 Subject: [Python-3000] new io (pep 3116) In-Reply-To: <1d85506f0705070408h6b21540ej54888e97ad6854dc@mail.gmail.com> References: <1d85506f0705070408h6b21540ej54888e97ad6854dc@mail.gmail.com> Message-ID: On 5/7/07, tomer filiba wrote: > my original idea about the new i/o foundation was more elaborate > than the pep, but i have to admit the pep is more feasible and > compact. some comments though: > > writeline > ----------------------------- > TextIOBase should grow a writeline() method, to be symmetrical > with readline(). the reason is simple -- the newline char is > configurable in the constructor, so it's not necessarily "\n". > so instead of adding the configurable newline char manually, > the user should call writeline() which would append the > appropriate newline automatically. That's not symmetric. readline() returns a string that includes a trailing \n even if the actual file contained \r or \r\n. write() already is supposed to translate \n anywhere (not just at the end of the line) into the specified or platform-default (os.sep) separator. A method writeline() that *appended* a separator would be totally new to the I/O library. Even writelines() doesn't do that. > sockets > ----------------------------- > iirc, SocketIO is a layer that wraps an underlying socket object. > that's a good distinction -- to separate the underlying socket from > the RawIO interface -- but don't forget socket objects, > by themselves, need a cleanup too. But that's out of the scope of the PEP. The main change I intend to make is to return bytes instead of strings. > for instance, there's no point in UDP sockets having listen(), or send() > or getpeername() -- with UDP you only ever use sendto and recvfrom. 
> on the other hand, TCP sockets make no use of sendto(). and even with > TCP sockets, listeners never use send() or recv(), while connected > sockets never use listen() or connect(). > > moreover, the current socket interface simply mimics the BSD > interface. setsockopt, getsockopt, et al, are very unpythonic by nature -- > they ought to be exposed as properties or methods of the socket. > all in all, the current socket model is very low level with no high > level design. That's all out of scope for the PEP. Also I happen to think that there's nothing particularly wrong with sockets -- they generally get wrapped in higher layers like httplib. > some time ago i was working on a sketch for a new socket module > (called sock2) which had a clear distinction between connected sockets, > listener sockets and datagram sockets. each protocol was implemented > as a subclass of one of these base classes, and exposed only the > relevant methods. socket options were added as properties and > methods, and a new DNS module was added for dns-related queries. > > you can see it here -- http://sebulba.wikispaces.com/project+sock2 > i know it's late already, but i can write a PEP over the weekend, > or if someone else wants to carry on with the idea, that's fine > with me. Sorry, too late. We're putting serious pressure already on authors who posted draft PEPs before the deadline but haven't submitted their text to Subversion yet. At this point we have a definite list of PEPs that were either checked in or promised on time for the deadline. New proposals will have to wait until after 3.0a1 is released (hopefully end of June). Also note that the whole stdlib reorg is planned to happen after that release. > non-blocking IO > ----------------------------- > the pep says "In order to put an object in non-blocking > mode, the user must extract the fileno and do it by hand." > but i think it would only lead to trouble.
as the entire IO library > is being rethought from the ground up, non-blocking IO > should be taken into account. Why? Non-blocking I/O makes most of the proposed API useless. Non-blocking I/O is highly specialized and hard to code against. I'm all for a standard non-blocking I/O library but this one isn't it. > non-blocking IO depends greatly on the platform -- and this is > exactly why a cross-platform language should standardize that > as part of the new IO layer. saying "let's keep it for later" would only > require more work at some later stage. Actually there are only two things platform-specific: how to turn it on (or off) and how to tell the difference between "this operation would block" and "there was an error". > it's true that SyncIO and AsyncIO don't mingle well with the same > interfaces. that's why i think they should be two distinct classes. > the class hierarchy should be something like:
>
> class RawIO:
>     def fileno()
>     def close()
>
> class SyncIO(RawIO):
>     def read(count)
>     def write(data)
>
> class AsyncIO(RawIO):
>     def read(count, timeout)
>     def write(data, timeout)
>     def bgread(count, callback)
>     def bgwrite(data, callback)
>
> or something similar. there's no point to add both sync and async > operations to the RawIO level -- it just won't work together. > we need to keep the two distinct. I'd rather cut out all support for async I/O from this library and leave it for someone else to invent. I don't need it. People who use async I/O on sockets to implement e.g. fast web servers are unlikely to use io.py; they have their own API on top of raw sockets + select or poll. > buffering should only support SyncIO -- i also don't see much point > in having buffered async IO. it's mostly used for sockets and devices, > which are most likely to work with binary data structures rather than > text, and if you *require* non-blocking mode, buffering will only > get in your way.
> > if you really want a buffered AsyncIO stream, you could write a > compatibility layer that makes the underlying AsyncIO object > appear synchronous. I agree with cutting async I/O from the buffered API, *except* for specifying that when the equivalent of EWOULDBLOCK happens at the lower level the buffering layer should not retry but raise an error. I think it's okay if the raw layer has minimal support for async I/O. > records > ----------------------------- > another addition to the PEP that seems useful to me would be a > RecordIOBase/Wrapper. records are fixed-length binary data > structures, defined as format strings of the struct-module.
>
> class RecordIOWrapper:
>     def __init__(self, buffer, format)
>     def read(self) -> tuple of fields
>     def write(self, *fields)
>
The struct module has the means to build that out of lower-level reads and writes already. If you think a library module to support this is needed, write one and make it available as a third party module and see how many customers you get. Personally I haven't had the need for files consisting of fixed-length records of the same type since the mid '80s. > another cool feature i can think of is "multiplexing", or working > with the same underlying stream in different ways by having multiple > wrappers over it. That's why we make the underlying 'raw' object available as an attribute. So you can experiment with this.
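Guido's point that struct already covers fixed-length records can be sketched in a few lines (read_record here is a hypothetical helper, not part of any PEP):

```python
import io
import struct

def read_record(stream, fmt):
    """Read one fixed-length record described by a struct format string."""
    size = struct.calcsize(fmt)
    data = stream.read(size)
    if len(data) != size:
        raise EOFError("truncated record")
    return struct.unpack(fmt, data)

# Two "!BL" records (1-byte type, 4-byte big-endian length) in a row,
# using an in-memory stream in place of a socket or file.
buf = io.BytesIO(struct.pack("!BL", 1, 1000) + struct.pack("!BL", 2, 8))
print(read_record(buf, "!BL"))   # (1, 1000)
print(read_record(buf, "!BL"))   # (2, 8)
```

The same helper works over any object with a read(n) method, which is all the layering the proposed RecordIOWrapper would buy.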
> for example, to implement a type-length-value stream, which is very
> common in communication protocols, one could do something like
>
> class MultiplexedIO:
>     def __init__(self, *streams):
>         self.streams = itertools.cycle(streams)
>     def read(self, *args):
>         """read from the next stream each time it's called"""
>         return self.streams.next().read(*args)
>
> sock = BufferedRW(SocketIO(...))
> tlrec = Record(sock, "!BL")
> tlv = MultiplexedIO(tlrec, sock)
>
> type, length = tlv.read()
> value = tlv.read(length)
>
> you can also build higher-level state machines with that -- for instance,
> if the type was "int", the next call to read() would decode the value as
> an integer, and so on. you could write parsers right on top of the IO
> layer.
>
> just an idea. i'm not sure if that's proper design or just a silly idea,
> but we'll leave that to the programmer.

I don't think the new I/O library is the place to put in a bunch of new, essentially untried ideas. Instead, we should aim for a flexible implementation of APIs that we know work and are needed. I think the current stack is pretty flexible in that it supports streams and random access, unidirectional and bidirectional, raw and buffered, bytes and text. Applications can do a lot with those.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org Mon May 7 21:16:27 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 7 May 2007 12:16:27 -0700
Subject: [Python-3000] PEP 3129: Class Decorators
In-Reply-To: <43aa6ff70705071129r662d0627ma8882a2a8ded3b5d@mail.gmail.com>
References: <43aa6ff70705071129r662d0627ma8882a2a8ded3b5d@mail.gmail.com>
Message-ID: 

On 5/7/07, Collin Winter wrote: [...]
> This PEP proposes class decorators, an extension to the function
> and method decorators introduced in PEP 318. [...]
> The semantics and design goals of class decorators are the same as
> for function decorators ([#semantics]_, [#goals]_); the only
> difference is that you're decorating a class instead of a function.
> The following two snippets are semantically identical: ::
>
>     class A:
>         pass
>     A = foo(bar(A))
>
>
>     @foo
>     @bar
>     class A:
>         pass

I'm +1 on this PEP.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org Mon May 7 21:24:28 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 7 May 2007 12:24:28 -0700
Subject: [Python-3000] PEP 3132: Extended Iterable Unpacking
In-Reply-To: 
References: <463AB2E1.2030408@canterbury.ac.nz> <463BDC9B.2030500@canterbury.ac.nz>
Message-ID: 

On 5/7/07, Daniel Stutzbach wrote:
> On 5/7/07, Guido van Rossum wrote:
> > And what do you return when it doesn't support the container protocol?
>
> Assign the iterator object with the remaining items to d.
>
> > Think about the use cases. It seems that *your* use case is some kind
> > of (car, cdr) splitting known from Lisp and from functional languages
> > (Haskell is built out of this idiom it seems from the examples). But
> > in Python, if you want to loop over one of those things, you ought to
> > use a for-loop; and if you really want a car/cdr split, explicitly
> > using the syntax you show above (x[0], x[1:]) is fine.
>
> The use case I'm thinking of is this:
>
> A container type or an iterable where the first few entries contain
> one type of information, and the rest of the entries are something
> that will either be discarded or run through a for-loop.
>
> I encounter this frequently when reading text files where the first
> few lines are some kind of header with a known format and the rest of
> the file is data.

This sounds like a parsing problem. IMO it's better to treat it as such.
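Concretely, treating it as parsing might look something like this. (An illustrative sketch: the two-line header format, the function name, and the sample data are made up, not taken from the thread.)

```python
import io

def read_table(f):
    it = iter(f)
    # The header has a known, fixed format: consume it explicitly...
    title = next(it).rstrip("\n")
    units = next(it).rstrip("\n")
    # ...and treat everything after it as data to loop over.
    data = [line.rstrip("\n") for line in it]
    return title, units, data

sample = io.StringIO("temperatures\ncelsius\n12.5\n13.1\n11.9\n")
title, units, data = read_table(sample)
```

The header lines are pulled off by explicit calls, so the "rest of the file" never needs to be materialized into any particular container type.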
> > The important use case in Python for the proposed semantics is when
> > you have a variable-length record, the first few items of which are
> > interesting, and the rest of which is less so, but not unimportant.
> >
> > (If you wanted to throw the rest away, you'd just write a, b, c =
> > x[:3] instead of a, b, c, *d = x.)
>
> That doesn't work if x is an iterable that doesn't support getslice
> (such as a file object).
>
> > It is much more convenient for this
> > use case if the type of d is fixed by the operation, so you can count
> > on its behavior.
> >
> > There's a bug in the design of filter() in Python 2 (which will be
> > fixed in 3.0 by turning it into an iterator BTW): if the input is a
> > tuple, the output is a tuple too, but if the input is a list *or
> > anything else*, the output is a list. That's a totally insane
> > signature, since it means that you can't count on the result being a
> > list, *nor* on it being a tuple -- if you need it to be one or the
> > other, you have to convert it to one, which is a waste of time and
> > space. Please let's not repeat this design bug.
>
> I agree that's broken, because it carves out a weird exception for
> tuples. I disagree that it's analogous because I'm not suggesting
> carving out an exception.
>
> I'm suggesting that:
>
> - lists return lists
> - tuples return tuples
> - XYZ containers return XYZ containers
> - non-container iterables return iterators.
>
> It's a consistent rule, albeit a different consistent rule than always
> returning the same type.

But, I expect, less useful. It won't support "a, *b, c = " either. From an implementation POV, if you have an unknown object on the RHS, you have to try slicing it before you try iterating over it; this may cause problems e.g. if the object happens to be a defaultdict -- since x[3:] is implemented as x[slice(3, None, None)], the defaultdict will give you its default value.
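The slice-before-iterate point is easy to see with a tiny probe class (illustrative only, not code from the thread):

```python
class Probe:
    """Returns whatever key subscription passes to __getitem__."""
    def __getitem__(self, key):
        return key

p = Probe()
print(p[3:])   # slice(3, None, None) -- x[3:] arrives as a slice object
print(p[:3])   # slice(None, 3, None)
```

So an unpacking implementation that tries `x[3:]` on an arbitrary object hands that object a slice key through ordinary `__getitem__`; whether the object then raises or quietly returns something default-ish depends entirely on the type, which is exactly the kind of surprise being described.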
I'd much rather define this in terms of iterating over the object until it is exhausted, which can be optimized for certain known types like lists and tuples.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From tomerfiliba at gmail.com Mon May 7 22:12:05 2007
From: tomerfiliba at gmail.com (tomer filiba)
Date: Mon, 7 May 2007 22:12:05 +0200
Subject: [Python-3000] new io (pep 3116)
In-Reply-To: 
References: <1d85506f0705070408h6b21540ej54888e97ad6854dc@mail.gmail.com>
Message-ID: <1d85506f0705071312w43c1f59cwe4f84239a4a69a6a@mail.gmail.com>

On 5/7/07, Guido van Rossum wrote:
> That's not symmetric. readline() returns a string that includes a
> trailing \n even if the actual file contained \r or \r\n. write()
> already is supposed to translate \n anywhere (not just at the end of
> the line) into the specified or platform-default (os.sep) separator.

well, if write() is meant to do that anyway, writeline() is not required.

> > moreover, the current socket interface simply mimics the BSD
> > interface. setsockopt, getsockopt, et al, are very unpythonic by nature --
> > they ought to be exposed as properties or methods of the socket.
> > all in all, the current socket model is very low level with no high
> > level design.
>
> That's all out of scope for the PEP. Also I happen to think that
> there's nothing particularly wrong with sockets -- they generally get
> wrapped in higher layers like httplib.

first, when i first brought this up a year ago, you were in favor: http://mail.python.org/pipermail/python-3000/2006-April/001497.html

still, there's much code that handles sockets. the client side is mostly standard: connect, do something, quit. but on the server side it's another story, and i have had many long battles with man pages to make sockets behave as expected. not all protocols are based on http, after all, and the fellas that write modules like httplib have a lot of black magic to do.
a better design of the socket module could help a lot, as well as making small, repeated tasks easier/more logical. compare

    import socket

    s = socket.socket()
    s.connect(("foobar", 1234))

to

    from socket import TcpStream

    s = TcpStream("foobar", 1234)

> > you can see it here -- http://sebulba.wikispaces.com/project+sock2
> > i know it's late already, but i can write a PEP over the weekend,
> > or if someone else wants to carry on with the idea, that's fine
> > with me.
>
> Sorry, too late. We're putting serious pressure already on authors who
> posted draft PEPs before the deadline but haven't submitted their text
> to Subversion yet. At this point we have a definite list of PEPs that
> were either checked in or promised on time for the deadline. New
> proposals will have to wait until after 3.0a1 is released (hopefully
> end of June). Also note that the whole stdlib reorg is planned to
> happen after that release.

well, my code is pure python, and can just replace the existing socket.py module. the _socket module remains intact. it can surely wait for the stdlib reorg though, there's no need to rush into it now. i'll submit the PEP in the near future.

> > non-blocking IO depends greatly on the platform -- and this is
> > exactly why a cross-platform language should standardize that
> > as part of the new IO layer. saying "let's keep it for later" would only
> > require more work at some later stage.
>
> Actually there are only two things platform-specific: how to turn it
> on (or off) and how to tell the difference between "this operation
> would block" and "there was an error".

well, the way i see it, that's exactly why this calls for standardization.

> The struct module has the means to build that out of lower-level reads
> and writes already. If you think a library module to support this is
> needed, write one and make it available as a third party module and
> see how many customers you get.
Personally I haven't had the need for
> files consisting of fixed-length records of the same type since the
> mid '80s.

not all of us are that lucky :) there are still lots of ugly protocols and file formats for us programmers to handle. and TLV structures happen to be one of the pretty ones.

> I don't think the new I/O library is the place to put in a bunch of
> new, essentially untried ideas. Instead, we should aim for a flexible
> implementation of APIs that we know work and are needed. I think the
> current stack is pretty flexible in that it supports streams and
> random access, unidirectional and bidirectional, raw and buffered,
> bytes and text. Applications can do a lot with those.

yeah, it was more like a wild idea really. it should be placed in a different module.

-tomer

From martin at v.loewis.de Tue May 8 00:02:58 2007
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Tue, 08 May 2007 00:02:58 +0200
Subject: [Python-3000] PEP 3112
In-Reply-To: 
References: <463D8800.1010906@v.loewis.de>
Message-ID: <463FA212.9070609@v.loewis.de>

>> 1. in Grammar changes: Each shortstringchar or longstringchar must
>> be a character whose Unicode ordinal value is between 1 and
>> 127 inclusive.
>
> Sounds like a good fix to me; I agree that bytes literals, like
> Unicode literals, should not vary depending on the source encoding. In
> step 2, can't you use "ascii" as the encoding?

Sure. Technically, ASCII might include \0 (depending on definition), but that is ruled out as a character in Python source code, anyway. So: "must be an ASCII character" is just as clear, and much shorter. I guess Jason associated "ASCII character" with "single byte", so it can't be simultaneously both ASCII and Unicode, hence he chose the more elaborate wording. Of course, if one views ASCII as a character set (rather than a coded character set), a Unicode character may or may not simultaneously be an ASCII character.
Regards,
Martin

From greg.ewing at canterbury.ac.nz Tue May 8 00:44:15 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 08 May 2007 10:44:15 +1200
Subject: [Python-3000] PEP 3132: Extended Iterable Unpacking
In-Reply-To: 
References: <463AB2E1.2030408@canterbury.ac.nz> <463BDC9B.2030500@canterbury.ac.nz>
Message-ID: <463FABBF.2080808@canterbury.ac.nz>

Daniel Stutzbach wrote:
> The use case I'm thinking of is this:
>
> A container type or an iterable where the first few entries contain
> one type of information, and the rest of the entries are something
> that will either be discarded or run through a for-loop.

If you know you're dealing with an iterator x, then after a, b, c, *d = x, d would simply be x, so all you really need is a function to get the first n items from x.

> I'm suggesting that:
>
> - lists return lists
> - tuples return tuples
> - XYZ containers return XYZ containers
> - non-container iterables return iterators.

How do you propose to distinguish between the last two cases? Attempting to slice it and catching an exception is not acceptable, IMO, as it can too easily mask bugs.

-- 
Greg

From ark at acm.org Tue May 8 01:48:27 2007
From: ark at acm.org (Andrew Koenig)
Date: Mon, 7 May 2007 19:48:27 -0400
Subject: [Python-3000] PEP 3125 -- a modest proposal
Message-ID: <000101c79102$385e1340$a91a39c0$@org>

Yes, I have read Swift :-) And in that spirit, I don't know whether to take this proposal seriously because it's kind of radical. Nevertheless, here goes...

It has occurred to me that as Python stands today, an indent always begins with a colon. So in principle, we could define anything that looks like an indent but doesn't begin with a colon as a continuation.
So the idea would be that you can continue a statement onto as many lines as you wish, provided that

  - Each line after the first is indented strictly more than the first line (but not necessarily more than the remaining lines in the statement), and

  - If there is a colon that will precede an indent, it is the last token of the last line, in which case the line after the colon must be indented strictly more than the first line (but not necessarily more than the remaining lines in the statement).

For example:

    "abc"
        + "def"     # second line with more whitespace than the first -- continuation

    "abc"
    + "def"         # attempt to apply unary + to string literal

      "abc"
            + "def"
        + "ghi"     # OK -- this line is indented more than "abc"

This proposal has the advantage of being entirely lexical -- it doesn't even rely on counting parentheses or brackets, so unlike the current Python rule, it can be implemented entirely as a regular expression. It has the disadvantage of being a change, and may have its own pitfalls:

    if foo          # Oops, I forgot the colon
        + bar       # which makes this line a continuation

Of course, when "if" isn't followed eventually by a colon, the code won't compile. However...

    x = 3, 4        # x = (3, 4)

    x = 3,          # x = (3,)
    4               # evaluate 4 and throw it away

So it may be that this proposed rule is too tricky to use. However, it does have the merit of being even simpler than the current rule. Just a thought...

From ncoghlan at gmail.com Tue May 8 03:36:46 2007
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 08 May 2007 11:36:46 +1000
Subject: [Python-3000] failing tests
In-Reply-To: 
References: <463F5135.2090007@gmail.com>
Message-ID: <463FD42E.8060408@gmail.com>

Guido van Rossum wrote:
> On 5/7/07, Nick Coghlan wrote:
>> Guido van Rossum wrote:
>> > Thanks for checking in xrange!!!!! Woot!
>> >
>> > test_compiler and test_transformer are waiting for someone to clean up
>> > the compiler package (I forget what it doesn't support, perhaps only
>> > nonlocal needs to be added.)
>> >> It's definitely lagging on set comprehensions as well. I'm also pretty >> sure those two tests broke before nonlocal was added, as they were >> already broken when I started helping Georg in looking at the setcomp >> updates. > > I just fixed the doctest failures; but for the compiler package I > need help. Would you have the time? I don't really know the compiler package at all. I'll have a look, but it's going to take me a while to even figure out where the fixes need to go, let alone what they will actually look like. So if someone more familiar with the package beats me to fixing it, I won't be the least bit upset ;) Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From guido at python.org Tue May 8 04:06:49 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 7 May 2007 19:06:49 -0700 Subject: [Python-3000] failing tests In-Reply-To: <463FD42E.8060408@gmail.com> References: <463F5135.2090007@gmail.com> <463FD42E.8060408@gmail.com> Message-ID: On 5/7/07, Nick Coghlan wrote: > Guido van Rossum wrote: > > I just fixed the doctest failures; but for the compiler package I > > need help. Would you have the time? > > I don't really know the compiler package at all. I'll have a look, but > it's going to take me a while to even figure out where the fixes need to > go, let alone what they will actually look like. > > So if someone more familiar with the package beats me to fixing it, I > won't be the least bit upset ;) Same boat I'm in. :-) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From nnorwitz at gmail.com Tue May 8 07:36:36 2007 From: nnorwitz at gmail.com (Neal Norwitz) Date: Mon, 7 May 2007 22:36:36 -0700 Subject: [Python-3000] ref counts Message-ID: I'm starting up a continuous build of sorts on the PSF machine for the 3k branch. Right now the failures will only go to me. 
I've excluded the two tests that are known to currently fail. This will help us find new failures (including ref leaks). Probably in a week or so I'll send the results to python-3000-checkins. Since it's just running on a single machine (every 12 hours), this should be pretty stable. It has been for the trunk and 2.5 branch.

I just wanted to point out some data points wrt ref counts. At the end of a test run on trunk with 298 tests the total ref count is:

    [482838 refs]

When starting a new process (like during the subprocess tests):

    [7323 refs]

With 3k and 302 tests:

    [615279 refs]

and:

    [10457 refs]

I don't think these are problematic. I expect that these ~30% increases in total ref counts are primarily the result of new-style classes.

n

From python at rcn.com Tue May 8 07:40:21 2007
From: python at rcn.com (Raymond Hettinger)
Date: Mon, 7 May 2007 22:40:21 -0700
Subject: [Python-3000] PEP 3125 -- a modest proposal
References: <000101c79102$385e1340$a91a39c0$@org>
Message-ID: <009701c79142$6424a260$f001a8c0@RaymondLaptop1>

[Andrew Koenig]
> It has occurred to me that as Python stands today, an indent always begins
> with a colon. So in principle, we could define anything that looks like an
> indent but doesn't begin with a colon as a continuation. So the idea would
> be that you can continue a statement onto as many lines as you wish,

Too dangerous. The most common Python syntax error (by far, even for experienced users) is omission of a colon. If the missing colon starts to have its own special meaning, that would not be a good thing.

If you're in the mood to propose something radical, how about dropping the colon altogether, leaving indention as the sure reliable cue and cleaning-up the appearance of code in a new world where colons are also being used for annotation as well as slicing:

    def f(x: xtype, y: type)
        result = []
        for i, elem in enumerate(x)
            if elem < 0
                result.append(y[:i])
            else
                result.append(y[i:])
        return result

It looks very clean to my eyes.
Raymond

From rrr at ronadam.com Tue May 8 10:28:56 2007
From: rrr at ronadam.com (Ron Adam)
Date: Tue, 08 May 2007 03:28:56 -0500
Subject: [Python-3000] PEP 3125 -- a modest proposal
In-Reply-To: <009701c79142$6424a260$f001a8c0@RaymondLaptop1>
References: <000101c79102$385e1340$a91a39c0$@org> <009701c79142$6424a260$f001a8c0@RaymondLaptop1>
Message-ID: <464034C8.6030007@ronadam.com>

Raymond Hettinger wrote:
> [Andrew Koenig]
>> It has occurred to me that as Python stands today, an indent always begins
>> with a colon. So in principle, we could define anything that looks like an
>> indent but doesn't begin with a colon as a continuation. So the idea would
>> be that you can continue a statement onto as many lines as you wish,
>
> Too dangerous. The most common Python syntax error (by far, even for
> experienced users) is omission of a colon. If the missing colon starts
> to have its own special meaning, that would not be a good thing.
>
> If you're in the mood to propose something radical, how about dropping
> the colon altogether, leaving indention as the sure reliable cue and
> cleaning-up the appearance of code in a new world where colons
> are also being used for annotation as well as slicing:
>
>     def f(x: xtype, y: type)
>         result = []
>         for i, elem in enumerate(x)
>             if elem < 0
>                 result.append(y[:i])
>             else
>                 result.append(y[i:])
>         return result
>
> It looks very clean to my eyes.

So no more single line definitions?

If you think of the colon as meaning 'associated to', its use is both clear and consistent in all cases except when used in slicing. I also think it helps the code be more readable because, when it's used in combination with indenting, it looks more like a common outline definition that even non-programmers are familiar with. So it may have value in this regard because it makes the intent of the code clearer to new users. Removing it may blur the difference of block headers and block bodies in the mind.
The computer may not need it, but I expect it helps us humans keep things straight in our heads. So maybe a more modest proposal is to change the colon in slicing to a semi-colon where it can have its own meaning.

    def f(x: xtype, y: type):
        result = []
        for i, elem in enumerate(x):
            if elem < 0:
                result.append(y[;i])
            else:
                result.append(y[i;])
        return result

Cheers,
Ron

From ncoghlan at gmail.com Tue May 8 14:45:05 2007
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 08 May 2007 22:45:05 +1000
Subject: [Python-3000] [Python-Dev] PEP 30XZ: Simplified Parsing
In-Reply-To: <02b401c78d15$f04b6110$090a0a0a@enfoldsystems.local>
References: <02b401c78d15$f04b6110$090a0a0a@enfoldsystems.local>
Message-ID: <464070D1.6040701@gmail.com>

Mark Hammond wrote:
> Please add my -1 to the chorus here, for the same reasons already expressed.

Another -1 here - while I agree there are benefits to removing backslash continuations and string literal concatenation, I don't think they're significant enough to justify the hassle of making it happen.

Regards,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
---------------------------------------------------------------
http://www.boredomandlaziness.org

From ark at acm.org Tue May 8 15:08:51 2007
From: ark at acm.org (Andrew Koenig)
Date: Tue, 8 May 2007 09:08:51 -0400
Subject: [Python-3000] PEP 3125 -- a modest proposal
In-Reply-To: <009701c79142$6424a260$f001a8c0@RaymondLaptop1>
References: <000101c79102$385e1340$a91a39c0$@org> <009701c79142$6424a260$f001a8c0@RaymondLaptop1>
Message-ID: <007301c79172$065bfea0$1313fbe0$@org>

> Too dangerous. The most common Python syntax error (by far, even for
> experienced users) is omission of a colon. If the missing colon starts
> to have its own special meaning, that would not be a good thing.

It's not special -- omitting it would have exactly the same effect as omitting a colon does today in a single-line statement.
That is, today you can write

    if x < y: x = y

or you can forget the colon and write

    if x < y x = y

and (usually) be diagnosed by the compiler. My proposal would make

    if x < y:
        x = y

and

    if x < y
        x = y

have the same meanings as (respectively) the first two examples above, so the fourth example would still be diagnosed as an error for the same reason.

From jason.orendorff at gmail.com Tue May 8 15:16:32 2007
From: jason.orendorff at gmail.com (Jason Orendorff)
Date: Tue, 8 May 2007 08:16:32 -0500
Subject: [Python-3000] [Python-Dev] Byte literals (was Re: [Python-checkins] Changing string constants to byte arrays ( r55119 - in python/branches/py3k-struni/Lib: codecs.py test/test_codecs.py ))
In-Reply-To: 
References: <20070505150035.GA16303@panix.com> <200705051334.45120.fdrake@acm.org> <20070505124008.648D.JCARLSON@uci.edu>
Message-ID: 

On 5/7/07, Guido van Rossum wrote:
> I don't know how this will work out yet. I'm not convinced that having
> both mutable and immutable bytes is the right thing to do; but I'm
> also not convinced of the opposite. I am slowly working on the
> string/unicode unification, and so far, unfortunately, it is quite
> daunting to get rid of 8-bit strings even at the Python level let
> alone at the C level.

Guido, if 3.x had an immutable bytes type, could 2to3 provide a better guarantee? Namely, "Set your default encoding to None in your 2.x code today, and 2to3 will not introduce bugs around str/unicode."

2to3 could produce 3.x code that preserves the 2.x meaning by using 2.x-ish types, including immutable byte strings. Without this, my understanding is that 2to3 will introduce bugs. Am I wrong?

This might be worth doing even if you decide an immutable 8-bit type is wrong for the core language. The type could be hidden away in an "upgradelib" module somewhere. Surely people will prefer correctness over "producing nice, idiomatic 3.x code" in the 2to3 tool.
-j

From guido at python.org Tue May 8 15:48:57 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 8 May 2007 06:48:57 -0700
Subject: [Python-3000] [Python-Dev] PEP 30XZ: Simplified Parsing
In-Reply-To: <464070D1.6040701@gmail.com>
References: <02b401c78d15$f04b6110$090a0a0a@enfoldsystems.local> <464070D1.6040701@gmail.com>
Message-ID: 

On 5/8/07, Nick Coghlan wrote:
> Mark Hammond wrote:
> > Please add my -1 to the chorus here, for the same reasons already expressed.
>
> Another -1 here - while I agree there are benefits to removing backslash
> continuations and string literal concatenation, I don't think they're
> significant enough to justify the hassle of making it happen.

OK. I'm just about ready to reject both PEP 3125 and PEP 3126 on the grounds of lack of popular support and insufficient benefits. If anyone is truly upset about this, let them speak up now, or be forever silent.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org Tue May 8 16:02:01 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 8 May 2007 07:02:01 -0700
Subject: [Python-3000] Deadline for checking PEPs into subversion
Message-ID: 

In fairness to would-be new PEP proposals for Python 3000, I am asking everyone who still has a draft PEP that's not checked in to subversion to please check in *a* version of it as soon as possible. This version doesn't have to be final; expect debate which may require a rewrite of all or part of your PEP. But I want every proposal that's on the table in subversion so we can make up a definitive list of proposals being considered for Python 3.0a1 (to be released by the end of June). If you don't have checkin permissions, send your PEP to peps at python.org. Please do follow the PEP guidelines in PEP 1 and use either PEP 9 or PEP 12 as a template. Any PEP not checked into subversion (or at least received by peps at python.org) by the end of Sunday, May 13, is out of consideration.
Note: standard library reorganization PEPs don't fall under this deadline; the stdlib reorg will begin after the release of 3.0a1.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From avassalotti at acm.org Tue May 8 16:03:42 2007
From: avassalotti at acm.org (Alexandre Vassalotti)
Date: Tue, 8 May 2007 10:03:42 -0400
Subject: [Python-3000] PEP 3125 -- a modest proposal
In-Reply-To: <009701c79142$6424a260$f001a8c0@RaymondLaptop1>
References: <000101c79102$385e1340$a91a39c0$@org> <009701c79142$6424a260$f001a8c0@RaymondLaptop1>
Message-ID: 

On 5/8/07, Raymond Hettinger wrote:
> If you're in the mood to propose something radical, how about dropping
> the colon altogether, leaving indention as the sure reliable cue and
> cleaning-up the appearance of code in a new world where colons
> are also being used for annotation as well as slicing:
>
>     def f(x: xtype, y: type)
>         result = []
>         for i, elem in enumerate(x)
>             if elem < 0
>                 result.append(y[:i])
>             else
>                 result.append(y[i:])
>         return result
>
> It looks very clean to my eyes.

This proposal is surely doomed in advance. If I remember well, the trailing colon comes from Python's precursor, ABC. They realized it was not necessary for the parser but it did make the programs more readable for humans.

Would it be a good idea to continue this thread on Python-ideas? I doubt such changes will be accepted, since we are now past the PEP deadline for changes to the core language.
-- Alexandre From guido at python.org Tue May 8 16:10:51 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 8 May 2007 07:10:51 -0700 Subject: [Python-3000] [Python-Dev] Byte literals (was Re: [Python-checkins] Changing string constants to byte arrays ( r55119 - in python/branches/py3k-struni/Lib: codecs.py test/test_codecs.py )) In-Reply-To: References: <20070505150035.GA16303@panix.com> <200705051334.45120.fdrake@acm.org> <20070505124008.648D.JCARLSON@uci.edu> Message-ID: On 5/8/07, Jason Orendorff wrote: > On 5/7/07, Guido van Rossum wrote: > > I don't know how this will work out yet. I'm not convinced that having > > both mutable and immutable bytes is the right thing to do; but I'm > > also not convinced of the opposite. I am slowly working on the > > string/unicode unification, and so far, unfortunately, it is quite > > daunting to get rid of 8-bit strings even at the Python level let > > alone at the C level. > > Guido, if 3.x had an immutable bytes type, could 2to3 provide a > better guarantee? Namely, "Set your default encoding to None > in your 2.x code today, and 2to3 will not introduce bugs around > str/unicode." I don't know. I may be able to tell you when I'm further into the process of unifying str and unicode. > 2to3 could produce 3.x code that preserves the 2.x meaning by > using 2.x-ish types, including immutable byte strings. This sounds dangerously close to crippling 3.0 with backwards compatibility. I want to reserve this option as a last resort. > Without this, my understanding is that 2to3 will introduce bugs. > Am I wrong? No -- 2to3 cannot guarantee that your code will work correctly, because it doesn't do any data flow analysis or type inferencing. This is not limited to strings. > This might be worth doing even if you decide an immutable 8-bit > type is wrong for the core language. The type could be hidden > away in an "upgradelib" module somewhere. 
Surely people will
> prefer correctness over "producing nice, idiomatic 3.x code"
> in the 2to3 tool.

With that I agree, at least in general (e.g. d.keys() gets translated to list(d.keys()) and d.iterkeys() to iter(d.keys())).

In the current py3k-struni branch I have temporarily kept the 8-bit string type around, renamed to str8. I am hoping I will be able to get rid of it eventually but I may not succeed and then we'll have it available as a backup.

For anyone who wants to discuss this more -- please come and help out in the py3k-struni branch first. It is simply too soon to be able to make decisions based on the evidence available so far, and I won't be forced.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From jimjjewett at gmail.com Tue May 8 16:34:26 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Tue, 8 May 2007 10:34:26 -0400
Subject: [Python-3000] [Python-Dev] Byte literals (was Re: [Python-checkins] Changing string constants to byte arrays ( r55119 - in python/branches/py3k-struni/Lib: codecs.py test/test_codecs.py ))
In-Reply-To: 
References: <20070505150035.GA16303@panix.com> <200705051334.45120.fdrake@acm.org> <20070505124008.648D.JCARLSON@uci.edu>
Message-ID: 

On 5/8/07, Jason Orendorff wrote:
> On 5/7/07, Guido van Rossum wrote:
> > daunting to get rid of 8-bit strings even at the Python level let
> > alone at the C level.
>
> Guido, if 3.x had an immutable bytes type, could 2to3 provide a
> better guarantee? Namely, "Set your default encoding to None
> in your 2.x code today, and 2to3 will not introduce bugs around
> str/unicode."

Presumably b" " would be the immutable version. In some sense, this would mean that the string/unicode unification (assuming interning; so that I can use "is" for something stronger than __eq__) would boil down to:

    Py2.6    b"str" is "str" == u"str"
    Py3.X    b"str" == "str" is u"str"

with a few details like 2.5 didn't have the b"str" spelling, and 3.x might not support the u"str" spelling.
> This might be worth doing even if you decide an immutable 8-bit > type is wrong for the core language. The type could be hidden > away in an "upgradelib" module somewhere. Surely people will > prefer correctness over "producing nice, idiomatic 3.x code" > in the 2to3 tool. I will be unhappy if 2to3 produces code that I can't run in (at least) 2.6, because then I would need to convert more than once. I would be unhappy if 2to3 produced code that I couldn't safely copy; that is too magical. I would be unhappy if 2to3 produced code that isn't a good example, unless it also had (at least an option, probably a default) to add comments suggesting a manual verification and what could *probably* be used instead. -jJ From lcaamano at gmail.com Tue May 8 16:35:46 2007 From: lcaamano at gmail.com (Luis P Caamano) Date: Tue, 8 May 2007 10:35:46 -0400 Subject: [Python-3000] the future of the GIL Message-ID: On 5/7/07, "Guido van Rossum" wrote: > > > Around '99 Greg Stein and Mark Hammond tried to get rid of the GIL. > They removed most of the global mutable data structures, added > explicit locks to the remaining ones and to individual mutable > objects, and actually got the whole thing working. Unfortunately even > on the system with the fastest locking primitives (Windows at the > time) they measured a 2x slow-down on a single CPU due to all the > extra locking operations going on. That just breaks my heart. You gotta finish that sentence, it was a slow down on single CPU with a speed increase with two or more CPUs, leveling out at 4 CPUs or so. This was the same situation on every major OS kernel, including AIX, HPUX, Linux, Tru64, etc., when they started supporting SMP machines, which is why all of them at some time sported two kernels, one for SMP machines with the spinlock code and one for single processor machines with the spinlock code #ifdef'ed out. 
For some, like IBM/AIX and HPUX, eventually and as expected, all their servers became MPs and then they stopped delivering the SP kernel. The same would've been true for the python interpreter, one for MP and one for SP, and eventually, even in the PC world, everything would be MP and the SP interpreter would disappear. People need to understand though that the GIL is not as bad as one would initially think as most C extensions release the GIL and run concurrently on multiple CPUs. It takes a bit of researching through old emails in the python list and a bit of time to really understand that. Nevertheless, when the itch is bad enough, it'll get scratched. -- Luis P Caamano Atlanta, GA USA From p.f.moore at gmail.com Tue May 8 16:57:41 2007 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 8 May 2007 15:57:41 +0100 Subject: [Python-3000] [Python-Dev] Byte literals (was Re: [Python-checkins] Changing string constants to byte arrays ( r55119 - in python/branches/py3k-struni/Lib: codecs.py test/test_codecs.py )) In-Reply-To: References: <20070505150035.GA16303@panix.com> <200705051334.45120.fdrake@acm.org> <20070505124008.648D.JCARLSON@uci.edu> Message-ID: <79990c6b0705080757y23742af4pd4d424ba77e4fe7@mail.gmail.com> On 08/05/07, Jim Jewett wrote: > I will be unhappy if 2to3 produces code that I can't run in (at least) > 2.6, because then I would need to convert more than once. IIUC, the idea is that you should be able to write valid Python 2.6 code which 2to3 can convert automatically. There is no intention that 2to3 should automatically handle arbitrary 2.x code (at least, not without the risk of bugs), and certainly no intention that the *output* of 2to3 be runnable in 2.6 (in general). Yes, you convert more than once. Until you cut over, your 2.6 source is the master, and the output of 2to3 should be treated as generated code. > I would be unhappy if 2to3 produced code that I couldn't safely copy; > that is too magical. Not sure what that means.
> I would be unhappy if 2to3 produced code that isn't a good example, > unless it also had (at least an option, probably a default) to add > comments suggesting a manual verification and what could *probably* be > used instead. I'd like 2to3 code to be at least maintainable. Surely it's too much to assume it's going to be a good example of idiomatic 3.x code, though? Paul. From jimjjewett at gmail.com Tue May 8 17:09:46 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Tue, 8 May 2007 11:09:46 -0400 Subject: [Python-3000] [Python-Dev] Byte literals (was Re: [Python-checkins] Changing string constants to byte arrays ( r55119 - in python/branches/py3k-struni/Lib: codecs.py test/test_codecs.py )) In-Reply-To: <79990c6b0705080757y23742af4pd4d424ba77e4fe7@mail.gmail.com> References: <20070505150035.GA16303@panix.com> <200705051334.45120.fdrake@acm.org> <20070505124008.648D.JCARLSON@uci.edu> <79990c6b0705080757y23742af4pd4d424ba77e4fe7@mail.gmail.com> Message-ID: On 5/8/07, Paul Moore wrote: > On 08/05/07, Jim Jewett wrote: > > I will be unhappy if 2to3 produces code that I can't run in > > (at least) 2.6, because then I would need to convert more > > than once. > IIUC, the idea is that you should be able to write valid Python 2.6 > code which 2to3 can convert automatically. There is no intention > that 2to3 should automatically handle arbitrary 2.x code (at > least, not without the risk of bugs)., I thought that was indeed the goal. > and certainly no intention that the > *output* of 2to3 be runnable in 2.6 (in general). Agreed that it isn't, but I think it should be. > Yes, you convert more than once. Until you cut over, your > 2.6 source is the master, and the output of 2to3 should be > treated as generated code. And you can't cut over until you're ready to abandon 2.x. > > I would be unhappy if 2to3 produced code that I couldn't > > safely copy; that is too magical. > Not sure what that means. Many people learn by example. 
Many people don't even bother learning; they just cut and paste. If the only example Py3 code they see is ugly and bloated, that is the idiom they will internalize. > > I would be unhappy if 2to3 produced code that isn't a good > > example, unless it also had (at least an option, probably a > > default) to add comments suggesting a manual verification > > and what could *probably* be used instead. > > I'd like 2to3 code to be at least maintainable. Surely it's too > much to assume it's going to be a good example of idiomatic > 3.x code, though? Probably -- which is why it should at least be possible to focus your attention on the parts that need manual changes. (And, of course, the number of such places should be minimized, particularly if you can't run the result in 2.6, since it is effectively a fork.) -jJ From guido at python.org Tue May 8 17:25:27 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 8 May 2007 08:25:27 -0700 Subject: [Python-3000] [Python-Dev] Byte literals (was Re: [Python-checkins] Changing string constants to byte arrays ( r55119 - in python/branches/py3k-struni/Lib: codecs.py test/test_codecs.py )) In-Reply-To: References: <20070505150035.GA16303@panix.com> <200705051334.45120.fdrake@acm.org> <20070505124008.648D.JCARLSON@uci.edu> Message-ID: On 5/8/07, Jim Jewett wrote: > I will be unhappy if 2to3 produces code that I can't run in (at least) > 2.6, because then I would need to convert more than once. This is the first time I hear of this requirement. It has not so far been a design goal for the conversions in 2to3. The workflow that I have in mind (and that others have agreed to be workable) is more like this: 1. develop working code under 2.6 2. make sure it is warning-free with the special -Wpy3k option 3. use 2to3 to convert it to 3.0 compatible syntax in a temporary directory 4. run your unit test suite with 3.0 5. 
for any defects you find, EDIT THE 2.6 SOURCE AND GO BACK TO STEP 2 -- --Guido van Rossum (home page: http://www.python.org/~guido/) From foom at fuhm.net Tue May 8 17:26:09 2007 From: foom at fuhm.net (James Y Knight) Date: Tue, 8 May 2007 11:26:09 -0400 Subject: [Python-3000] [Python-Dev] Byte literals (was Re: [Python-checkins] Changing string constants to byte arrays ( r55119 - in python/branches/py3k-struni/Lib: codecs.py test/test_codecs.py )) In-Reply-To: References: <20070505150035.GA16303@panix.com> <200705051334.45120.fdrake@acm.org> <20070505124008.648D.JCARLSON@uci.edu> Message-ID: <1B70083D-1CC2-44EA-A8F4-404F1E493271@fuhm.net> On May 8, 2007, at 9:16 AM, Jason Orendorff wrote: > Guido, if 3.x had an immutable bytes type, could 2to3 provide a > better guarantee? Namely, "Set your default encoding to None > in your 2.x code today, and 2to3 will not introduce bugs around > str/unicode." You cannot set the default encoding to None (rather, "undefined") in 2.x, without making half the stdlib completely unusable. So that's not really much of an option. James From guido at python.org Tue May 8 17:37:44 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 8 May 2007 08:37:44 -0700 Subject: [Python-3000] the future of the GIL In-Reply-To: References: Message-ID: On 5/8/07, Luis P Caamano wrote: > On 5/7/07, "Guido van Rossum" wrote: > > Around '99 Greg Stein and Mark Hammond tried to get rid of the GIL. > > They removed most of the global mutable data structures, added > > explicit locks to the remaining ones and to individual mutable > > objects, and actually got the whole thing working. Unfortunately even > > on the system with the fastest locking primitives (Windows at the > > time) they measured a 2x slow-down on a single CPU due to all the > > extra locking operations going on. > > That just breaks my heart. 
> > You gotta finish that sentence, it was a slow down on single CPU with > a speed increase with two or more CPUs, leveling out at 4 CPUs or so. > > This was the same situation on every major OS kernel, including AIX, > HPUX, Linux, Tru64, etc., when they started supporting SMP machines, > which is why all of them at some time sported two kernels, one for SMP > machines with the spinlock code and one for single processor machines > with the spinlock code #ifdef'ed out. For some, like IBM/AIX and > HPUX, eventually and as expected, all their servers became MPs and > then they stopped delivering the SP kernel. > > The same would've been true for the python interpreter, one for MP and > one for SP, and eventually, even in the PC world, everything would be > MP and the SP interpreter would disappear. The difference is, for an OS kernel, there really isn't any other way to benefit from multiple CPUs. But for Python, there is -- run multiple processes instead of threads! > People need to understand though that the GIL is not as bad as one > would initially think as most C extensions release the GIL and run > concurrently on multiple CPUs. It takes a bit of researching through > old emails in the python list and a bit of time to really understand > that. Nevertheless, when the itch is bad enough, it'll get scratched. I think you're overestimating the sophistication of the average extension developer, and the hardware to which they have access. Nevertheless, you're right the GIL is not as bad as you would initially think: you just have to undo the brainwashing you got from Windows and Java proponents who seem to consider threads as the only way to approach concurrent activities. 
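Guido's "run multiple processes instead of threads" advice can be sketched with the multiprocessing module, which did not exist yet at the time of this thread (it entered the stdlib in Python 2.6, originally as the third-party processing package). Each worker process gets its own interpreter and its own GIL, so a CPU-bound task scales across CPUs:

```python
from multiprocessing import Pool

def square(n):
    # CPU-bound work runs in a separate worker process, so each
    # worker has its own interpreter and its own GIL.
    return n * n

def main():
    # Fan the work out over four processes; results come back over IPC.
    with Pool(processes=4) as pool:
        return pool.map(square, range(8))

if __name__ == "__main__":
    print(main())  # [0, 1, 4, 9, 16, 25, 36, 49]
```

Because arguments and results travel between processes by pickling, functions handed to Pool.map must be defined at module top level.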
Just because Java was once aimed at a set-top box OS that didn't support multiple address spaces, and just because process creation in Windows used to be slow as a dog, doesn't mean that multiple processes (with judicious use of IPC) aren't a much better approach to writing apps for multi-CPU boxes than threads. Just Say No to the combined evils of locking, deadlocks, lock granularity, livelocks, nondeterminism and race conditions. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From theller at ctypes.org Tue May 8 18:47:55 2007 From: theller at ctypes.org (Thomas Heller) Date: Tue, 08 May 2007 18:47:55 +0200 Subject: [Python-3000] the future of the GIL In-Reply-To: References: Message-ID: Guido van Rossum schrieb: > On 5/8/07, Luis P Caamano wrote: >> On 5/7/07, "Guido van Rossum" wrote: >> > Around '99 Greg Stein and Mark Hammond tried to get rid of the GIL. >> > They removed most of the global mutable data structures, added >> > explicit locks to the remaining ones and to individual mutable >> > objects, and actually got the whole thing working. Unfortunately even >> > on the system with the fastest locking primitives (Windows at the >> > time) they measured a 2x slow-down on a single CPU due to all the >> > extra locking operations going on. >> >> That just breaks my heart. >> >> You gotta finish that sentence, it was a slow down on single CPU with >> a speed increase with two or more CPUs, leveling out at 4 CPUs or so. >> >> This was the same situation on every major OS kernel, including AIX, >> HPUX, Linux, Tru64, etc., when they started supporting SMP machines, >> which is why all of them at some time sported two kernels, one for SMP >> machines with the spinlock code and one for single processor machines >> with the spinlock code #ifdef'ed out. For some, like IBM/AIX and >> HPUX, eventually and as expected, all their servers became MPs and >> then they stopped delivering the SP kernel. 
>> >> The same would've been true for the python interpreter, one for MP and >> one for SP, and eventually, even in the PC world, everything would be >> MP and the SP interpreter would disappear. > > The difference is, for an OS kernel, there really isn't any other way > to benefit from multiple CPUs. But for Python, there is -- run > multiple processes instead of threads! Wouldn't multiple interpreters (assuming the problems with them would be fixed) in the same process give the same benefit? A separate GIL for each one? Thomas From guido at python.org Tue May 8 19:09:46 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 8 May 2007 10:09:46 -0700 Subject: [Python-3000] the future of the GIL In-Reply-To: References: Message-ID: On 5/8/07, Thomas Heller wrote: > Wouldn't multiple interpreters (assuming the problems with them would be fixed) > in the same process give the same benefit? A separate GIL for each one? No; numerous read-only and immutable objects (e.g. the small integers, 1-character strings, the empty tuple; and all built-in type objects) are shared between all interpreters. Also, extensions can easily share state between interpreters I believe. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From nnorwitz at gmail.com Tue May 8 19:30:09 2007 From: nnorwitz at gmail.com (Neal Norwitz) Date: Tue, 8 May 2007 10:30:09 -0700 Subject: [Python-3000] failing tests In-Reply-To: References: Message-ID: One more test is failing: test test_fileio failed -- Traceback (most recent call last): File "/tmp/python-test-3.0/local/lib/python3.0/test/test_fileio.py", line 128, in testAbles f = _fileio._FileIO("/dev/tty", "a") IOError: [Errno 6] No such device or address: '/dev/tty' This seems to only happen when there is no tty associated with a terminal which happens when run from cron (among other situations). 
n -- On 5/7/07, Neal Norwitz wrote: > There are 3* failing tests: > test_compiler test_doctest test_transformer > * plus a few more when running on a 64-bit platform > > These failures occurred before and after xrange checkin. > > Do other people see these failures? Any ideas when they started? > > The doctest failures are due to no space at the end of the line (print > behavior change). Not sure what to do about that now that we prevent > blanks at the end of lines from being checked in. :-) > > n > From guido at python.org Tue May 8 19:38:05 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 8 May 2007 10:38:05 -0700 Subject: [Python-3000] failing tests In-Reply-To: References: Message-ID: Should be fixed now. Committed revision 55186. On 5/8/07, Neal Norwitz wrote: > One more test is failing: > > test test_fileio failed -- Traceback (most recent call last): > File "/tmp/python-test-3.0/local/lib/python3.0/test/test_fileio.py", > line 128, in testAbles > f = _fileio._FileIO("/dev/tty", "a") > IOError: [Errno 6] No such device or address: '/dev/tty' > > This seems to only happen when there is no tty associated with a > terminal which happens when run from cron (among other situations). > > n > -- > > On 5/7/07, Neal Norwitz wrote: > > There are 3* failing tests: > > test_compiler test_doctest test_transformer > > * plus a few more when running on a 64-bit platform > > > > These failures occurred before and after xrange checkin. > > > > Do other people see these failures? Any ideas when they started? > > > > The doctest failures are due to no space at the end of the line (print > > behavior change). Not sure what to do about that now that we prevent > > blanks at the end of lines from being checked in. 
:-) > > > > n > > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From brett at python.org Tue May 8 19:49:31 2007 From: brett at python.org (Brett Cannon) Date: Tue, 8 May 2007 10:49:31 -0700 Subject: [Python-3000] the future of the GIL In-Reply-To: References: Message-ID: On 5/8/07, Guido van Rossum wrote: > > On 5/8/07, Thomas Heller wrote: > > Wouldn't multiple interpreters (assuming the problems with them would be > fixed) > > in the same process give the same benefit? A separate GIL for each one? > > No; numerous read-only and immutable objects (e.g. the small integers, > 1-character strings, the empty tuple; and all built-in type objects) > are shared between all interpreters. Also, extensions can easily share > state between interpreters I believe. All extensions share their state between interpreters. The import machinery literally caches the module dict for an extension and uses that to reinitialize any new instances. But Martin's PEP on module init helps to deal with this issue. -Brett -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.python.org/pipermail/python-3000/attachments/20070508/b5fe92bf/attachment-0001.html From jimjjewett at gmail.com Tue May 8 20:42:57 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Tue, 8 May 2007 14:42:57 -0400 Subject: [Python-3000] [Python-Dev] Byte literals (was Re: [Python-checkins] Changing string constants to byte arrays ( r55119 - in python/branches/py3k-struni/Lib: codecs.py test/test_codecs.py )) In-Reply-To: References: <20070505150035.GA16303@panix.com> <200705051334.45120.fdrake@acm.org> <20070505124008.648D.JCARLSON@uci.edu> Message-ID: On 5/8/07, Guido van Rossum wrote: > On 5/8/07, Jim Jewett wrote: > > I will be unhappy if 2to3 produces code that I can't run in (at least) > > 2.6, because then I would need to convert more than once. > This is the first time I hear of this requirement. It has not so far > been a design goal for the conversions in 2to3. The workflow that I > have in mind (and that others have agreed to be workable) is more like > this: > 1. develop working code under 2.6 > 2. make sure it is warning-free with the special -Wpy3k option > 3. use 2to3 to convert it to 3.0 compatible syntax in a temporary directory > 4. run your unit test suite with 3.0 > 5. for any defects you find, EDIT THE 2.6 SOURCE AND GO BACK TO STEP 2 The problem is what to do after step 5 ... Do you leave your 3 code in the awkward auto-generated format, and suggest (by example) that py3 code is clunky? Do you immediately stop supporting 2.x? Or do you fork the code? 
-jJ From guido at python.org Tue May 8 20:55:31 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 8 May 2007 11:55:31 -0700 Subject: [Python-3000] [Python-Dev] Byte literals (was Re: [Python-checkins] Changing string constants to byte arrays ( r55119 - in python/branches/py3k-struni/Lib: codecs.py test/test_codecs.py )) In-Reply-To: References: <20070505150035.GA16303@panix.com> <200705051334.45120.fdrake@acm.org> <20070505124008.648D.JCARLSON@uci.edu> Message-ID: On 5/8/07, Jim Jewett wrote: > On 5/8/07, Guido van Rossum wrote: > > On 5/8/07, Jim Jewett wrote: > > > I will be unhappy if 2to3 produces code that I can't run in (at least) > > > 2.6, because then I would need to convert more than once. > > > This is the first time I hear of this requirement. It has not so far > > been a design goal for the conversions in 2to3. The workflow that I > > have in mind (and that others have agreed to be workable) is more like > > this: > > > 1. develop working code under 2.6 > > 2. make sure it is warning-free with the special -Wpy3k option > > 3. use 2to3 to convert it to 3.0 compatible syntax in a temporary directory > > 4. run your unit test suite with 3.0 > > 5. for any defects you find, EDIT THE 2.6 SOURCE AND GO BACK TO STEP 2 > > The problem is what to do after step 5 ... > > Do you leave your 3 code in the awkward auto-generated format, and > suggest (by example) that py3 code is clunky? > > Do you immediately stop supporting 2.x? > > Or do you fork the code? As long as you have to support 2.6, you keep developing for 2.6 and cut distros from the converted code after they pass step 5. Once you are comfortable with dropping support for 2.6 (or when 2.6 support can be relegated to a maintenance branch) you can start developing using the converted code. I disagree that the converted code is awkward. Have you even tried the 2to3 tool yet? 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From jackdied at jackdied.com Tue May 8 20:56:39 2007 From: jackdied at jackdied.com (Jack Diederich) Date: Tue, 8 May 2007 14:56:39 -0400 Subject: [Python-3000] PEP 3129: Class Decorators In-Reply-To: References: <43aa6ff70705071008q6a33e00eq7e5073dba5fa07e@mail.gmail.com> Message-ID: <20070508185639.GA5429@performancedrivers.com> On Mon, May 07, 2007 at 11:12:40AM -0700, Guido van Rossum wrote: > On 5/7/07, Collin Winter wrote: > > Can I go ahead and mark PEP 3129 as "accepted"? > > Almost. I'm ok with it, but I think that to follow the procedure you > ought to post the full text at least once on python-3000, so you can > add the date to the "Post-History" header. In the mean time, I think > it would be fine to start on the implementation! > My implementation worked as of PyCon but has some conflicts with stuff that has been checked in since. I will have time next week to get it working on the current 3k branch. -Jack From guido at python.org Tue May 8 21:12:36 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 8 May 2007 12:12:36 -0700 Subject: [Python-3000] PEP 3129: Class Decorators In-Reply-To: <20070508185639.GA5429@performancedrivers.com> References: <43aa6ff70705071008q6a33e00eq7e5073dba5fa07e@mail.gmail.com> <20070508185639.GA5429@performancedrivers.com> Message-ID: Cool! Looking forward to it. Collin or someone else can help you get it checked in if you don't have dev privs yet. Given the lack of discussion following the posting of the PEP, let's accept it. On 5/8/07, Jack Diederich wrote: > On Mon, May 07, 2007 at 11:12:40AM -0700, Guido van Rossum wrote: > > On 5/7/07, Collin Winter wrote: > > > Can I go ahead and mark PEP 3129 as "accepted"? > > > > Almost. I'm ok with it, but I think that to follow the procedure you > > ought to post the full text at least once on python-3000, so you can > > add the date to the "Post-History" header. 
In the mean time, I think > > it would be fine to start on the implementation! > > > > My implementation worked as of PyCon but has some conflicts with > stuff that has been checked in since. I will have time next week > to get it working on the current 3k branch. > > -Jack > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From martin at v.loewis.de Tue May 8 21:25:15 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 08 May 2007 21:25:15 +0200 Subject: [Python-3000] the future of the GIL In-Reply-To: References: Message-ID: <4640CE9B.9010907@v.loewis.de> > Wouldn't multiple interpreters (assuming the problems with them would be fixed) > in the same process give the same benefit? A separate GIL for each one? No. There is a global "current thread" variable that is protected by the GIL (namely, _PyThreadState_Current). Without that, you would not even know what the current interpreter is, so fixing all the other problems with multiple interpreters won't help. You could try to save the current thread reference into TLS, but, depending on the platform, that may be expensive to access. The "right" way would be to pass the current interpreter to all API functions, the way Tcl does it. Indeed, Tcl's threading model is that you have one interpreter per thread, and don't need any locking at all (but you can't have multi-threaded Tcl scripts under that model). However, even if you give multiple interpreters separate GILs, you still won't see a speed-up on a multi-processor system if you have a multi-threaded Python script: once one thread blocks on that interpreter's GIL, that thread is also "wasted" for all other interpreters, since the thread is hanging waiting for the GIL. 
To fix that, you would also have to use separate threads for the separate interpreters. When you do so, you might just as well start separate OS processes. Regards, Martin From collinw at gmail.com Tue May 8 21:34:52 2007 From: collinw at gmail.com (Collin Winter) Date: Tue, 8 May 2007 12:34:52 -0700 Subject: [Python-3000] PEP 3129: Class Decorators In-Reply-To: References: <43aa6ff70705071008q6a33e00eq7e5073dba5fa07e@mail.gmail.com> <20070508185639.GA5429@performancedrivers.com> Message-ID: <43aa6ff70705081234j2691202cj7c871c7c1b20f02d@mail.gmail.com> On 5/8/07, Guido van Rossum wrote: > Given the lack of discussion following the posting of the PEP, let's accept it. Marked as accepted in r55190. Collin Winter > On 5/8/07, Jack Diederich wrote: > > On Mon, May 07, 2007 at 11:12:40AM -0700, Guido van Rossum wrote: > > > On 5/7/07, Collin Winter wrote: > > > > Can I go ahead and mark PEP 3129 as "accepted"? > > > > > > Almost. I'm ok with it, but I think that to follow the procedure you > > > ought to post the full text at least once on python-3000, so you can > > > add the date to the "Post-History" header. In the mean time, I think > > > it would be fine to start on the implementation! > > > > > > > My implementation worked as of PyCon but has some conflicts with > > stuff that has been checked in since. I will have time next week > > to get it working on the current 3k branch. 
> > > > -Jack > > _______________________________________________ > > Python-3000 mailing list > > Python-3000 at python.org > > http://mail.python.org/mailman/listinfo/python-3000 > > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > > > > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/collinw%40gmail.com > From eucci.group at gmail.com Tue May 8 23:52:48 2007 From: eucci.group at gmail.com (Jeff Shell) Date: Tue, 8 May 2007 15:52:48 -0600 Subject: [Python-3000] ABC's, Roles, etc Message-ID: <88d0d31b0705081452w3082c1d0x7355e5eed7a3e093@mail.gmail.com> Hello. I just joined the list as the whole Abstract Base Class, Interfaces, and Roles/Traits system is of significant interest to me. I've tried to catch up on the discussion by reading through the archives, but I'm sure I missed a few posts and I apologize if I'm wasting time covering ground that's already been covered. I have a lengthy post that dissects a major issue that I have with ABCs and the Interface definition that I saw in PEP 3124:: it all seems rigidly class and class-instance based. The cardinal sin I saw in the Interface definition in PEP 3124 (at least, at the time I last viewed it) was the inclusion of 'self' in a method spec. It seems to me that Abstract Base Classes and even PEP 3124 are primarily focused on classes. But in Python, "everything is an object", but not everything is class-based. Jim Fulton taught me a long time ago that there are numerous ways to fulfill a role, or provide an interface. 'self' is an internal detail of class-instance implementations. In my post, I show some (stupid) implementations of the 'IStack' interface seen in PEP 3124, only one of which is the traditional class - instance based style. 
http://griddlenoise.blogspot.com/2007/05/abc-may-be-easy-as-123-but-it-cant-beat.html

The rest of this post focuses on what `zope.interface` already provides - a system for specifying behavior and declaring support at both the class and object level - and 'object' really means 'object', which includes modules. You're more than welcome to tune out now. My main focus is on determining what Abstract Base Classes and/or PEP 3124's Interfaces do better than `zope.interface` (if anyone else is familiar with that package). I've found great success using `zope.interface` to satisfy many of the requirements and issues that these systems may try to solve, and more. In fact, `zope.interface` is closer to Roles/Traits than anything else.

# .....

I wanted to chime in here and say that `zope.interface` (from Zope 3, but available separately) is an existing implementation that comes quite close to what Collin Winter proposed. Even in some of its spellings.

http://cheeseshop.python.org/pypi/zope.interface/3.3.0.1

The main thing is that `zope.interface` focuses declaration on the object - NOT the class. You do not use `self` in interface specifications. Terms I've grown fond of while using `zope.interface` are "specifies", "provides", and "implements".

An Interface **specifies** desired *object behavior* - basically it's the API::

    class IAuthVerification(Interface):
        def verify(invoice_number, amount):
            """ Returns an IAuthResult containing status
            information about success or failure.
            """

An *object* **provides** that behavior::

    >>> IAuthVerification.providedBy(authorizer)
    True
    >>> result = authorizer.verify(invoice_number='KB125', amount=43.40)

Now, a class may **implement** that behavior, which is a way of saying that "instances of this class will provide the behavior"::

    class AuthNet(object):
        def verify(self, invoice_number, amount):
            """ ... (class - instance based implementation) """

    classImplements(AuthNet, IAuthVerification)

    >>> IAuthVerification.providedBy(AuthNet)
    False
    >>> AuthNet.verify(invoice_number='KB125', amount=43.40)

Alternatively, class or static methods could be used::

    class StaticAuthNet(object):
        @staticmethod
        def verify(invoice_number, amount):
            """ ... """

    alsoProvides(StaticAuthNet, IAuthVerification)

    >>> IAuthVerification.providedBy(StaticAuthNet)
    True
    >>> result = StaticAuthNet.verify(invoice_number='KB125', amount=43.40)

Or a module could even provide the interfaces. In the first example above (under 'an object **provides** that behavior'), do you know whether 'authorizer' is an instance, class, or module? Hell, maybe it's a function that has 'verify' added as an attribute. It doesn't matter - it fills the 'IAuthVerification' role. In my blog post, I also show a dynamically constructed object providing an interface's specified behavior. An instance of an empty class is made, and then methods and other supporting attributes are attached to this specific instance only. Real world examples of this include Zope 2, where a folder may have "Python Scripts" or other callable members that, in effect, make for a totally custom object. It can also provide this same behavior (in fact, I was able to take advantage of this on some old old old Zope 2 projects that started in the web environment and transitioned to regular Python modules/classes). In any case, there are numerous ways to fulfill a role. I think any system that was limited to classes and involved 'issubclass' and 'isinstance' trickery would be limiting or confusing if it started to be used to describe behaviors of modules, one-off objects, and so on.
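The point that many kinds of objects can fill a role can be sketched without zope.interface at all, using a tiny hand-rolled role check (the `provides` helper below is hypothetical, not zope.interface's API):

```python
import types

def provides(obj, *names):
    """Hypothetical role check: does obj expose all the named callables?"""
    return all(callable(getattr(obj, n, None)) for n in names)

# 1. A traditional class-instance implementation.
class AuthNet:
    def verify(self, invoice_number, amount):
        return ("ok", invoice_number, amount)

# 2. A one-off object with a function attached as an attribute.
one_off = types.SimpleNamespace(
    verify=lambda invoice_number, amount: ("ok", invoice_number, amount))

# 3. A dynamically built module object.
mod = types.ModuleType("auth_mod")
mod.verify = lambda invoice_number, amount: ("ok", invoice_number, amount)

# All three fill the same role, even though only one is class-based.
for provider in (AuthNet(), one_off, mod):
    assert provides(provider, "verify")
    assert provider.verify("KB125", 43.40)[0] == "ok"
```

This checks only for attribute shape, not behavior - which is exactly the distinction between duck typing and the explicit declarations (`providedBy`, `classImplements`) that zope.interface adds on top.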
-- Jeff Shell

From greg.ewing at canterbury.ac.nz Wed May 9 02:46:14 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 09 May 2007 12:46:14 +1200 Subject: [Python-3000] the future of the GIL In-Reply-To: References: Message-ID: <464119D6.6060303@canterbury.ac.nz>

Luis P Caamano wrote:
> You gotta finish that sentence, it was a slow down on single CPU with
> a speed increase with two or more CPUs, leveling out at 4 CPUs or so.

But it's still going to slow down all code that doesn't use threads. I don't want to be *forced* to use threads to get decent speed from my programs!

-- Greg

From exarkun at divmod.com Wed May 9 03:43:49 2007 From: exarkun at divmod.com (Jean-Paul Calderone) Date: Tue, 8 May 2007 21:43:49 -0400 Subject: [Python-3000] the future of the GIL In-Reply-To: <464119D6.6060303@canterbury.ac.nz> Message-ID: <20070509014349.19381.246190631.divmod.quotient.10259@ohm>

On Wed, 09 May 2007 12:46:14 +1200, Greg Ewing wrote:
>Luis P Caamano wrote:
>
>> You gotta finish that sentence, it was a slow down on single CPU with
>> a speed increase with two or more CPUs, leveling out at 4 CPUs or so.
>
>But it's still going to slow down all code that
>doesn't use threads. I don't want to be *forced*
>to use threads to get decent speed from my programs!
>

It would also make Python applications a much greater drag on the system as a whole, as they would need to use four whole CPUs to just break even on a multithreaded compute-intensive task. Even if this can be improved over time (which it probably can be, to some extent, given sufficient effort), one might want to consider the consequences of having any widespread usage of such a resource-intensive Python interpreter on general perception of Python as a language.
Having a GIL-free build of CPython alongside a GIL-having build does something to alleviate this, but it's not immediately clear what the development and maintenance burden of this would be, nor whether the resulting user-experience would be desirable (which build will be packaged, what would become of "#!/usr/bin/env python", etc). It may be doable, but it doesn't strike me as an obviously good idea.

Jean-Paul

From pje at telecommunity.com Wed May 9 03:57:37 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue, 08 May 2007 21:57:37 -0400 Subject: [Python-3000] ABC's, Roles, etc In-Reply-To: <88d0d31b0705081452w3082c1d0x7355e5eed7a3e093@mail.gmail.com> References: <88d0d31b0705081452w3082c1d0x7355e5eed7a3e093@mail.gmail.com> Message-ID: <20070509015553.9C6843A4061@sparrow.telecommunity.com>

At 03:52 PM 5/8/2007 -0600, Jeff Shell wrote:
>I have a lengthy post that dissects a major issue that I have with
>ABCs and the Interface definition that I saw in PEP 3124:: it all
>seems rigidly class and class-instance based.

Hi Jeff; I read your post a few days ago, but your blog doesn't support comments, so I've been "getting around to" writing a counterpoint on my blog. But, now I can do it here instead. :)

>The cardinal sin I saw
>in the Interface definition in PEP 3124 (at least, at the time I last
>viewed it) was the inclusion of 'self' in a method spec.

That's because you're confusing a generic function and a "method spec". PEP 3124 interfaces are not *specifications*; they're namespaces for generic functions. They are much closer in nature to ABCs than they are to zope.interface-style Interfaces. The principal thing they have in common with zope.interface (aside from the name) is the support for "IFoo(ob)"-style adaptation. Very few of zope.interface's design goals are shared by PEP 3124. Notably, they are not particularly good for type checking or verification.
In PEP 3124, for example, IFoo(ob) *always* returns an object with the specified attributes; the only way to know whether they are actually implemented is to try using them. Notice that this is diametrically opposed to what zope.interface wants to do in such situations -- which is why the PEP makes such a big deal about it being possible to use zope.interface *instead*. That is, my own observation is that different frameworks sometimes need different kinds of interfaces. For example, someone might create a framework that wants to verify preconditions and postconditions of methods in an interface, rather than merely specifying their names and arguments! Using zope.interface as an exclusive basis for interface definition and type annotations would block innovation in this area. PEP 3124 interfaces are therefore explicitly intended to be merely one *possible* kind of interface, rather than a be-all end-all interface system. They have many differences from zope.interface, which, depending on your goals, may be a plus or minus. But you certainly aren't obligated to *use* them. PEP 3124 merely proposes a framework for how to use interfaces for method overloading, generic functions, and AOP. From the PEP itself: """For example, it should be possible to use a ``zope.interface`` interface object to specify the desired type of a function argument, as long as the ``zope.interface`` package registered itself correctly (or a third party did the registration). In this way, the proposed API simply offers a uniform way of accessing the functionality within its scope, rather than prescribing a single implementation to be used for all libraries, frameworks, and applications.""" >... 'self' is an internal detail of class-instance implementations. Again - this is because you're assuming the purpose of a PEP 3124 interface is to *specify* an interface, when in fact it's much more like an ABC, which may also *implement* the interface. 
The specification and implementation are intentionally unified here. Of course, again, you will be able to use zope.interfaces as argument annotations to @overload-ed functions and methods, which is the point of the PEP. Its "Interface" class is merely a suggested default, leaving the fancier tricks to established packages, in the same way that its generic function implementation will not do everything that RuleDispatch or PEAK-Rules can do. It's supposed to be a core framework for such add-on packages, not a replacement for them. >It seems to me that Abstract Base Classes and even PEP 3124 are >primarily focused on classes. But in Python, "everything is an >object", but not everything is class-based. >... >The rest of this post focuses on what `zope.interface` already >provides - a system for specifying behavior and declaring support at >both the class and object level - and 'object' really means 'object', >which includes modules. Right -- and *neither* "specifying behavior" nor "declaring support" are goals of PEP 3124; they're entirely out of its scope. The Interface object's purpose is to support uniform access to, and implementation of, individual operations. These are somewhat parallel concepts, but very different in thrust; zope.interface is LBYL, while PEP 3124 is EAFP all the way. As a result, PEP 3124 chooses to punt on the issue of individual objects. It's quite possible within the framework to allow instance-level checks during dispatching, but it's not going to be in the default engine (which is based on type-tuple dispatching; see Guido's Py3K overloading prototype). zope.interface (and zope.component, IIRC) pay a high price in complexity for allowing interfaces to be per-instance rather than type-defined. I implemented the same feature in PyProtocols, but over the years I rarely found it useful. My understanding of its usefulness in Zope is that it: 1. supports specification and testing of module-level Zope APIs 2. 
allows views and other wrapping operations to be selected on a dynamic basis

Since #1 falls outside of PEP 3124's goals (i.e., it's not about specification or testing), that leaves use case #2. In my experience, it has been more than sufficient to simply give these objects some *other* interface, such as an IViewTags interface with a method to query these dynamic "tag" interfaces. In other words, my experience and opinion support the view that use case #2 is actually a coincidental abuse of interfaces for convenience, rather than the "one obvious way" to handle the use case.

To put it another way, if you define getView() as a generic function, you can always define a dynamic implementation of it for any type that you wish to have dynamic view selection capability. Then, only those cases that require a complex solution have to pay for the complexity. So, that's my rationale for why PEP 3124 doesn't provide any instance-based features out of the box; outside of API specs for singleton objects, the need for them is mostly an illusion created by Zope 3's dynamic view selection hack.

>My main focus is on determining what Abstract Base Classes and/or PEP
>3124's Interfaces do better than `zope.interface` (if anyone else is
>familiar with that package).

I at least am quite familiar with it, having helped to define some of its terminology and API, as well as being the original author of its class-decorator emulation for Python versions 2.2 and up. :) I also argued for its adoption of PEP 246, and wrote PyProtocols to unify Twisted and Zope interfaces in a PEP 246-based adaptation framework. And what PEP 3124 does much better than zope.interface or even PyProtocols is:

1. Adaptation, especially incomplete adaptation. You can implement only the methods that are actually needed for your use case. If the interface includes generic implementations that are defined in terms of other methods in the interface, you need not reimplement them.
(Note: I'm well aware that here my definition of "better" would be considered "worse" by Jim Fulton, since zope.interface is LBYL-oriented. However, for many users and use cases, EAFP *is* better, even if it's not for Zope.) 2. Interface recombination. AFAIK, zope.interface doesn't support subset interfaces like PyProtocols does. Neither zope.interface nor PyProtocols support method renaming, where two interfaces have a method with the same specification but different method names. 3. Low mental overhead. PEP 3124 doesn't even *need* interfaces; simple use cases can just use overloaded functions and be on about their business. Use cases that would require an interface and half a dozen adapter classes in zope.interface can be met by simply creating an overloaded function and adding methods. And the resulting code reads like code in other languages that support overloading or generic functions, rather than reading like Java. >In my blog post, I also show a dynamically constructed object >providing an interface's specified behavior. An instance of an empty >class is made, and then methods and other supporting attributes are >attached to this specific instance only. Real world examples of this >include Zope 2, where a folder may have "Python Scripts" or other >callable members that, in effect, make for a totally custom object. It >can also provide this same behavior (in fact, I was able to take >advantage of this on some old old old Zope 2 projects that started in >the web environment and transitioned to regular Python >modules/classes). And how often does this happen outside of Zope? As I said, I rarely found it to be the case anywhere else. I replicated the ability in PyProtocols because I was biased by my prior Zope experience, but once I got outside of Zope it almost entirely ceased to be useful. Meanwhile, as I said, PEP 3124 is not closed to extension. 
It's specifically intended that zope.interface (and any other interface packages that might arise in future) should be able to play as first-class citizens in the proposed API. However, depending on the specific features desired, those packages might have some additional integration work to do. (Note, by the way, that zope.interface is explicitly mentioned three times in the PEP, as an example of how other interface types should be able to be used for overloading, as long as they register appropriate methods with the provided framework.) From monpublic at gmail.com Wed May 9 03:58:13 2007 From: monpublic at gmail.com (Chris Monson) Date: Tue, 8 May 2007 21:58:13 -0400 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> Message-ID: On 4/30/07, Phillip J. Eby wrote: > > This is just the first draft (also checked into SVN), and doesn't include > the details of how the extension API works (so that third-party interfaces > and generic functions can interoperate using the same decorators, > annotations, etc.). > > Comments and questions appreciated, as it'll help drive better > explanations > of both the design and rationales. I'm usually not that good at guessing > what other people will want to know (or are likely to misunderstand) until > I get actual questions. > > > PEP: 3124 > Title: Overloading, Generic Functions, Interfaces, and Adaptation > Version: $Revision: 55029 $ > Last-Modified: $Date: 2007-04-30 18:48:06 -0400 (Mon, 30 Apr 2007) $ > Author: Phillip J. 
Eby
> Discussions-To: Python 3000 List
> Status: Draft
> Type: Standards Track
> Requires: 3107, 3115, 3119
> Replaces: 245, 246
> Content-Type: text/x-rst
> Created: 28-Apr-2007
> Post-History: 30-Apr-2007

[snip]

> "Before" and "After" Methods
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> In addition to the simple next-method chaining shown above, it is
> sometimes useful to have other ways of combining methods. For
> example, the "observer pattern" can sometimes be implemented by adding
> extra methods to a function, that execute before or after the normal
> implementation.
>
> To support these use cases, the ``overloading`` module will supply
> ``@before``, ``@after``, and ``@around`` decorators, that roughly
> correspond to the same types of methods in the Common Lisp Object
> System (CLOS), or the corresponding "advice" types in AspectJ.
>
> Like ``@when``, all of these decorators must be passed the function to
> be overloaded, and can optionally accept a predicate as well::
>
>     def begin_transaction(db):
>         print "Beginning the actual transaction"
>
>     @before(begin_transaction)
>     def check_single_access(db: SingletonDB):
>         if db.inuse:
>             raise TransactionError("Database already in use")
>
>     @after(begin_transaction)
>     def start_logging(db: LoggableDB):
>         db.set_log_level(VERBOSE)

If we are looking at doing Design By Contract using @before and @after (preconditions and postconditions), shouldn't there be some way of getting at the return value in functions decorated with @after? For example, it seems reasonable to require an extra argument, perhaps at the beginning:

    def successor(num):
        return num + 1

    @before(successor)
    def check_positive(num: int):
        if num < 0:
            raise PreconditionError("Positive integer inputs required")

    @after(successor)
    def check_successor(returned, num: int):
        if returned != num + 1:
            raise PostconditionError("successor failed to do its job")

Or am I missing something about how @after works?

+1, BTW, on this whole idea.
- C

> ``@before`` and ``@after`` methods are invoked either before or after
> the main function body, and are *never considered ambiguous*. That
> is, it will not cause any errors to have multiple "before" or "after"
> methods with identical or overlapping signatures. Ambiguities are
> resolved using the order in which the methods were added to the
> target function.
>
> "Before" methods are invoked most-specific method first, with
> ambiguous methods being executed in the order they were added. All
> "before" methods are called before any of the function's "primary"
> methods (i.e. normal ``@overload`` methods) are executed.
>
> "After" methods are invoked in the *reverse* order, after all of the
> function's "primary" methods are executed. That is, they are executed
> least-specific methods first, with ambiguous methods being executed in
> the reverse of the order in which they were added.
>
> The return values of both "before" and "after" methods are ignored,
> and any uncaught exceptions raised by *any* methods (primary or other)
> immediately end the dispatching process. "Before" and "after" methods
> cannot have ``__proceed__`` arguments, as they are not responsible
> for calling any other methods. They are simply called as a
> notification before or after the primary methods.
>
> Thus, "before" and "after" methods can be used to check or establish
> preconditions (e.g. by raising an error if the conditions aren't met)
> or to ensure postconditions, without needing to duplicate any existing
> functionality.
>
>
> "Around" Methods
> ~~~~~~~~~~~~~~~~
>
> The ``@around`` decorator declares a method as an "around" method.
> "Around" methods are much like primary methods, except that the
> least-specific "around" method has higher precedence than the
> most-specific "before" or "after" method.
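The before/after ordering rules quoted above are easy to sketch in plain Python. The `combinable` helper below is hypothetical (the PEP's real engine also dispatches on argument types); it shows "before" callbacks running in the order added, "after" callbacks running in reverse order, and why, as specified, an `@after` callback never sees the return value:

```python
def combinable(func):
    """Wrap func so 'before'/'after' callbacks can be attached to it."""
    befores, afters = [], []

    def wrapper(*args, **kw):
        for b in befores:              # "before" methods, in the order added
            b(*args, **kw)
        result = func(*args, **kw)     # the primary method
        for a in reversed(afters):     # "after" methods, in reverse order
            a(*args, **kw)             # return values ignored, result unseen
        return result

    def adder(lst):
        def add(fn):
            lst.append(fn)
            return fn
        return add

    wrapper.before, wrapper.after = adder(befores), adder(afters)
    return wrapper


calls = []

@combinable
def begin_transaction(db):
    calls.append("primary")
    return "begun"

@begin_transaction.before
def check_access(db):
    calls.append("before")          # could raise to enforce a precondition

@begin_transaction.after
def start_logging(db):
    calls.append("after")

assert begin_transaction("db") == "begun"
assert calls == ["before", "primary", "after"]
```

Since the after-callback only gets the original arguments, a postcondition that needs the return value would indeed require something extra, as the question above suggests.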
> "Around" methods are usually used
> to transform input arguments or return values, or to wrap specific
> cases with special error handling or try/finally conditions, e.g.::
>
>     @around(commit_transaction)
>     def lock_while_committing(__proceed__, db: SingletonDB):
>         with db.global_lock:
>             return __proceed__(db)
>
> They can also be used to replace the normal handling for a specific
> case, by *not* invoking the ``__proceed__`` function.
>
> The ``__proceed__`` given to an "around" method will either be the
> next applicable "around" method, a ``DispatchError`` instance,
> or a synthetic method object that will call all the "before" methods,
> followed by the primary method chain, followed by all the "after"
> methods, and return the result from the primary method chain.
>
> Thus, just as with normal methods, ``__proceed__`` can be checked for
> ``DispatchError``-ness, or simply invoked. The "around" method should
> return the value returned by ``__proceed__``, unless of course it
> wishes to modify or replace it with a different return value for the
> function as a whole.
>
>
> Custom Combinations
> ~~~~~~~~~~~~~~~~~~~
>
> The decorators described above (``@overload``, ``@when``, ``@before``,
> ``@after``, and ``@around``) collectively implement what in CLOS is
> called the "standard method combination" -- the most common patterns
> used in combining methods.
>
> Sometimes, however, an application or library may have use for a more
> sophisticated type of method combination.
> For example, if you
> would like to have "discount" methods that return a percentage off,
> to be subtracted from the value returned by the primary method(s),
> you might write something like this::
>
>     from overloading import always_overrides, merge_by_default
>     from overloading import Around, Before, After, Method, MethodList
>
>     class Discount(MethodList):
>         """Apply return values as discounts"""
>
>         def __call__(self, *args, **kw):
>             retval = self.tail(*args, **kw)
>             for sig, body in self.sorted():
>                 retval -= retval * body(*args, **kw)
>             return retval
>
>     # merge discounts by priority
>     merge_by_default(Discount)
>
>     # discounts have precedence over before/after/primary methods
>     always_overrides(Discount, Before)
>     always_overrides(Discount, After)
>     always_overrides(Discount, Method)
>
>     # but not over "around" methods
>     always_overrides(Around, Discount)
>
>     # Make a decorator called "discount" that works just like the
>     # standard decorators...
>     discount = Discount.make_decorator('discount')
>
>     # and now let's use it...
>     def price(product):
>         return product.list_price
>
>     @discount(price)
>     def ten_percent_off_shoes(product: Shoe):
>         return Decimal('0.1')
>
> Similar techniques can be used to implement a wide variety of
> CLOS-style method qualifiers and combination rules. The process of
> creating custom method combination objects and their corresponding
> decorators is described in more detail under the `Extension API`_
> section.
>
> Note, by the way, that the ``@discount`` decorator shown will work
> correctly with any new predicates defined by other code. For example,
> if ``zope.interface`` were to register its interface types to work
> correctly as argument annotations, you would be able to specify
> discounts on the basis of its interface types, not just classes or
> ``overloading``-defined interface types.
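Stripped of the ``overloading`` machinery, the semantics of the quoted Discount combination reduce to a few lines. This sketch is hypothetical (plain predicates instead of dispatch signatures, dicts instead of product objects); it only mirrors the rule "each applicable discount body returns a fraction subtracted from the primary result":

```python
def price(product):
    """The "primary method": the undiscounted price."""
    return product["list_price"]

_discounts = []   # (predicate, body) pairs, applied in the order registered

def discount(pred):
    """Register a discount body that applies when pred(product) is true."""
    def register(body):
        _discounts.append((pred, body))
        return body
    return register

def discounted_price(product):
    total = price(product)                    # primary result first
    for pred, body in _discounts:
        if pred(product):
            total -= total * body(product)    # subtract each applicable discount
    return total

@discount(lambda p: p["kind"] == "shoe")
def ten_percent_off_shoes(product):
    return 0.1

shoe = {"kind": "shoe", "list_price": 100.0}
assert discounted_price(shoe) == 90.0
assert discounted_price({"kind": "hat", "list_price": 50.0}) == 50.0
```

The PEP's version gets the same effect generically, for *any* function, by giving Discount methods their own precedence slot in the combination.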
>
> Similarly, if a library like RuleDispatch or PEAK-Rules were to
> register an appropriate predicate implementation and dispatch engine,
> one would then be able to use those predicates for discounts as well,
> e.g.::
>
>     from somewhere import Pred  # some predicate implementation
>
>     @discount(
>         price,
>         Pred("isinstance(product,Shoe) and"
>              " product.material.name=='Blue Suede'")
>     )
>     def forty_off_blue_suede_shoes(product):
>         return Decimal('0.4')
>
> The process of defining custom predicate types and dispatching engines
> is also described in more detail under the `Extension API`_ section.
>
>
> Overloading Inside Classes
> --------------------------
>
> All of the decorators above have a special additional behavior when
> they are directly invoked within a class body: the first parameter
> (other than ``__proceed__``, if present) of the decorated function
> will be treated as though it had an annotation equal to the class
> in which it was defined.
>
> That is, this code::
>
>     class And(object):
>         # ...
>         @when(get_conjuncts)
>         def __conjuncts(self):
>             return self.conjuncts
>
> produces the same effect as this (apart from the existence of a
> private method)::
>
>     class And(object):
>         # ...
>
>     @when(get_conjuncts)
>     def get_conjuncts_of_and(ob: And):
>         return ob.conjuncts
>
> This behavior is both a convenience enhancement when defining lots of
> methods, and a requirement for safely distinguishing multi-argument
> overloads in subclasses. Consider, for example, the following code::
>
>     class A(object):
>         def foo(self, ob):
>             print "got an object"
>
>         @overload
>         def foo(__proceed__, self, ob: Iterable):
>             print "it's iterable!"
>             return __proceed__(self, ob)
>
>
>     class B(A):
>         foo = A.foo    # foo must be defined in local namespace
>
>         @overload
>         def foo(__proceed__, self, ob: Iterable):
>             print "B got an iterable!"
>             return __proceed__(self, ob)
>
> Due to the implicit class rule, calling ``B().foo([])`` will print
> "B got an iterable!"
> followed by "it's iterable!", and finally,
> "got an object", while ``A().foo([])`` would print only the messages
> defined in ``A``.
>
> Conversely, without the implicit class rule, the two "Iterable"
> methods would have the exact same applicability conditions, so calling
> either ``A().foo([])`` or ``B().foo([])`` would result in an
> ``AmbiguousMethods`` error.
>
> It is currently an open issue to determine the best way to implement
> this rule in Python 3.0. Under Python 2.x, a class' metaclass was
> not chosen until the end of the class body, which means that
> decorators could insert a custom metaclass to do processing of this
> sort. (This is how RuleDispatch, for example, implements the implicit
> class rule.)
>
> PEP 3115, however, requires that a class' metaclass be determined
> *before* the class body has executed, making it impossible to use this
> technique for class decoration any more.
>
> At this writing, discussion on this issue is ongoing.
>
>
> Interfaces and Adaptation
> -------------------------
>
> The ``overloading`` module provides a simple implementation of
> interfaces and adaptation. The following example defines an
> ``IStack`` interface, and declares that ``list`` objects support it::
>
>     from overloading import abstract, Interface
>
>     class IStack(Interface):
>         @abstract
>         def push(self, ob):
>             """Push 'ob' onto the stack"""
>
>         @abstract
>         def pop(self):
>             """Pop a value and return it"""
>
>     when(IStack.push, (list, object))(list.append)
>     when(IStack.pop, (list,))(list.pop)
>
>     mylist = []
>     mystack = IStack(mylist)
>     mystack.push(42)
>     assert mystack.pop()==42
>
> The ``Interface`` class is a kind of "universal adapter". It accepts
> a single argument: an object to adapt. It then binds all its methods
> to the target object, in place of itself. Thus, calling
> ``mystack.push(42)`` is the same as calling
> ``IStack.push(mylist, 42)``.
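The "universal adapter" behavior described above can be approximated in a few lines. This is hypothetical code, not the PEP's implementation: a plain dict stands in for the generic-function registry, and exact-type lookup stands in for real dispatch over the MRO. It only shows the shape of the trick - `IStack(mylist).push(42)` forwarding to `list.append(mylist, 42)`:

```python
class IStack:
    """Sketch of a "universal adapter": methods looked up per adapted type."""

    _impls = {}   # type -> {method name: callable taking (ob, *args)}

    def __init__(self, ob):
        self._ob = ob

    @classmethod
    def register(cls, typ, **methods):
        cls._impls[typ] = methods

    def __getattr__(self, name):
        # Exact-type lookup; a real engine would dispatch over base classes.
        fn = self._impls[type(self._ob)][name]
        return lambda *args, **kw: fn(self._ob, *args, **kw)


# Declare that lists "support" IStack by registering the concrete methods.
IStack.register(list, push=list.append, pop=list.pop)

mylist = []
mystack = IStack(mylist)
mystack.push(42)             # forwards to list.append(mylist, 42)
assert mystack.pop() == 42   # forwards to list.pop(mylist)
assert mylist == []
```

The adapter holds no state of its own, which is exactly the "stateless adapter" assumption the PEP's Aspects section later relaxes.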
>
> The ``@abstract`` decorator marks a function as being abstract: i.e.,
> having no implementation. If an ``@abstract`` function is called,
> it raises ``NoApplicableMethods``. To become executable, overloaded
> methods must be added using the techniques previously described. (That
> is, methods can be added using ``@when``, ``@before``, ``@after``,
> ``@around``, or any custom method combination decorators.)
>
> In the example above, the ``list.append`` method is added as a method
> for ``IStack.push()`` when its arguments are a list and an arbitrary
> object. Thus, ``IStack.push(mylist, 42)`` is translated to
> ``list.append(mylist, 42)``, thereby implementing the desired
> operation.
>
> (Note: the ``@abstract`` decorator is not limited to use in interface
> definitions; it can be used anywhere that you wish to create an
> "empty" generic function that initially has no methods. In
> particular, it need not be used inside a class.)
>
>
> Subclassing and Re-assembly
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Interfaces can be subclassed::
>
>     class ISizedStack(IStack):
>         @abstract
>         def __len__(self):
>             """Return the number of items on the stack"""
>
>     # define __len__ support for ISizedStack
>     when(ISizedStack.__len__, (list,))(list.__len__)
>
> Or assembled by combining functions from existing interfaces::
>
>     class Sizable(Interface):
>         __len__ = ISizedStack.__len__
>
>     # list now implements Sizable as well as ISizedStack, without
>     # making any new declarations!
>
> A class can be considered to "adapt to" an interface at a given
> point in time, if no method defined in the interface is guaranteed to
> raise a ``NoApplicableMethods`` error if invoked on an instance of
> that class at that point in time.
>
> In normal usage, however, it is "easier to ask forgiveness than
> permission". That is, it is easier to simply use an interface on
> an object by adapting it to the interface (e.g. ``IStack(mylist)``)
> or invoking interface methods directly (e.g.
> ``IStack.push(mylist,
> 42)``), than to try to figure out whether the object is adaptable to
> (or directly implements) the interface.
>
>
> Implementing an Interface in a Class
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> It is possible to declare that a class directly implements an
> interface, using the ``declare_implementation()`` function::
>
>     from overloading import declare_implementation
>
>     class Stack(object):
>         def __init__(self):
>             self.data = []
>         def push(self, ob):
>             self.data.append(ob)
>         def pop(self):
>             return self.data.pop()
>
>     declare_implementation(IStack, Stack)
>
> The ``declare_implementation()`` call above is roughly equivalent to
> the following steps::
>
>     when(IStack.push, (Stack, object))(lambda self, ob: self.push(ob))
>     when(IStack.pop, (Stack,))(lambda self: self.pop())
>
> That is, calling ``IStack.push()`` or ``IStack.pop()`` on an instance
> of any subclass of ``Stack``, will simply delegate to the actual
> ``push()`` or ``pop()`` methods thereof.
>
> For the sake of efficiency, calling ``IStack(s)`` where ``s`` is an
> instance of ``Stack``, **may** return ``s`` rather than an ``IStack``
> adapter. (Note that calling ``IStack(x)`` where ``x`` is already an
> ``IStack`` adapter will always return ``x`` unchanged; this is an
> additional optimization allowed in cases where the adaptee is known
> to *directly* implement the interface, without adaptation.)
>
> For convenience, it may be useful to declare implementations in the
> class header, e.g.::
>
>     class Stack(metaclass=Implementer, implements=IStack):
>         ...
>
> Instead of calling ``declare_implementation()`` after the end of the
> suite.
>
>
> Interfaces as Type Specifiers
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> ``Interface`` subclasses can be used as argument annotations to
> indicate what type of objects are acceptable to an overload, e.g.::
>
>     @overload
>     def traverse(g: IGraph, s: IStack):
>         g = IGraph(g)
>         s = IStack(s)
>         # etc....
>
> Note, however, that the actual arguments are *not* changed or adapted
> in any way by the mere use of an interface as a type specifier. You
> must explicitly cast the objects to the appropriate interface, as
> shown above.
>
> Note, however, that other patterns of interface use are possible.
> For example, other interface implementations might not support
> adaptation, or might require that function arguments already be
> adapted to the specified interface. So the exact semantics of using
> an interface as a type specifier are dependent on the interface
> objects you actually use.
>
> For the interface objects defined by this PEP, however, the semantics
> are as described above. An interface I1 is considered "more specific"
> than another interface I2, if the set of descriptors in I1's
> inheritance hierarchy is a proper superset of the descriptors in I2's
> inheritance hierarchy.
>
> So, for example, ``ISizedStack`` is more specific than both
> ``ISizable`` and ``IStack``, irrespective of the inheritance
> relationships between these interfaces. It is purely a question of
> what operations are included within those interfaces -- and the
> *names* of the operations are unimportant.
>
> Interfaces (at least the ones provided by ``overloading``) are always
> considered less-specific than concrete classes. Other interface
> implementations can decide on their own specificity rules, both
> between interfaces and other interfaces, and between interfaces and
> classes.
>
>
> Non-Method Attributes in Interfaces
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> The ``Interface`` implementation actually treats all attributes and
> methods (i.e. descriptors) in the same way: their ``__get__`` (and
> ``__set__`` and ``__delete__``, if present) methods are called with
> the wrapped (adapted) object as "self". For functions, this has the
> effect of creating a bound method linking the generic function to the
> wrapped object.
>
> For non-function attributes, it may be easiest to specify them using
> the ``property`` built-in, and the corresponding ``fget``, ``fset``,
> and ``fdel`` attributes::
>
>     class ILength(Interface):
>         @property
>         @abstract
>         def length(self):
>             """Read-only length attribute"""
>
>     # ILength(aList).length == list.__len__(aList)
>     when(ILength.length.fget, (list,))(list.__len__)
>
> Alternatively, methods such as ``_get_foo()`` and ``_set_foo()``
> may be defined as part of the interface, and the property defined
> in terms of those methods, but this is a bit more difficult for users
> to implement correctly when creating a class that directly implements
> the interface, as they would then need to match all the individual
> method names, not just the name of the property or attribute.
>
>
> Aspects
> -------
>
> The adaptation system provided assumes that adapters are "stateless",
> which is to say that adapters have no attributes or storage apart from
> those of the adapted object. This follows the "typeclass/instance"
> model of Haskell, and the concept of "pure" (i.e., transitively
> composable) adapters.
>
> However, there are occasionally cases where, to provide a complete
> implementation of some interface, some sort of additional state is
> required.
>
> One possibility of course, would be to attach monkeypatched "private"
> attributes to the adaptee. But this is subject to name collisions,
> and complicates the process of initialization. It also doesn't work
> on objects that don't have a ``__dict__`` attribute.
>
> So the ``Aspect`` class is provided to make it easy to attach extra
> information to objects that either:
>
> 1. have a ``__dict__`` attribute (so aspect instances can be stored
>    in it, keyed by aspect class),
>
> 2. support weak referencing (so aspect instances can be managed using
>    a global but thread-safe weak-reference dictionary), or
>
> 3.
implement or can be adapt to the ``overloading.IAspectOwner`` > interface (technically, #1 or #2 imply this) > > Subclassing ``Aspect`` creates an adapter class whose state is tied > to the life of the adapted object. > > For example, suppose you would like to count all the times a certain > method is called on instances of ``Target`` (a classic AOP example). > You might do something like:: > > from overloading import Aspect > > class Count(Aspect): > count = 0 > > @after(Target.some_method) > def count_after_call(self, *args, **kw): > Count(self).count += 1 > > The above code will keep track of the number of times that > ``Target.some_method()`` is successfully called (i.e., it will not > count errors). Other code can then access the count using > ``Count(someTarget).count``. > > ``Aspect`` instances can of course have ``__init__`` methods, to > initialize any data structures. They can use either ``__slots__`` > or dictionary-based attributes for storage. > > While this facility is rather primitive compared to a full-featured > AOP tool like AspectJ, persons who wish to build pointcut libraries > or other AspectJ-like features can certainly use ``Aspect`` objects > and method-combination decorators as a base for more expressive AOP > tools. > > XXX spec out full aspect API, including keys, N-to-1 aspects, manual > attach/detach/delete of aspect instances, and the ``IAspectOwner`` > interface. 
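A rough idea of how strategy #2 above (weak-reference-keyed aspect storage) could work, sketched in plain Python with hypothetical names. This is *not* PEAK-Rules' actual code: it ignores thread-safety, the ``__dict__`` strategy, and ``IAspectOwner``, and it substitutes a direct method call for the ``@after`` method combination:

```python
import weakref

class Aspect:
    """Illustrative sketch: per-object extra state, one instance per
    (Aspect subclass, adapted object) pair, held in a WeakKeyDictionary
    so the aspect's storage dies with the adapted object."""

    def __new__(cls, subject):
        # One registry per Aspect subclass, created lazily.
        if '_registry' not in cls.__dict__:
            cls._registry = weakref.WeakKeyDictionary()
        try:
            return cls._registry[subject]
        except KeyError:
            inst = super().__new__(cls)
            cls._registry[subject] = inst
            return inst

class Count(Aspect):
    count = 0

class Target:
    def some_method(self):
        # Stand-in for what the @after method combination would do.
        Count(self).count += 1

t = Target()
t.some_method()
t.some_method()
print(Count(t).count)  # -> 2
```

Because the registry key is only weakly held, deleting ``t`` lets both the target and its ``Count`` aspect be collected, matching the "state is tied to the life of the adapted object" behavior described above.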
>
>
> Extension API
> =============
>
> TODO: explain how all of these work
>
> implies(o1, o2)
>
> declare_implementation(iface, class)
>
> predicate_signatures(ob)
>
> parse_rule(ruleset, body, predicate, actiontype, localdict, globaldict)
>
> combine_actions(a1, a2)
>
> rules_for(f)
>
> Rule objects
>
> ActionDef objects
>
> RuleSet objects
>
> Method objects
>
> MethodList objects
>
> IAspectOwner
>
>
> Implementation Notes
> ====================
>
> Most of the functionality described in this PEP is already implemented in the in-development version of the PEAK-Rules framework. In particular, the basic overloading and method combination framework (minus the ``@overload`` decorator) already exists there. The implementation of all of these features in ``peak.rules.core`` is 656 lines of Python at this writing.
>
> ``peak.rules.core`` currently relies on the DecoratorTools and BytecodeAssembler modules, but both of these dependencies can be replaced, as DecoratorTools is used mainly for Python 2.3 compatibility and to implement structure types (which can be done with named tuples in later versions of Python). The use of BytecodeAssembler can be replaced using an "exec" or "compile" workaround, given a reasonable effort. (It would be easier to do this if the ``func_closure`` attribute of function objects was writable.)
>
> The ``Interface`` class has been previously prototyped, but is not included in PEAK-Rules at the present time.
>
> The "implicit class rule" has previously been implemented in the RuleDispatch library. However, it relies on the ``__metaclass__`` hook that is currently eliminated in PEP 3115.
>
> I don't currently know how to make ``@overload`` play nicely with ``classmethod`` and ``staticmethod`` in class bodies. It's not really clear if it needs to, however.
>
>
> Copyright
> =========
>
> This document has been placed in the public domain.
> > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: > http://mail.python.org/mailman/options/python-3000/monpublic%40gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-3000/attachments/20070508/fcf166af/attachment.html From collinw at gmail.com Wed May 9 05:35:42 2007 From: collinw at gmail.com (Collin Winter) Date: Tue, 8 May 2007 20:35:42 -0700 Subject: [Python-3000] Build is broken (r55196) Message-ID: <43aa6ff70705082035kd28a3a6t3f547d5aa27f025e@mail.gmail.com> As of r55196 (and possibly earlier), the p3yk branch does not make when configured with --with-pydebug. setup.py triggers this assertion failure: python: Objects/object.c:64: _Py_AddToAllObjects: Assertion `(op->_ob_prev == ((void *)0)) == (op->_ob_next == ((void *)0))' failed. Any ideas? From python at rcn.com Wed May 9 05:00:26 2007 From: python at rcn.com (Raymond Hettinger) Date: Tue, 8 May 2007 20:00:26 -0700 Subject: [Python-3000] Octal literals anecdote Message-ID: <005401c791e7$0fd10d20$f301a8c0@RaymondLaptop1> Those following the octal literal discussion might enjoy reading one of today's SF bug reports: www.python.org/sf/1715302 Raymond From eucci.group at gmail.com Wed May 9 05:57:54 2007 From: eucci.group at gmail.com (Jeff Shell) Date: Tue, 8 May 2007 21:57:54 -0600 Subject: [Python-3000] ABC's, Roles, etc In-Reply-To: <20070509015553.9C6843A4061@sparrow.telecommunity.com> References: <88d0d31b0705081452w3082c1d0x7355e5eed7a3e093@mail.gmail.com> <20070509015553.9C6843A4061@sparrow.telecommunity.com> Message-ID: <88d0d31b0705082057l75ae6241gffa42545c25937d@mail.gmail.com> On 5/8/07, Phillip J. 
Eby wrote: > At 03:52 PM 5/8/2007 -0600, Jeff Shell wrote: > >I have a lengthy post that dissects a major issue that I have with > >ABCs and the Interface definition that I saw in PEP 3124:: it all > >seems rigidly class and class-instance based. > > Hi Jeff; I read your post a few days ago, but your blog doesn't > support comments, so I've been "getting around to" writing a > counterpoint on my blog. But, now I can do it here instead. :) Regarding the comments, blame spammers. And CAPTCHA. There's an equal spot in hell (should I choose to believe in hell) for both. :) I miss the chance to have conversation, but the weight of gardening or having to try seven times to differentiate between two funnily-drawn characters killed that part of my humanity. > >The cardinal sin I saw > >in the Interface definition in PEP 3124 (at least, at the time I last > >viewed it) was the inclusion of 'self' in a method spec. > > That's because you're confusing a generic function and a "method > spec". PEP 3124 interfaces are not *specifications*; they're > namespaces for generic functions. They are much closer in nature to > ABCs than they are to zope.interface-style Interfaces. The principal > thing they have in common with zope.interface (aside from the name) > is the support for "IFoo(ob)"-style adaptation. > > Very few of zope.interface's design goals are shared by PEP > 3124. Notably, they are not particularly good for type checking or > verification. In PEP 3124, for example, IFoo(ob) *always* returns an > object with the specified attributes; the only way to know whether > they are actually implemented is to try using them. > > Notice that this is diametrically opposed to what zope.interface > wants to do in such situations -- which is why the PEP makes such a > big deal about it being possible to use zope.interface *instead*. Well that puts another fear in my heart about confusing the issue further - "oh, these kind of sound and look the same but are diametrically opposed?"
I must admit that I didn't read PEP 3124 in depth - most of it was fascinating, some of it went way over my head in complexity, and then suddenly I saw an Interface. It seemed quite out of place, actually, and it seemed diametrically opposed to the simplicity and power I've been enjoying. > That is, my own observation is that different frameworks sometimes > need different kinds of interfaces. For example, someone might > create a framework that wants to verify preconditions and > postconditions of methods in an interface, rather than merely > specifying their names and arguments! Using zope.interface as an > exclusive basis for interface definition and type annotations would > block innovation in this area. FWIW, zope.interface allows 'tagged values' on all interface Elements (Element being the base class/type of Attribute, Method, and even Interface, I believe). Tagged values are used to hold invariants, which are code objects in the specification and can provide the 'if obj.age < 18, then obj.has_parents_permission must be true' type of logic. The interface (ha!) for setting and getting tagged attributes ain't the prettiest, but it's the equivalent of type annotations and all other such things. And like 'invariant', it's not too difficult to write helper functions that deal with that interface. I use this in a SQLAlchemy-based system that uses zope.schema (which builds on zope.interface to describe field (attributes) types and restrictions). The spec looks something like this::

    class ILogins(Interface):
        login = zope.schema.TextLine(...)
        validateUnique(login, column=table.c.login)

I have even used `zope.interface` to stamp out a new abstract class (of sorts), which it supports::

    schema = field.schema
    if IInterface.providedBy(schema):
        # schema is an Interface, not an implementation; we need a concrete
        # instance.
        schema = schema.deferred()
        directlyProvides(schema, field.schema)

This stamps out a new abstract instance, and declares support for the interface.
This particular use was for view binding (something that you mention), but it's three lines of code that are very useful in my system. This particular use case is for Zope 'Object' fields; like an address attribute might expect to have a complex type, like 'IAddress'. There are situations, such as dynamic UI generation, where an empty instance is needed. This allows the abstract specification to provide just enough of a concrete implementation to fill in for a real object that's expected to arrive in the future. One could envision having that 'deferred()' interface method filling in stronger implementations. Which means that there's probably more power, or hooks (at least) in zope.interface than may be realized. The common uses of it don't cover all possibilities. > >... 'self' is an internal detail of class-instance implementations. > > Again - this is because you're assuming the purpose of a PEP 3124 > interface is to *specify* an interface, when in fact it's much more > like an ABC, which may also *implement* the interface. The > specification and implementation are intentionally unified here. Hm. I'll have to process this one... > >It seems to me that Abstract Base Classes and even PEP 3124 are > >primarily focused on classes. But in Python, "everything is an > >object", but not everything is class-based. > >... > >The rest of this post focuses on what `zope.interface` already > >provides - a system for specifying behavior and declaring support at > >both the class and object level - and 'object' really means 'object', > >which includes modules. > > Right -- and *neither* "specifying behavior" nor "declaring support" > are goals of PEP 3124; they're entirely out of its scope. The > Interface object's purpose is to support uniform access to, and > implementation of, individual operations. These are somewhat > parallel concepts, but very different in thrust; zope.interface is > LBYL, while PEP 3124 is EAFP all the way. 
Funny that those things are apparently opposed. 'Look Before You Leap' brings to mind the concept of "don't dive into an empty pool" or "don't do a backwards flip onto the pointy rocks"; whereas 'Easier to Ask Forgiveness Than Permission' brings to mind the concept of "sorry i dove head first into your empty pool and cracked my skull open, Mr. Johnson. If I had asked I'm sure you would have said no! In any case, even though it's your pool and there was a fence and everything and you did not give me permission, my parents are going to sue" (OK, maybe that last bit is the result of a healthy american upbringing... but still!) > As a result, PEP 3124 chooses to punt on the issue of individual > objects. It's quite possible within the framework to allow > instance-level checks during dispatching, but it's not going to be in > the default engine (which is based on type-tuple dispatching; see > Guido's Py3K overloading prototype). Huh? I'll try to look at that. types, classes, instances... That does it, I'm switching to Io. (Honestly - I've recently seen the light about prototype based object oriented programming; in light of types of types and classes of classes and classes and instances and oh my, languages that believe in "there are only objects, and they are only instances" are sounding sweeter every day) > zope.interface (and zope.component, IIRC) pay a high price in > complexity for allowing interfaces to be per-instance rather than > type-defined. I implemented the same feature in PyProtocols, but > over the years I rarely found it useful. My understanding of its > usefulness in Zope is that it: > > 1. supports specification and testing of module-level Zope APIs I've had uses for it outside of modules. > 2. allows views and other wrapping operations to be selected on a dynamic basis That's an essential (and very powerful) feature in a large system. But there are uses outside of that.
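Stripped of the pool metaphors, the LBYL/EAFP split being debated here comes down to a few lines of plain Python (neither zope.interface nor PEP 3124 machinery; names are illustrative only):

```python
class Duck:
    def quack(self):
        return "quack"

def speak_lbyl(obj):
    # LBYL: verify the capability up front, refuse before using it.
    if not hasattr(obj, "quack"):
        raise TypeError("object does not provide the quack operation")
    return obj.quack()

def speak_eafp(obj):
    # EAFP: just use it; failure surfaces at the point of use.
    try:
        return obj.quack()
    except AttributeError:
        return "<silent>"

print(speak_lbyl(Duck()))    # quack
print(speak_eafp(object()))  # <silent>
```

zope.interface formalizes the first style (check/verify before use); PEP 3124's adaptation formalizes the second (the adapted object always "has" the operation, and you find out whether it really works by calling it).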
An area where parts of the zope 3 component architecture DO pay a high price in complexity is where it has to create dynamic types in order to satisfy some core requirement. I can't remember where this is, but I know that I HATE it - suddenly, Zope is playing with MY class hierarchy. Suddenly I'm in debug mode and have no idea what I'm looking at. I'd much rather have it augmenting my instance than mangling my classes, unless I choose to have it mangle my classes by subclassing from a mangler. Annotate my class, but don't replace it. > Since #1 falls outside of PEP 3124's goals (i.e., it's not about > specification or testing), that leaves use case #2. In my > experience, it has been more than sufficient to simply give these > object some *other* interface, such as an IViewTags interface with a > method to query these dynamic "tag" interfaces. In other words, my > experience and opinion supports the view that use case #2 is actually > a coincidental abuse of interfaces for convenience, rather than the > "one obvious way" to handle the use case. Ugh. Yeah, there are 'marker' interfaces, but.. ugh. dynamic "tag" interfaces. Yuck. My experience has been otherwise. But I'm sorry that I confused that section of PEP 3124 to be about specification and testing. I do, however, think that is a better use case. > To put it another way, if you define getView() as a generic function, > you can always define a dynamic implementation of it for any type > that you wish to have dynamic view selection capability. Then, only > those cases that require a complex solution have to pay for the complexity. > > So, that's my rationale for why PEP 3124 doesn't provide any > instance-based features out of the box; outside of API specs for > singleton objects, the need for them is mostly an illusion created by > Zope 3's dynamic view selection hack. 
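Phillip's "define getView() as a generic function" suggestion can be sketched using the modern ``functools.singledispatch`` as a stand-in (it did not exist in 2007; PEP 3124 overloading is what is actually being proposed, and all names here are illustrative):

```python
from functools import singledispatch

@singledispatch
def get_view(obj, name):
    # Default: no view registered for this type.
    raise LookupError(f"no view {name!r} for {type(obj).__name__}")

@get_view.register
def _(obj: int, name):
    # Ordinary types get cheap, static view selection.
    return f"int view {name!r}"

class TaggedContent:
    """Hypothetical type that opts in to per-instance (dynamic) selection."""
    def __init__(self):
        self.views = {}

@get_view.register
def _(obj: TaggedContent, name):
    # Only this type pays for the dynamic per-instance lookup.
    return obj.views[name]

c = TaggedContent()
c.views["index"] = "custom index view"
print(get_view(3, "index"))  # int view 'index'
print(get_view(c, "index"))  # custom index view
```

This is the shape of the argument above: dispatch is type-based by default, and the complexity of instance-level selection is confined to the one implementation that needs it.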
> > > >My main focus is on determining what Abstract Base Classes and/or PEP > >3124's Interfaces do better than `zope.interface` (if anyone else is > >familiar with that package). > > I at least am quite familiar with it, having helped to define some of > its terminology and API, as well as being the original author of its > class-decorator emulation for Python versions 2.2 and up. :) I also > argued for its adoption of PEP 246, and wrote PyProtocols to unify > Twisted and Zope interfaces in a PEP 246-based adaptation framework. > > And what PEP 3124 does much better than zope.interface or even PyProtocols is: > > 1. Adaptation, especially incomplete adaptation. You can implement > only the methods that are actually needed for your use case. If the > interface includes generic implementations that are defined in terms > of other methods in the interface, you need not reimplement > them. (Note: I'm well aware that here my definition of "better" > would be considered "worse" by Jim Fulton, since zope.interface is > LBYL-oriented. However, for many users and use cases, EAFP *is* > better, even if it's not for Zope.) And for many users and use cases, LBYL is better. Especially for those of us who get pissed off and start smoking every time we end up on a spikey rock! > 2. Interface recombination. AFAIK, zope.interface doesn't support > subset interfaces like PyProtocols does. Neither zope.interface nor > PyProtocols support method renaming, where two interfaces have a > method with the same specification but different method names. Um, then it's a different specification. eat_pizza() and consume_pizza() are different. They may be the same to Cookie Monster, but they're not the same for Grover. > 3. Low mental overhead. PEP 3124 doesn't even *need* interfaces; > simple use cases can just use overloaded functions and be on about > their business. 
Use cases that would require an interface and half a > dozen adapter classes in zope.interface can be met by simply creating > an overloaded function and adding methods. And the resulting code > reads like code in other languages that support overloading or > generic functions, rather than reading like Java. The mental overhead in PEP 3124 was pretty high for me, but that may stem from bias resulting from diametrically opposed interpretations of the same word :). > >In my blog post, I also show a dynamically constructed object > >providing an interface's specified behavior. An instance of an empty > >class is made, and then methods and other supporting attributes are > >attached to this specific instance only. Real world examples of this > >include Zope 2, where a folder may have "Python Scripts" or other > >callable members that, in effect, make for a totally custom object. It > >can also provide this same behavior (in fact, I was able to take > >advantage of this on some old old old Zope 2 projects that started in > >the web environment and transitioned to regular Python > >modules/classes). > > And how often does this happen outside of Zope? As I said, I rarely > found it to be the case anywhere else. I replicated the ability in > PyProtocols because I was biased by my prior Zope experience, but > once I got outside of Zope it almost entirely ceased to be useful. We have a dynamic data transformation framework that exists outside of Zope (Zope is basically used for UI). Objects are being dynamically composed, wrapped, decomposed, rewrapped, filtered, and split - constantly. Objects, not types. It's all composed of rules. I'm itching to be able to add rules to apply zope.interface specifications to the generated objects; if only to then make it much easier to add other filtering rules later on. With all of the wrapping and generation going on, we had to add some basic 'is_a' methods to the base classes. 
And we do care whether an object is a wrapper (isinstance) as well as whether the wrapped object provides the DataSet interface. It's another complex framework, it's just an outside-of-Zope system as an example. I know there's been some talk of ``__isinstance__()`` and ``__issubclass__()`` overriding being allowed, and I guess that's to take care of the wrapped and wrapped and wrapped object situations? In any case, it seems that I have long occupied worlds wherein complex objects could be composed on the fly outside of the type system, and I'd hate to have one of those constructed objects miss out on passing an 'is-foo-like' test because they weren't raised by proper upper-middle-class type parents. > Meanwhile, as I said, PEP 3124 is not closed to extension. It's > specifically intended that zope.interface (and any other interface > packages that might arise in future) should be able to play as > first-class citizens in the proposed API. However, depending on the > specific features desired, those packages might have some additional > integration work to do. > > (Note, by the way, that zope.interface is explicitly mentioned three > times in the PEP, as an example of how other interface types should > be able to be used for overloading, as long as they register > appropriate methods with the provided framework.) Thanks for clearing things up. I'll try to make another pass at reading the PEP more closely. For me, at this moment, all of this class/type based stuff is rubbing me the wrong way, and that's a feeling that's very hard to get past. I'm not sure why. I'll try to suppress those feelings when I revisit 3119 and 3124. What's happening with Roles/Traits? That's still the system that I'd like to see. I'm hoping that hasn't gotten swallowed up by generic overloaded pre-post wrapped abstract methods. (as long as I never have to type 'def public final', I'm cool).
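To make the roles/traits question concrete: a role can be caricatured in a few lines as a named bundle of required operations, checked structurally against *objects* rather than classes — the kind of test that also passes for objects assembled on the fly outside the type system. This is purely illustrative, not any proposed API:

```python
class Role:
    """Illustrative only: a role is a named set of required operations,
    checked structurally against individual objects, not classes."""

    def __init__(self, name, *operations):
        self.name = name
        self.operations = frozenset(operations)

    def performed_by(self, obj):
        # Structural check: every required operation must exist and be callable.
        return all(callable(getattr(obj, op, None)) for op in self.operations)

Stack = Role("Stack", "push", "pop")

class ListStack:
    def __init__(self):
        self._items = []
    def push(self, item):
        self._items.append(item)
    def pop(self):
        return self._items.pop()

# Works equally for an object assembled on the fly, outside the type system:
class Empty:
    pass

improvised = Empty()
improvised.push = lambda item: None
improvised.pop = lambda: None

print(Stack.performed_by(ListStack()))  # True
print(Stack.performed_by(improvised))   # True
print(Stack.performed_by(42))           # False
```

A real roles system would of course declare and query roles explicitly rather than sniffing attributes, but the per-object (rather than per-class) orientation is the point of contrast with ABCs.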
I think roles/traits as a core concept (LBYL zope.interface style, if you prefer to think of it that way) is useful, if not important. And I still believe that zope.interface already provides a language/API from which to build. -- Jeff Shell From nnorwitz at gmail.com Wed May 9 06:18:02 2007 From: nnorwitz at gmail.com (Neal Norwitz) Date: Tue, 8 May 2007 21:18:02 -0700 Subject: [Python-3000] Build is broken (r55196) In-Reply-To: <43aa6ff70705082035kd28a3a6t3f547d5aa27f025e@mail.gmail.com> References: <43aa6ff70705082035kd28a3a6t3f547d5aa27f025e@mail.gmail.com> Message-ID: I had this problem. make clean solved it. -- n On 5/8/07, Collin Winter wrote: > As of r55196 (and possibly earlier), the p3yk branch does not make > when configured with --with-pydebug. setup.py triggers this assertion > failure: > > python: Objects/object.c:64: _Py_AddToAllObjects: Assertion > `(op->_ob_prev == ((void *)0)) == (op->_ob_next == ((void *)0))' > failed. > > Any ideas? > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/nnorwitz%40gmail.com > From collinw at gmail.com Wed May 9 06:38:11 2007 From: collinw at gmail.com (Collin Winter) Date: Tue, 8 May 2007 21:38:11 -0700 Subject: [Python-3000] Build is broken (r55196) In-Reply-To: References: <43aa6ff70705082035kd28a3a6t3f547d5aa27f025e@mail.gmail.com> Message-ID: <43aa6ff70705082138i72a9a6fcvec2e33c405508155@mail.gmail.com> Works on a different laptop with a fresh checkout. False alarm, sorry. On 5/8/07, Neal Norwitz wrote: > I had this problem. make clean solved it. -- n > > On 5/8/07, Collin Winter wrote: > > As of r55196 (and possibly earlier), the p3yk branch does not make > > when configured with --with-pydebug. 
setup.py triggers this assertion > > failure: > > > > python: Objects/object.c:64: _Py_AddToAllObjects: Assertion > > `(op->_ob_prev == ((void *)0)) == (op->_ob_next == ((void *)0))' > > failed. > > > > Any ideas? > > _______________________________________________ > > Python-3000 mailing list > > Python-3000 at python.org > > http://mail.python.org/mailman/listinfo/python-3000 > > Unsubscribe: http://mail.python.org/mailman/options/python-3000/nnorwitz%40gmail.com > > > From foom at fuhm.net Wed May 9 09:26:06 2007 From: foom at fuhm.net (James Y Knight) Date: Wed, 9 May 2007 03:26:06 -0400 Subject: [Python-3000] the future of the GIL In-Reply-To: References: <1d85506f0705050629k35ebdf6aj285e8f10489d21d5@mail.gmail.com> Message-ID: <9F3AFD78-7C73-4C9F-8CA6-3D10A1468939@fuhm.net> On May 7, 2007, at 1:58 PM, Guido van Rossum wrote: > As C doesn't have an atomic increment nor an atomic > decrement-and-test, the INCREF and DECREF macros sprinkled throughout > the code (many thousands of them) must be protected by some lock. I've been intently ignoring the rest of the thread (and will continue to do so), but, to respond to this one particular point... This just isn't true. Python can do an atomic increment in a fast platform specific way. It need not restrict itself to what's available in C. (after all, *threads* aren't available in C....) 
Two implementations of note: 1) gcc 4.1 has atomic operation builtins: http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic- Builtins.html#Atomic-Builtins 2) There's a pretty damn portable library which provides these functions for what looks to me like pretty much all CPUs anyone would use, under Linux, Windows, HP/UX, Solaris, and OSX, and has a fallback to using pthreads mutexes: http://www.hpl.hp.com/research/linux/atomic_ops/index.php4 http://packages.debian.org/stable/libdevel/libatomic-ops-dev It's quite possible the overhead of GIL-less INCREF/DECREF is still too high even with atomic increment/decrement primitives, but AFAICT nobody has actually tried it. So saying GIL-less operation for sure has too high of an overhead unless the refcounting GC is replaced seems a bit premature. James From tomerfiliba at gmail.com Wed May 9 10:47:02 2007 From: tomerfiliba at gmail.com (tomer filiba) Date: Wed, 9 May 2007 10:47:02 +0200 Subject: [Python-3000] the future of the GIL Message-ID: <1d85506f0705090147w155c15d3ka61f0f23b435b3a9@mail.gmail.com> On 5/8/07, Thomas Heller wrote: > Wouldn't multiple interpreters (assuming the problems with them would be fixed) > in the same process give the same benefit? A separate GIL for each one? hmm, i find this idea quite interesting really: * builtin immutable objects such as None, small ints, non-heap types, and builtin functions, would become uncollectible by the GC. after all, we can't reclaim their memory anyway, so keeping the accounting info is just a waste of time. the PyObject_HEAD struct would grow an "ob_collectible" field, which would tell the GC to ignore these objects altogether. for efficiency reasons, Py_INCREF/DECREF would still change ob_refcount, only the GC will ignore it for uncollectible objects. * each thread would have a separate interpreter, and all APIs should grow an additional parameter that specifies the interpreter state to use. 
* for compatibility reasons, we can also have a dict-like object mapping thread-ids to interpreter states. when you invoke an API, it would get the interpreter state from the currently executing thread id. maybe that could be defined as a macro over the real API function. * the builtin immutable objects would be shared between all instances of the interpreter. other than those, all other objects would be local to the interpreter that created them * extension modules would have to be changed to support per-interpreter initialization. * in order to communicate between interpreters, we would use some kind of IPC mechanism, to serialize access to objects. of course it would be much more efficient, as no context switches are required in the same process. this would make each thread basically as protected as an OS process, so no locks would be required. * in order to support the IPC, a new builtin type, Proxy, would be added to the language. it would be the only object that can hold a cross-reference to objects in different interpreters -- much like today's RPC libs -- only that wouldn't have to work over a socket. * if python would ever have a tracing GC, that would greatly simplify things. also, moving to an atomic incref/decref library could also help. of course i'm not talking about adding that to py3k. it's too immature even for a pre-pep. but continuing to develop that idea more could be the means to removing the GIL, and finally having really parallel python scripts. -tomer From rasky at develer.com Wed May 9 10:54:12 2007 From: rasky at develer.com (Giovanni Bajo) Date: Wed, 09 May 2007 10:54:12 +0200 Subject: [Python-3000] PEP 3125 -- a modest proposal In-Reply-To: <000101c79102$385e1340$a91a39c0$@org> References: <000101c79102$385e1340$a91a39c0$@org> Message-ID: On 08/05/2007 1.48, Andrew Koenig wrote: > It has occurred to me that as Python stands today, an indent always begins > with a colon.
So in principle, we could define anything that looks like an > indent but doesn't begin with a colon as a continuation. I got a déjà vu here :) http://mail.python.org/pipermail/python-3000/2007-April/007045.html and Guido's answer: http://mail.python.org/pipermail/python-3000/2007-April/007063.html -- Giovanni Bajo From rasky at develer.com Wed May 9 11:07:45 2007 From: rasky at develer.com (Giovanni Bajo) Date: Wed, 09 May 2007 11:07:45 +0200 Subject: [Python-3000] the future of the GIL In-Reply-To: <20070506222840.25B2.JCARLSON@uci.edu> References: <463E4645.5000503@acm.org> <20070506222840.25B2.JCARLSON@uci.edu> Message-ID: On 07/05/2007 7.36, Josiah Carlson wrote: > By going multi-process rather than multi-threaded, one generally removes > shared memory from the equation. Note that this has the same effect as > using queues with threads, which is generally seen as the only way of > making threads "easy". If one *needs* shared memory, we can certainly > create an mmap-based shared memory subsystem with fine-grained object > locking, or emulate it via a server process as the processing package > has done. > > Seriously, give the processing package a try. It's much faster than one > would expect. I'm fully +1 with you on everything. And part of the reason we have to advocate this is that Python has always had pretty good threading libraries, but not processing libraries; actually, Python does have problems with spawning processes: the whole popen/popen2/subprocess mess isn't even fully solved yet. One thing to be said, though, is that using multiple processes causes some headaches with frozen distributions (PyInstaller, py2exe, etc.), like those usually found on Windows, specifically because Windows does not have fork(). The processing module, for instance, doesn't take this problem into account at all, making it worthless for many of my real-world use cases.
-- Giovanni Bajo From aahz at pythoncraft.com Wed May 9 14:31:35 2007 From: aahz at pythoncraft.com (Aahz) Date: Wed, 9 May 2007 05:31:35 -0700 Subject: [Python-3000] the future of the GIL In-Reply-To: <1d85506f0705090147w155c15d3ka61f0f23b435b3a9@mail.gmail.com> References: <1d85506f0705090147w155c15d3ka61f0f23b435b3a9@mail.gmail.com> Message-ID: <20070509123135.GB1711@panix.com> On Wed, May 09, 2007, tomer filiba wrote: > > of course i'm not talking about adding that to py3k. it's too immature > even for a pre-pep. but continuing to develop that idea more could > be the means to removing the GIL, and finally having really parallel > python scripts. ...which is why this discussion belongs on python-ideas. -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "Look, it's your affair if you want to play with five people, but don't go calling it doubles." --John Cleese anticipates Usenet From ark-mlist at att.net Wed May 9 15:33:54 2007 From: ark-mlist at att.net (Andrew Koenig) Date: Wed, 9 May 2007 09:33:54 -0400 Subject: [Python-3000] PEP 3125 -- a modest proposal In-Reply-To: References: <000101c79102$385e1340$a91a39c0$@org> Message-ID: <002701c7923e$b0930040$11b900c0$@net> > I got a dejavu here :) > http://mail.python.org/pipermail/python-3000/2007-April/007045.html > > and Guido's answer: > http://mail.python.org/pipermail/python-3000/2007-April/007063.html Well yes, but if it's done at the lexical level, the INDENT and DEDENT tokens don't exist. From walter at livinglogic.de Wed May 9 17:04:21 2007 From: walter at livinglogic.de (=?ISO-8859-1?Q?Walter_D=F6rwald?=) Date: Wed, 09 May 2007 17:04:21 +0200 Subject: [Python-3000] binascii.b2a_qp() in the p3yk branch Message-ID: <4641E2F5.4000702@livinglogic.de> binascii.b2a_qp() in the p3yk branch is broken. What I get is: $ gdb ./python GNU gdb 6.3-debian Copyright 2004 Free Software Foundation, Inc. 
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "i386-linux"...Using host libthread_db library "/lib/tls/libthread_db.so.1". (gdb) run Starting program: /var/home/walter/checkouts/Python/p3yk/python [Thread debugging using libthread_db enabled] [New Thread -1209593088 (LWP 17690)] Python 3.0x (p3yk:55200, May 9 2007, 11:43:49) [GCC 3.3.5 (Debian 1:3.3.5-13)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import binascii >>> binascii.b2a_qp(b'') Program received signal SIGSEGV, Segmentation fault. [Switching to Thread -1209593088 (LWP 17690)] 0xb7ee9093 in strchr () from /lib/tls/libc.so.6 (gdb) bt #0 0xb7ee9093 in strchr () from /lib/tls/libc.so.6 #1 0xb7c4744c in binascii_b2a_qp (self=0x0, args=0x0, kwargs=0x0) at /var/home/walter/checkouts/Python/p3yk/Modules/binascii.c:1153 #2 0x0807788e in PyCFunction_Call (func=0xb7e26e8c, arg=0xb7e1328c, kw=0xa0a0a0a) at Objects/methodobject.c:77 #3 0x080adbb4 in call_function (pp_stack=0xbffff45c, oparg=0) at Python/ceval.c:3513 #4 0x080abe66 in PyEval_EvalFrameEx (f=0x8235aa4, throwflag=0) at Python/ceval.c:2191 #5 0x080ac9e4 in PyEval_EvalCodeEx (co=0xb7e0fbf0, globals=0x0, locals=0xa0a0a0a, args=0xb7e3102c, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0) at Python/ceval.c:2812 #6 0x080aef5f in PyEval_EvalCode (co=0x0, globals=0x0, locals=0x0) at Python/ceval.c:491 #7 0x080d14ba in run_mod (mod=0x0, filename=0x0, globals=0x0, locals=0x0, flags=0x0, arena=0x0) at Python/pythonrun.c:1282 #8 0x080d0967 in PyRun_InteractiveOneFlags (fp=0x0, filename=0x8116596 "", flags=0xbffff65c) at Python/pythonrun.c:800 #9 0x080d0793 in PyRun_InteractiveLoopFlags (fp=0xb7f9cca0, filename=0x8116596 "", flags=0xbffff65c) at 
Python/pythonrun.c:724 #10 0x080d1d32 in PyRun_AnyFileExFlags (fp=0xb7f9cca0, filename=0x8116596 "", closeit=0, flags=0xbffff65c) at Python/pythonrun.c:693 #11 0x080569ab in Py_Main (argc=-1208365920, argv=0xbffff65c) at Modules/main.c:491 #12 0x080564bb in main (argc=0, argv=0x0) at Modules/python.c:23 From jcarlson at uci.edu Wed May 9 17:57:10 2007 From: jcarlson at uci.edu (Josiah Carlson) Date: Wed, 09 May 2007 08:57:10 -0700 Subject: [Python-3000] the future of the GIL In-Reply-To: <9F3AFD78-7C73-4C9F-8CA6-3D10A1468939@fuhm.net> References: <9F3AFD78-7C73-4C9F-8CA6-3D10A1468939@fuhm.net> Message-ID: <20070509084606.25E5.JCARLSON@uci.edu> James Y Knight wrote: > On May 7, 2007, at 1:58 PM, Guido van Rossum wrote: > > As C doesn't have an atomic increment nor an atomic > > decrement-and-test, the INCREF and DECREF macros sprinkled throughout > > the code (many thousands of them) must be protected by some lock. > > 2) There's a pretty damn portable library which provides these > functions for what looks to me like pretty much all CPUs anyone would > use, under Linux, Windows, HP/UX, Solaris, and OSX, and has a > fallback to using pthreads mutexes: > > http://www.hpl.hp.com/research/linux/atomic_ops/index.php4 > http://packages.debian.org/stable/libdevel/libatomic-ops-dev > > > It's quite possible the overhead of GIL-less INCREF/DECREF is still > too high even with atomic increment/decrement primitives, but AFAICT > nobody has actually tried it. So saying GIL-less operation for sure > has too high of an overhead unless the refcounting GC is replaced > seems a bit premature. Of course the trouble is that while this would be great for incref/decref operations, and the handling of certain immutable types, very many objects in Python are dynamic and require the GIL for all operations on them. 
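The atomicity problem being weighed here — a refcount update is a read-modify-write, so two unsynchronized threads can interleave and lose a count — can be modeled in pure Python, with a lock standing in for the atomic increment/decrement primitive. This is only a toy illustration of the invariant, not CPython's actual refcounting:

```python
import threading

class RefCounted:
    """Toy model: a refcount whose updates are made atomic with a lock.

    incref/decref are read-modify-write operations; without the lock
    (or a hardware atomic primitive) two threads can interleave and
    lose updates, which for a refcount means leaks or early frees.
    """

    def __init__(self):
        self._count = 1
        self._lock = threading.Lock()  # stand-in for atomic inc/dec

    def incref(self):
        with self._lock:
            self._count += 1

    def decref(self):
        with self._lock:
            self._count -= 1
            if self._count == 0:
                self._dealloc()

    def _dealloc(self):
        pass  # deallocation would happen here

ob = RefCounted()
workers = [threading.Thread(target=lambda: [ob.incref() for _ in range(10000)])
           for _ in range(4)]
for t in workers:
    t.start()
for t in workers:
    t.join()
assert ob._count == 1 + 4 * 10000  # no updates lost while the lock is held
```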
Removing the need to hold the GIL for incref/decref operations and telling people "some objects don't need to hold the GIL when you monkey with them" is really just a great way to confuse the hell out of people. There is already confusion with borrowed references, which affects fewer types than would be affected if we were to say "some immutable c types can be accessed without the GIL". Could it offer speed up? Probably, but how much, and what kind of a PITA would it become to use and manipulate the immutable types? - Josiah From pje at telecommunity.com Wed May 9 18:34:02 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Wed, 09 May 2007 12:34:02 -0400 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> Message-ID: <20070509163219.A332C3A4061@sparrow.telecommunity.com> At 09:58 PM 5/8/2007 -0400, Chris Monson wrote: >If we are looking at doing Design By Contract using @before and >@after (preconditions and postconditions), shouldn't there be some >way of getting at the return value in functions decorated with @after? Actually, it isn't really design by contract; i.e., I wasn't using the word "postconditions" in the DBC sense. I was saying you could put code there to *ensure* (i.e. implement) additional postconditions, not *check* them. If you wanted to implement DBC, it might be simplest to subclass overloading.Around to create a Contract class and @contract decorator, with a higher method-combination precedence than Around methods. Indeed, that might make another nice example for the PEP. From pje at telecommunity.com Wed May 9 19:28:18 2007 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Wed, 09 May 2007 13:28:18 -0400 Subject: [Python-3000] ABC's, Roles, etc In-Reply-To: <88d0d31b0705082057l75ae6241gffa42545c25937d@mail.gmail.com> References: <88d0d31b0705081452w3082c1d0x7355e5eed7a3e093@mail.gmail.com> <20070509015553.9C6843A4061@sparrow.telecommunity.com> <88d0d31b0705082057l75ae6241gffa42545c25937d@mail.gmail.com> Message-ID: <20070509173942.B5A1B3A4061@sparrow.telecommunity.com> At 09:57 PM 5/8/2007 -0600, Jeff Shell wrote: >I must admit that I didn't read PEP 3124 in depth - most of it was >fascinating, some of it went way over my head in complexity, and then >suddenly I saw an Interface. It seemed quite out of place, actually, Yep; it's really more like an "adapting abstract base class". In a GF-based language like Dylan, it would probably just be called a "module". In Dylan, the generic functions you export from a module define an interface, in precisely the same way as PEP 3124 Interfaces work, except they have no IFoo(bar).baz(), only IFoo.baz(bar). (And with different syntax, of course.) >[snip lots of stuff about zope.interface's specification features] If you don't count in-house "enterprise" operations and shops like Google, Yahoo, et al., the development of Zope is certainly one of the largest (if not the very largest) Python projects. It's understandable that LBYL is desirable in that environment. On the other hand, such large projects in Python are pretty darn rare. Meanwhile, the subject of typing systems for Python (into which category zope.interface most assuredly falls) is still an open research area. I've watched zope.interface and its predecessors and spin-offs for almost a decade now, and it's *still* not particularly settled. Look, for example, at Guido's blog posts about type expressions and type parameterization. Look at how quickly the efforts to define standard ABCs for Py3K turned back from grand vision to, "oh heck that doesn't quite do what we wanted".
So, my thought is that LBYL type systems for Python are still "here there be dragons" territory. PEP 3124 simply doesn't try to go there, but neither does it block the passage of other explorers. It happily coexists with other type systems, and if you want to use something called "roles" or "traits" as argument annotations, it will be OK with that. The specifics, which I haven't spelled out in the "extension API" section yet, are mainly that in order to work with the default dispatching engine, you must register methods for the "implies()" generic function, such that the engine can tell whether a class implies the annotation, the annotation implies a class, or the annotation implies any other annotation. Of course, there will also be a generic function you can register with in order to use a different dispatch engine when your annotations are encountered. This would be the hook you'd need to use in order to have instance-specific checks. Bear in mind, of course, that such checks will necessarily be slow, and the slowdown may apply to every invocation of the function. >'Look Before You Leap' >brings to mind the concept of "don't dive into an empty pool" or >"don't do a backwards flip onto the pointy rocks"; whereas 'Easier to >Ask Forgiveness Than Permission' brings to mind the concept of "sorry >i dove head first into your empty pool and cracked my skull open Mr. >Johnson. If I had asked I'm sure you would have said no! In any case, >even though it's your pool and there was a fence and everything and >you did not give me permission, my parents are going to sue" (OK, >maybe that last bit is the result of a healthy american upbringing... >but still!) Well, if you run a program whose effects are that important without having tested it first, perhaps you *should* be sued. :) >Huh? I'll try to look at that. types, classes, instances... That does
(Honestly - I've recently seen the light >about prototype based object oriented programming; in light of types >of types and classes of classes and classes and instances and oh my, >languages that believe in "there are only objects, and they are only >instances" are sounding sweeter every day) Why stop at Io? Cecil is a prototype-based language with generic functions, including full predicate dispatch and even "dynamic types" (i.e., an object can be of several types at the same time, depending on its current state). :) For Python, though, I really don't see the need to create such ultra-dynamic objects. It's so easy to just dynamically create a class whenever you need one, that it doesn't seem worth the bother to munge instances. Zope, of course, is always a special case in this respect, since *persisting* dynamic classes is a PITA, compared to dynamic instances. But the stdlib ain't Zope, and most Python code doesn't need to have its classes stored in a database. >>Since #1 falls outside of PEP 3124's goals (i.e., it's not about >>specification or testing), that leaves use case #2. In my >>experience, it has been more than sufficient to simply give these >>object some *other* interface, such as an IViewTags interface with a >>method to query these dynamic "tag" interfaces. In other words, my >>experience and opinion supports the view that use case #2 is actually >>a coincidental abuse of interfaces for convenience, rather than the >>"one obvious way" to handle the use case. > >Ugh. Yeah, there are 'marker' interfaces, but.. ugh. dynamic "tag" >interfaces. Yuck. > >My experience has been otherwise. Now you're confusing me. How is having a way to ask for markers different from marker interfaces? Both are equally "yuck", except that one isn't abusing interfaces to use them as markers. >But I'm sorry that I confused that section of PEP 3124 to be about >specification and testing. I do, however, think that is a better use >case. Well, write another PEP, then. 
:) >We have a dynamic data transformation framework that exists outside of >Zope (Zope is basically used for UI). Objects are being dynamically >composed, wrapped, decomposed, rewrapped, filtered, and split - >constantly. Objects, not types. It's all composed of rules. I'm >itching to be able to add rules to apply zope.interface specifications >to the generated objects; if only to then make it much easier to add >other filtering rules later on. Maybe I'm getting to be like Guido in my old age, but maybe you should just write the program first and extract the framework second. :) In truth, though, I'd almost bet some serious cash that proper use of generic functions would evaporate your framework to virtual nothingness. My observation has been that in languages with generic functions, the sort of thing that requires complex frameworks with lots of interfaces in Zope looks like a trivial little library. In PEAK, I was able to cut out about 75% of the code in one sub-framework by switching it from interfaces+adaptation to generic functions, and in the process made it much more comprehensible. With that kind of productivity enhancement, one can afford to write a lot more unit tests to do the LBYL-ing, and still come out ahead. :) Really, the problem of LBYL interfaces and adaptation is that they require you to laboriously figure out in advance where to allow flexibility in the framework. Worse, they emphasize the *solution* domain rather than the problem domain. They define what's required of parts that have to be plugged into a machine that then "solves" the problem. So you have to design that machine and where various parts fit into it, which is largely a distraction from whatever you were trying to do in the first place! In contrast, with generic functions, you focus simply on identifying the operations required by the problem domain, performing a functional decomposition rather than trying to create an entire network mesh of roles and responsibilities. 
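The flavor of "interfaces as universal adapters" over per-type operations — as in the PEP's IStack example quoted elsewhere in this thread — can be sketched with a plain registry keyed on (interface, method name, concrete type). This is a toy with none of the PEP's generic-function machinery:

```python
# Toy "interface as universal adapter": method implementations are
# looked up in a registry keyed by (interface, name, adapted type).
class Interface:
    _impls = {}

    def __init__(self, ob):
        self._ob = ob

    @classmethod
    def register(cls, name, typ, func):
        cls._impls[(cls, name, typ)] = func

    def __getattr__(self, name):
        # Bind the registered implementation to the adapted object.
        func = self._impls[(type(self), name, type(self._ob))]
        return lambda *args: func(self._ob, *args)

class IStack(Interface):
    pass

# Declare that lists support IStack, in the spirit of the PEP:
IStack.register("push", list, list.append)
IStack.register("pop", list, list.pop)

mylist = []
mystack = IStack(mylist)
mystack.push(42)
assert mystack.pop() == 42
```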
You code the *problem*, not the solution, so your code might even be comprehensible (or at least explainable) to non-programmer domain experts. (i.e., even if they can't read the code, you can read it to them and confirm whether it matches the requirements.) *Then*, after you have the functional decomposition (which is your real "specification" anyway), you can then decide what concrete object types you might need, and implement the lowest-level domain operations for those types. And if you're following that process, interfaces really aren't anything but the documentation that explains what those problem-domain operations are supposed to do, so that if for some reason you need to implement new concrete types at a later time, you can add appropriate implementations. IMO, that's a lot closer to being the One Obvious Way, because it doesn't *need* LBYL or anything like zope.interface, anywhere in that process. See, if you want contract enforcement, you can just *build it right in* to the generic functions, using an appropriate method type. See my comments here: http://mail.python.org/pipermail/python-3000/2007-May/007444.html Of course, that's not a *static* guarantee of correctness, but neither is most of what you're talking about. Tests are still required, either way. From guido at python.org Wed May 9 19:56:19 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 9 May 2007 10:56:19 -0700 Subject: [Python-3000] binascii.b2a_qp() in the p3yk branch In-Reply-To: <4641E2F5.4000702@livinglogic.de> References: <4641E2F5.4000702@livinglogic.de> Message-ID: Fixed. The code was using strchr() instead of memchr(), which was wrong anyway; but b"" is the only object (apparently) whose buffer pointer is NULL when the size is 0. Committed revision 55204. Please backport (I wouldn't be surprised if this could be exploited). On 5/9/07, Walter Dörwald wrote: > binascii.b2a_qp() in the p3yk branch is broken.
What I get is: > > $ gdb ./python > GNU gdb 6.3-debian > Copyright 2004 Free Software Foundation, Inc. > GDB is free software, covered by the GNU General Public License, and you are > welcome to change it and/or distribute copies of it under certain > conditions. > Type "show copying" to see the conditions. > There is absolutely no warranty for GDB. Type "show warranty" for details. > This GDB was configured as "i386-linux"...Using host libthread_db > library "/lib/tls/libthread_db.so.1". > > (gdb) run > Starting program: /var/home/walter/checkouts/Python/p3yk/python > [Thread debugging using libthread_db enabled] > [New Thread -1209593088 (LWP 17690)] > Python 3.0x (p3yk:55200, May 9 2007, 11:43:49) > [GCC 3.3.5 (Debian 1:3.3.5-13)] on linux2 > Type "help", "copyright", "credits" or "license" for more information. > >>> import binascii > >>> binascii.b2a_qp(b'') > > Program received signal SIGSEGV, Segmentation fault. > [Switching to Thread -1209593088 (LWP 17690)] > 0xb7ee9093 in strchr () from /lib/tls/libc.so.6 > (gdb) bt > #0 0xb7ee9093 in strchr () from /lib/tls/libc.so.6 > #1 0xb7c4744c in binascii_b2a_qp (self=0x0, args=0x0, kwargs=0x0) at > /var/home/walter/checkouts/Python/p3yk/Modules/binascii.c:1153 > #2 0x0807788e in PyCFunction_Call (func=0xb7e26e8c, arg=0xb7e1328c, > kw=0xa0a0a0a) at Objects/methodobject.c:77 > #3 0x080adbb4 in call_function (pp_stack=0xbffff45c, oparg=0) at > Python/ceval.c:3513 > #4 0x080abe66 in PyEval_EvalFrameEx (f=0x8235aa4, throwflag=0) at > Python/ceval.c:2191 > #5 0x080ac9e4 in PyEval_EvalCodeEx (co=0xb7e0fbf0, globals=0x0, > locals=0xa0a0a0a, args=0xb7e3102c, argcount=0, kws=0x0, kwcount=0, > defs=0x0, defcount=0, kwdefs=0x0, closure=0x0) at Python/ceval.c:2812 > #6 0x080aef5f in PyEval_EvalCode (co=0x0, globals=0x0, locals=0x0) at > Python/ceval.c:491 > #7 0x080d14ba in run_mod (mod=0x0, filename=0x0, globals=0x0, > locals=0x0, flags=0x0, arena=0x0) at Python/pythonrun.c:1282 > #8 0x080d0967 in PyRun_InteractiveOneFlags 
(fp=0x0, filename=0x8116596 > "", flags=0xbffff65c) at Python/pythonrun.c:800 > #9 0x080d0793 in PyRun_InteractiveLoopFlags (fp=0xb7f9cca0, > filename=0x8116596 "", flags=0xbffff65c) at Python/pythonrun.c:724 > #10 0x080d1d32 in PyRun_AnyFileExFlags (fp=0xb7f9cca0, > filename=0x8116596 "", closeit=0, flags=0xbffff65c) at > Python/pythonrun.c:693 > #11 0x080569ab in Py_Main (argc=-1208365920, argv=0xbffff65c) at > Modules/main.c:491 > #12 0x080564bb in main (argc=0, argv=0x0) at Modules/python.c:23 > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From bjourne at gmail.com Wed May 9 21:54:46 2007 From: bjourne at gmail.com (=?ISO-8859-1?Q?BJ=F6rn_Lindqvist?=) Date: Wed, 9 May 2007 21:54:46 +0200 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> Message-ID: <740c3aec0705091254n7a406621qb5cc491cf3af9743@mail.gmail.com> On 5/1/07, Phillip J. Eby wrote: > Comments and questions appreciated, as it'll help drive better explanations > of both the design and rationales. I'm usually not that good at guessing > what other people will want to know (or are likely to misunderstand) until > I get actual questions. I haven't read it all yet. But my first comment is "This PEP is HUGE!" 922 lines. Is there any way you could shorten it or split it up in more manageable chunks? My second comment is that there are too few examples in the PEP. > The API will be implemented in pure Python with no C, but may have > some dependency on CPython-specific features such as ``sys._getframe`` > and the ``func_code`` attribute of functions.
It is expected that > e.g. Jython and IronPython will have other ways of implementing > similar functionality (perhaps using Java or C#). > > > Rationale and Goals > =================== > > Python has always provided a variety of built-in and standard-library > generic functions, such as ``len()``, ``iter()``, ``pprint.pprint()``, > and most of the functions in the ``operator`` module. However, it > currently: > > 1. does not have a simple or straightforward way for developers to > create new generic functions, I think there is a very straightforward way. For example, a generic function for token handling could be written like this: def handle_any(val): pass def handle_tok(tok, val): handlers = { ANY : handle_any, BRANCH : handle_branch, CATEGORY : handle_category } try: return handlers[tok](val) except KeyError, e: fmt = "Unsupported token type: %s" raise ValueError(fmt % tok) This is an idiom I have used hundreds of times. The handle_tok function is generic because it dispatches to the correct handler based on the type of tok. > 2. does not have a standard way for methods to be added to existing > generic functions (i.e., some are added using registration > functions, others require defining ``__special__`` methods, > possibly by monkeypatching), and When does "external" code want to add to a generic function? In the above example, you add to the generic function by inserting a new key-value pair in the handlers list. If needed, it wouldn't be very hard to make the handle_tok function extensible. Just make the handlers object global. > 3. does not allow dispatching on multiple argument types (except in > a limited form for arithmetic operators, where "right-hand" > (``__r*__``) methods can be used to do two-argument dispatch. Why would you want that? > The ``@overload`` decorator allows you to define alternate > implementations of a function, specialized by argument type(s). A > function with the same name must already exist in the local namespace.
> The existing function is modified in-place by the decorator to add > the new implementation, and the modified function is returned by the > decorator. Thus, the following code:: > > from overloading import overload > from collections import Iterable > > def flatten(ob): > """Flatten an object to its component iterables""" > yield ob > > @overload > def flatten(ob: Iterable): > for o in ob: > for ob in flatten(o): > yield ob > > @overload > def flatten(ob: basestring): > yield ob > > creates a single ``flatten()`` function whose implementation roughly > equates to:: > > def flatten(ob): > if isinstance(ob, basestring) or not isinstance(ob, Iterable): > yield ob > else: > for o in ob: > for ob in flatten(o): > yield ob > > **except** that the ``flatten()`` function defined by overloading > remains open to extension by adding more overloads, while the > hardcoded version cannot be extended. I very much prefer the latter version. The reason is that the "locality of reference" is much worse in the overloaded version and because I have found it to be very hard to read and understand overloaded code in practice. Let's say you find some code that looks like this: def do_stuff(ob): yield ob @overload def do_stuff(ob : ClassA): for o in ob: for ob in do_stuff(o): yield ob @overload def do_stuff(ob : classb): yield ob Or this: def do_stuff(ob): if isinstance(ob, classb) or not isinstance(ob, ClassA): yield ob else: for o in ob: for ob in do_stuff(o): yield ob With the overloaded code, you have to read EVERY definition of "do_stuff" to understand what the code does. Not just every definition in the same module, but every definition in the whole program because someone might have extended the do_stuff generic function. What if they have defined a do_stuff that dispatches on ClassC that is a subclass of ClassA? Good luck in figuring out what the code does. With the non-overloaded version you also have the ability to insert debug print statements to figure out what happens.
> For example, if someone wants to use ``flatten()`` with a string-like > type that doesn't subclass ``basestring``, they would be out of luck > with the second implementation. With the overloaded implementation, > however, they can either write this:: > > @overload > def flatten(ob: MyString): > yield ob > or this (to avoid copying the implementation):: > > from overloading import RuleSet > RuleSet(flatten).copy_rules((basestring,), (MyString,)) That may be great for flexibility, but I contend that it is awful for reality. In reality, it would be much simpler and more readable to just rewrite the flatten method: def flatten(ob): flat = (isinstance(ob, (basestring, MyString)) or not isinstance(ob, Iterable)) if flat: yield ob else: for o in ob: for ob in flatten(o): yield ob Or change MyString so that it derives from basestring. > Most of the functionality described in this PEP is already implemented > in the in-development version of the PEAK-Rules framework. In > particular, the basic overloading and method combination framework > (minus the ``@overload`` decorator) already exists there. The > implementation of all of these features in ``peak.rules.core`` is 656 > lines of Python at this writing. I think PEAK is a great framework and that generic functions are great for those who like it. But I'm not convinced that writing multiple dispatch functions the way PEAK prescribes is better than any of the currently used idioms. I first encountered them when I tried to fix a bug in the jsonify.py module in TurboGears (now relocated to the TurboJSON package). It took me about 30 minutes to figure out how it worked (including manual reading). Had not PEAK style generic functions been used, it would have taken me 2 minutes top. So IMHO, generic functions certainly are useful for some things, but not useful enough. Using them as a replacement for ordinary multiple dispatch techniques is a bad idea.
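For reference, the dict-dispatch idiom described earlier in this message can be made extensible in exactly the way suggested — make the handlers table module-level and expose a registration helper. A minimal sketch, with token names invented for the example:

```python
# Björn's dict-dispatch idiom, made extensible: the handlers table is
# module-level, so "external" code can register into it without
# touching handle_tok.  Token constants here are invented.
ANY, BRANCH, CATEGORY = "any", "branch", "category"

_handlers = {}

def register(tok):
    """Decorator that registers a handler for a token type."""
    def decorator(func):
        _handlers[tok] = func
        return func
    return decorator

@register(ANY)
def handle_any(val):
    return ("any", val)

@register(BRANCH)
def handle_branch(val):
    return ("branch", val)

def handle_tok(tok, val):
    try:
        handler = _handlers[tok]
    except KeyError:
        raise ValueError("Unsupported token type: %s" % tok)
    return handler(val)

# External code can now extend handle_tok without modifying it:
@register(CATEGORY)
def handle_category(val):
    return ("category", val)

assert handle_tok(CATEGORY, 3) == ("category", 3)
```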
-- mvh Björn From steven.bethard at gmail.com Wed May 9 22:16:26 2007 From: steven.bethard at gmail.com (Steven Bethard) Date: Wed, 9 May 2007 14:16:26 -0600 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: <740c3aec0705091254n7a406621qb5cc491cf3af9743@mail.gmail.com> References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <740c3aec0705091254n7a406621qb5cc491cf3af9743@mail.gmail.com> Message-ID: On 5/9/07, BJörn Lindqvist wrote: > I very much prefer the latter version. The reason is that the > "locality of reference" is much worse in the overloaded version and > because I have found it to be very hard to read and understand > overloaded code in practice. > > Let's say you find some code that looks like this: > > def do_stuff(ob): > yield ob > > @overload > def do_stuff(ob : ClassA): > for o in ob: > for ob in do_stuff(o): > yield ob > > @overload > def do_stuff(ob : classb): > yield ob > > Or this: > > def do_stuff(ob): > if isinstance(ob, classb) or not isinstance(ob, ClassA): > yield ob > else: > for o in ob: > for ob in do_stuff(o): > yield ob > > With the overloaded code, you have to read EVERY definition of > "do_stuff" to understand what the code does. Not just every definition > in the same module, but every definition in the whole program because > someone might have extended the do_stuff generic function. I don't buy this argument. That's like saying that I can't understand what len() does without examining every object that defines __len__(). Do you really have trouble understanding functions like len() or hash()? STeVe -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy From steven.bethard at gmail.com Wed May 9 22:41:14 2007 From: steven.bethard at gmail.com (Steven Bethard) Date: Wed, 9 May 2007 14:41:14 -0600 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc.
In-Reply-To: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> Message-ID: On 4/30/07, Phillip J. Eby wrote: > PEP: 3124 > Title: Overloading, Generic Functions, Interfaces, and Adaptation [snip] > from overloading import overload > from collections import Iterable > > def flatten(ob): > """Flatten an object to its component iterables""" > yield ob > > @overload > def flatten(ob: Iterable): > for o in ob: > for ob in flatten(o): > yield ob > > @overload > def flatten(ob: basestring): > yield ob [snip] > ``@overload`` vs. ``@when`` > ~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > The ``@overload`` decorator is a common-case shorthand for the more > general ``@when`` decorator. It allows you to leave out the name of > the function you are overloading, at the expense of requiring the > target function to be in the local namespace. It also doesn't support > adding additional criteria besides the ones specified via argument > annotations. The following function definitions have identical > effects, except for name binding side-effects (which will be described > below):: > > @overload > def flatten(ob: basestring): > yield ob > > @when(flatten) > def flatten(ob: basestring): > yield ob > > @when(flatten) > def flatten_basestring(ob: basestring): > yield ob > > @when(flatten, (basestring,)) > def flatten_basestring(ob): > yield ob [snip] +1 on @overload and @when. > Proceeding to the "Next" Method > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [snip] > "Before" and "After" Methods > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [snip] > "Around" Methods > ~~~~~~~~~~~~~~~~ [snip] > Custom Combinations > ~~~~~~~~~~~~~~~~~~~ I'd rather see all this left as a third-party library to start with. (Yes, even including __proceed__.) It shouldn't be a problem to supply these things separately, right? 
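The @overload core under discussion is, in its single-argument form, essentially what much later shipped in the stdlib as functools.singledispatch (PEP 443, Python 3.4). The PEP's flatten() example translates almost mechanically, with str standing in for basestring:

```python
from functools import singledispatch
from collections.abc import Iterable

@singledispatch
def flatten(ob):
    """Flatten an object to its component iterables."""
    yield ob

@flatten.register
def _(ob: Iterable):
    for o in ob:
        yield from flatten(o)

@flatten.register
def _(ob: str):  # str plays basestring's role: atomic, not recursed into
    yield ob

# str wins over Iterable because it is the more specific registration:
assert list(flatten([1, [2, (3, 4)], "abc"])) == [1, 2, 3, 4, "abc"]
```

Like the PEP's version, this remains open to extension: a third party can call `flatten.register` for its own string-like type without touching the original definition.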
> Interfaces and Adaptation > ------------------------- > > The ``overloading`` module provides a simple implementation of > interfaces and adaptation. The following example defines an > ``IStack`` interface, and declares that ``list`` objects support it:: > > from overloading import abstract, Interface > > class IStack(Interface): > @abstract > def push(self, ob) > """Push 'ob' onto the stack""" > > @abstract > def pop(self): > """Pop a value and return it""" > > > when(IStack.push, (list, object))(list.append) > when(IStack.pop, (list,))(list.pop) > > mylist = [] > mystack = IStack(mylist) > mystack.push(42) > assert mystack.pop()==42 > > The ``Interface`` class is a kind of "universal adapter". It accepts > a single argument: an object to adapt. It then binds all its methods > to the target object, in place of itself. Thus, calling > ``mystack.push(42``) is the same as calling > ``IStack.push(mylist, 42)``. +1 on adapters like this. > Interfaces as Type Specifiers > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > ``Interface`` subclasses can be used as argument annotations to > indicate what type of objects are acceptable to an overload, e.g.:: > > @overload > def traverse(g: IGraph, s: IStack): > g = IGraph(g) > s = IStack(s) > # etc.... and +1 on being able to specify Interfaces as "types". > Aspects > ------- [snip] > from overloading import Aspect > > class Count(Aspect): > count = 0 > > @after(Target.some_method) > def count_after_call(self, *args, **kw): > Count(self).count += 1 Again, I'd rather see this kind of thing in a third-party library. Summary of my PEP thoughts: * Keep things simple: just @overload, @when, @abstract and Interface. * More complex things like __proceed__, @before, @after, Aspects, etc. should be added by third-party modules As others have mentioned, the current PEP is overwhelming. I'd rather see Py3K start with just the basics. When people are comfortable with the core, we can look into introducing the extras. STeVe -- I'm not *in*-sane. 
Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy From pje at telecommunity.com Wed May 9 22:58:39 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Wed, 09 May 2007 16:58:39 -0400 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: <740c3aec0705091254n7a406621qb5cc491cf3af9743@mail.gmail.com> References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <740c3aec0705091254n7a406621qb5cc491cf3af9743@mail.gmail.com> Message-ID: <20070509205655.622A63A4061@sparrow.telecommunity.com> At 09:54 PM 5/9/2007 +0200, BJörn Lindqvist wrote: >On 5/1/07, Phillip J. Eby wrote: > > Comments and questions appreciated, as it'll help drive better explanations > > of both the design and rationales. I'm usually not that good at guessing > > what other people will want to know (or are likely to misunderstand) until > > I get actual questions. > >I haven't read it all yet. But my first comment is "This PEP is HUGE!" >922 lines. Is there any way you could shorten it or split it up in >more manageable chunks? My second comment is that there are too few >examples in the PEP. So it's too big AND too small. I guess it's then equally displeasing to everyone. :) I notice that most of the rest of your message calls for further additions. :) > > 1. does not have a simple or straightforward way for developers to > > create new generic functions, > >I think there is a very straightforward way. For example, a generic >function for token handling could be written like this: > > def handle_any(val): > pass > > def handle_tok(tok, val): > handlers = { > ANY : handle_any, > BRANCH : handle_branch, > CATEGORY : handle_category > } > try: > return handlers[tok](val) > except KeyError, e: > fmt = "Unsupported token type: %s" > raise ValueError(fmt % tok) > >This is an idiom I have used hundreds of times.
The handle_tok >function is generic because it dispatches to the correct handler based >on the type of tok. First, this example is broken, since there's no way for anybody to add handlers to it (entirely aside from the fact that it recreates the dispatch table every time it executes). Second, even if you *could* add handlers to it, you'd need to separately document the mechanism for adding handlers, for each and every new generic function. The purpose of the API in 3124 is to have a standard API that's independent of *how* the dispatching is actually implemented. That is, whether you look up types in a dictionary or implement full predicate dispatch makes no difference to the API. > > 2. does not have a standard way for methods to be added to existing > > generic functions (i.e., some are added using registration > > functions, others require defining ``__special__`` methods, > > possibly by monkeypatching), and > >When does "external" code want to add to a generic function? Any time you want to use new code with an existing framework. For example, objects to be documented with pydoc currently have to reverse engineer a bunch of inspection code, while in a GF-based design they'd just add methods. For more examples, see this thread: http://mail.python.org/pipermail/python-3000/2007-May/007217.html >What if they have defined a do_stuff that dispatches on ClassC that is a >subclass of ClassA? Good luck in figuring out what the code does. > >With the non-overloaded version you also have the ability to insert >debug print statements to figure out what happens. Ahem. @before(do_stuff) def debug_it(ob: ClassC): import pdb pdb.set_trace() Note that you don't even need to know what module the behavior you're looking for is even *in*; you only need to know where to import do_stuff and ClassC from, and put the above in a module that's been imported when do_stuff is called. In other words, generic functions massively increase your ability to trace specific execution paths.
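A minimal model of the @before hook used above — a wrapper that runs registered before-methods with the same arguments, then calls the original function. The real implementation also dispatches on argument types; this sketch ignores that:

```python
# Toy model of "before" method combination, without type dispatch.
def make_generic(func):
    hooks = []

    def wrapper(*args, **kw):
        for hook in hooks:
            hook(*args, **kw)  # before-methods see the same arguments
        return func(*args, **kw)

    wrapper.before = hooks.append  # registration entry point
    return wrapper

@make_generic
def do_stuff(ob):
    return ob * 2

calls = []
do_stuff.before(calls.append)  # e.g. a tracing hook, or one that starts pdb

assert do_stuff(21) == 42
assert calls == [21]
```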
>That may be great for flexibility, but I contend that it is awful for >reality. In reality, it would be much simpler and more readable to >just rewrite the flatten method: Not if it's *not your flatten function*, it wouldn't be. >Or change MyString so that it derives from basestring. Not if it's *not your MyString* class. >I first encountered them when I tried to fix a bug in the jsonify.py >module in TurboGears (now relocated to the TurboJSON package). It took >me about 30 minutes to figure out how it worked (including manual >reading). Had not PEAK style generic functions been used, it would >have taken me 2 minutes tops. So, you're saying it only took 28 minutes to acquire a skill that you can now use elsewhere? That sounds great, actually. :) >So IMHO, generic functions certainly are useful for some things, but >not useful enough. Using them as a replacement for ordinary multiple >dispatch techniques is a bad idea. What do you mean by "ordinary multiple dispatch techniques"? No offense intended, but from the overall context of your message, it sounds like perhaps you don't know what "multiple dispatch" means, since you earlier asked "why would you want that?" in reference to an example of it. From ncoghlan at gmail.com Wed May 9 23:09:22 2007 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 10 May 2007 07:09:22 +1000 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: <20070509205655.622A63A4061@sparrow.telecommunity.com> References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <740c3aec0705091254n7a406621qb5cc491cf3af9743@mail.gmail.com> <20070509205655.622A63A4061@sparrow.telecommunity.com> Message-ID: <46423882.6070006@gmail.com> Phillip J. Eby wrote: > At 09:54 PM 5/9/2007 +0200, Björn Lindqvist wrote: >> With the non-overloaded version you also have the ability to insert >> debug print statements to figure out what happens. > > Ahem.
> > @before(do_stuff) > def debug_it(ob: ClassC): > import pdb > pdb.set_trace() > > Note that you don't even need to know what module the behavior you're > looking for is even *in*; you only need to know where to import > do_stuff and ClassC from, and put the above in a module that's been > imported when do_stuff is called. > > In other words, generic functions massively increase your ability to > trace specific execution paths. Possibly another good example to include in the PEP... Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From pje at telecommunity.com Wed May 9 23:38:28 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Wed, 09 May 2007 17:38:28 -0400 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> Message-ID: <20070509213643.A37643A4061@sparrow.telecommunity.com> At 02:41 PM 5/9/2007 -0600, Steven Bethard wrote: >On 4/30/07, Phillip J. Eby wrote: >>Proceeding to the "Next" Method >>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >[snip] >>"Before" and "After" Methods >>~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >[snip] >>"Around" Methods >>~~~~~~~~~~~~~~~~ >[snip] >>Custom Combinations >>~~~~~~~~~~~~~~~~~~~ > >I'd rather see all this left as a third-party library to start with. >(Yes, even including __proceed__.) That'd be rather like adding new-style classes but not super(). >It shouldn't be a problem to supply these things separately, right? Separating proceed-ability out would be tough; every function that wanted to use it in any way would need additional decoration to flag that it wanted to use an alternative base method implementation. Meanwhile, for the rest of the features, most of the implementation would still have to be in the core module. 
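[Editor's note: for readers unfamiliar with CLOS-style combination, the behavior being debated — every "before" method running ahead of the primary method and every "after" method running behind it — can be sketched in a few lines of ordinary Python. This is an illustrative toy, not peak.rules or the PEP's implementation; `combinable`, `save`, `validate`, and `audit` are invented names.]

```python
# Toy sketch of CLOS-style method combination: all registered "before"
# methods run ahead of the primary method, and all "after" methods run
# behind it, no matter who registered them or in what module.
class combinable:
    def __init__(self, primary):
        self.primary = primary
        self.befores, self.afters = [], []

    def before(self, func):
        self.befores.append(func)
        return func

    def after(self, func):
        self.afters.append(func)
        return func

    def __call__(self, *args, **kw):
        for f in self.befores:          # always run, in order
            f(*args, **kw)
        result = self.primary(*args, **kw)
        for f in self.afters:           # always run, even for subclasses
            f(*args, **kw)
        return result

@combinable
def save(record):
    record['saved'] = True
    return record

@save.before
def validate(record):
    if 'id' not in record:
        raise ValueError("record needs an id")

@save.after
def audit(record):
    record.setdefault('log', []).append('saved')
```

Debug tracing is then just one more before method, which illustrates Phillip's point: the decorators themselves are a thin layer, but the combination machinery they ride on has to live somewhere central.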
The method combination framework has to exist in the core, or it can't do method combination without essentially replacing what's *in* the core, at which point you're not really *using* it any more. That is, you'd just be using the third-party library. In other words, no, you can't take out all forms of method combination (which is essentially what you're proposing) and still have the ability to add it back in later. Meanwhile, leaving in the ability to have method combination later, but removing the actual implementation of the @before/around/after decorators in place would delete a total of less than 40 non-blank lines of code. Removing __proceed__ support would delete maybe 10 lines more, tops. Given that removing the 40 lines removes an excellent example of how to use the combination framework, and removing the 10 imposes considerable difficulty for anybody else to put them back, it seems unwise to me to take either of them out. That is, I don't see what gain there is by removing them, that wouldn't be equally well addressed by splitting documentation. (These lines-of-code estimates are based on what's in peak.rules.core, of course, and so might change a bit depending on how things go with the PEP.) >>Aspects >>------- >[snip] >> from overloading import Aspect >> >> class Count(Aspect): >> count = 0 >> >> @after(Target.some_method) >> def count_after_call(self, *args, **kw): >> Count(self).count += 1 > >Again, I'd rather see this kind of thing in a third-party library. The reason for it being in the PEP is that it benefits from having a single shared implementation (especially for the weakref dictionary, but also for common-maintenance reasons). Also, the core's implementation of generic functions will almost certainly be using Aspects itself, so it might as well expose that implementation for others to use... >Summary of my PEP thoughts: >* Keep things simple: just @overload, @when, @abstract and Interface. 
>* More complex things like __proceed__, @before, @after, Aspects, etc. >should be added by third-party modules > >As others have mentioned, the current PEP is overwhelming. I'd rather >see Py3K start with just the basics. When people are comfortable with >the core, we can look into introducing the extras. Naturally, I don't consider any of these items "extras", or I wouldn't have included them. The "extras" to me are things like full predicate dispatch with pattern matching and variable binding, ordered classifiers, parsing combinators (i.e. using overloads to define grammar productions), custom implication precedence, custom predicate indexes, and all that sort of thing. What's proposed in the PEP is a far cry from being even as expressive as CLOS or AspectJ are, but it does supply the bare minimum needed to create a foundation for other libraries to build such capabilities on it. (Btw, a side data point: Ruby 2.0 is supposed to include method combination; specifically ":pre", ":post", and ":wrap" qualifiers modeled on CLOS's "before", "after", and "around", respectively.) From pje at telecommunity.com Wed May 9 23:44:02 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Wed, 09 May 2007 17:44:02 -0400 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: <46423882.6070006@gmail.com> References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <740c3aec0705091254n7a406621qb5cc491cf3af9743@mail.gmail.com> <20070509205655.622A63A4061@sparrow.telecommunity.com> <46423882.6070006@gmail.com> Message-ID: <20070509214217.D03463A4061@sparrow.telecommunity.com> At 07:09 AM 5/10/2007 +1000, Nick Coghlan wrote: >Phillip J. Eby wrote: > > At 09:54 PM 5/9/2007 +0200, Björn Lindqvist wrote: > >> With the non-overloaded version you also have the ability to insert > >> debug print statements to figure out what happens. > > > > Ahem.
> > > > @before(do_stuff) > > def debug_it(ob: ClassC): > > import pdb > > pdb.set_trace() > > > > Note that you don't even need to know what module the behavior you're > > looking for is even *in*; you only need to know where to import > > do_stuff and ClassC from, and put the above in a module that's been > > imported when do_stuff is called. > > > > In other words, generic functions massively increase your ability to > > trace specific execution paths. > >Possibly another good example to include in the PEP... Probably. When I write PEP's I tend to assume my primary audience is Guido, and I know he's already seen tons of tracing/logging/debugging/contract checking examples of what you can do with AOP. Or at least, I know he's previously mentioned being unimpressed by such. In any case, I didn't want to use that sort of example for fear that some might write the entire thing off as being more "unconvincing examples of AOP". Still, it is kind of handy that you can write all your contract checking and debug/trace/log code in separate modules from your main code, and simply import those modules to activate those features. It's just not the only reason or even the most important reason to have generic functions. From benji at benjiyork.com Thu May 10 00:11:36 2007 From: benji at benjiyork.com (Benji York) Date: Wed, 09 May 2007 18:11:36 -0400 Subject: [Python-3000] ABC's, Roles, etc In-Reply-To: <20070509173942.B5A1B3A4061@sparrow.telecommunity.com> References: <88d0d31b0705081452w3082c1d0x7355e5eed7a3e093@mail.gmail.com> <20070509015553.9C6843A4061@sparrow.telecommunity.com> <88d0d31b0705082057l75ae6241gffa42545c25937d@mail.gmail.com> <20070509173942.B5A1B3A4061@sparrow.telecommunity.com> Message-ID: <46424718.20006@benjiyork.com> Phillip J. Eby wrote: > If you don't count in-house "enterprise" operations and shops like > Google, Yahoo, et al., the development of Zope is certainly one of > the largest (if not the very largest) Python project. 
It's > understandable that LBYL is desirable in that environment. On the > other hand, such large projects in Python are pretty darn rare. By way of clarification: Even in the large Zope 3 projects I work on (which obviously use zope.interface), we virtually never use interfaces for LBYL (just as Zope 3 itself rarely does). Instead, we either assume something implements a (little "i") interface and act as such (never invoking the interface machinery, the way most people write Python), or we use adaptation to ask for something that implements a particular (big "I") Interface (but even there no verification is done). My point is, people generally use zope.interface Interfaces as documentation and names for particular behavior/API, not as an LBYL enforcement mechanism. -- Benji York http://benjiyork.com From jimjjewett at gmail.com Thu May 10 00:26:48 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Wed, 9 May 2007 18:26:48 -0400 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: <20070509205655.622A63A4061@sparrow.telecommunity.com> References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <740c3aec0705091254n7a406621qb5cc491cf3af9743@mail.gmail.com> <20070509205655.622A63A4061@sparrow.telecommunity.com> Message-ID: On 5/9/07, Phillip J. Eby wrote: > At 09:54 PM 5/9/2007 +0200, Björn Lindqvist wrote: > >What if they have defined a do_stuff that dispatches on ClassC that is a > >subclass of ClassA? Good luck in figuring out what the code does. > >With the non-overloaded version you also have the ability to insert > >debug print statements to figure out what happens. > @before(do_stuff) > def debug_it(ob: ClassC): > import pdb > pdb.set_trace() I think this may be backwards from his question. As I read it, you know about class A, but have never heard about class C (which happens to be a substitute for A). Someone added a different do_stuff implementation for class C.
@before(do_stuff) def debug_it(obj: ClassA): # Never called, it is a classC def debug_it(obj: not ClassA) # can't do this? def debug_it(obj): # OK, trace *everything*. # Or, at least, everything that nicely did a call_next_method, # in case you wanted to wrap it this way. Objects that thought # they were providing a complete concrete implementation will # still sneak through def wrap_the_generic(generic_name, debug_it): orig = generic_name def replacement( ...) # hope you get the .sig right debug_it(...) orig(...) generic_name = replacement # hope you can monkeypatch # uhh ... was the original supposed to have additional behavior, # for more registrations, etc... Unless I'm missing something, this only simplifies things when all specific implementations not only drink the kool-ade, but avoid kool-ade related bugs. -jJ From jimjjewett at gmail.com Thu May 10 00:46:28 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Wed, 9 May 2007 18:46:28 -0400 Subject: [Python-3000] [Python-Dev] Byte literals (was Re: [Python-checkins] Changing string constants to byte arrays ( r55119 - in python/branches/py3k-struni/Lib: codecs.py test/test_codecs.py )) In-Reply-To: References: <20070505150035.GA16303@panix.com> <200705051334.45120.fdrake@acm.org> <20070505124008.648D.JCARLSON@uci.edu> Message-ID: On 5/8/07, Guido van Rossum wrote: > On 5/8/07, Jim Jewett wrote: > > On 5/8/07, Guido van Rossum wrote: > > > 1. develop working code under 2.6 > > > 2. make sure it is warning-free with the special -Wpy3k option > > > 3. use 2to3 to convert it to 3.0 compatible syntax in a temporary directory > > > 4. run your unit test suite with 3.0 > > > 5. for any defects you find, EDIT THE 2.6 SOURCE AND GO BACK TO STEP 2 > > The problem is what to do after step 5 ... > > Do you leave your 3 code in the awkward auto-generated format, and > > suggest (by example) that py3 code is clunky? > > Do you immediately stop supporting 2.x? > > Or do you fork the code? 
> I disagree that the converted code is awkward. On python-dev, there was a recent discussion about changing stdlib xrange into range. It was pointed out that this would make conversion harder, because range will get converted to list(xrange). Maybe the tool has gotten smart enough to avoid constructions like: for k, v in list(dict.items()): for i in list(range(10)): but I can't help feeling there will always be a few cases where it makes the code longer and worse. The hand patch for removing tuple parameters from the stdlib was certainly better than the tool could ever be expected to generate. I would be satisfied if the tool generates something like # 2to3: Did you really *need* a list? for k, v in list(dict.items()): and I can then change it back to for k, v in dict.items(): knowing it will run OK in both versions. I will not be happy if I have to do this editing more than once. At the moment, I haven't seen anything that can't be expressed in code that would run in either version. (There might be something, particularly in str/unicode or tracebacks; I just don't remember seeing it, so I think I could choose to avoid it for most code.) (Note that letting the common code be less efficient in 2.x is an acceptable tradeoff for me. Others might prefer a way to annotate a function or class as already dual.) -jJ From greg.ewing at canterbury.ac.nz Thu May 10 03:08:03 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 10 May 2007 13:08:03 +1200 Subject: [Python-3000] the future of the GIL In-Reply-To: <9F3AFD78-7C73-4C9F-8CA6-3D10A1468939@fuhm.net> References: <1d85506f0705050629k35ebdf6aj285e8f10489d21d5@mail.gmail.com> <9F3AFD78-7C73-4C9F-8CA6-3D10A1468939@fuhm.net> Message-ID: <46427073.701@canterbury.ac.nz> James Y Knight wrote: > This just isn't true. Python can do an atomic increment in a fast > platform specific way.
The problem with this, from what I've heard, is that atomic increment instructions tend to be on the order of 100 times slower than normal memory accesses (I guess because they have to bypass the cache or do extra work to keep it consistent). If that's true, even a single-instruction atomic increment could be much slower than the currently used instruction sequence for a Py_INCREF or Py_DECREF. > It's quite possible the overhead of GIL-less INCREF/DECREF is still > too high even with atomic increment/decrement primitives, but AFAICT > nobody has actually tried it. I thought that's what the oft-cited previous attempt was doing, but maybe not. If not, it could be worth trying to see what happens. -- Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | Carpe post meridiem! | Christchurch, New Zealand | (I'm not a morning person.) | greg.ewing at canterbury.ac.nz +--------------------------------------+ From pje at telecommunity.com Thu May 10 03:23:41 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Wed, 09 May 2007 21:23:41 -0400 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <740c3aec0705091254n7a406621qb5cc491cf3af9743@mail.gmail.com> <20070509205655.622A63A4061@sparrow.telecommunity.com> Message-ID: <20070510012202.EB0A83A4061@sparrow.telecommunity.com> At 06:26 PM 5/9/2007 -0400, Jim Jewett wrote: >On 5/9/07, Phillip J. Eby wrote: >>At 09:54 PM 5/9/2007 +0200, Björn Lindqvist wrote: >> >What if they have defined a do_stuff that dispatches on ClassC that is a >> >subclass of ClassA? Good luck in figuring out what the code does. > >> >With the non-overloaded version you also have the ability to insert >> >debug print statements to figure out what happens.
> >> @before(do_stuff) >> def debug_it(ob: ClassC): >> import pdb >> pdb.set_trace() > >I think this may be backwards from his question. As I read it, you >know about class A, but have never heard about class C (which happens >to be a substitute for A). Someone added a different do_stuff >implementation for class C. > > @before(do_stuff) > def debug_it(obj: ClassA): # Never called, it is a classC Actually, if you read what was said above, ClassC is a subclass of ClassA, so the above *is* called. > def debug_it(obj: not ClassA) # can't do this? Actually, you can, if you create something like a NotClass type and register methods to define its implication relationships to classes and other criteria. Of course, it then wouldn't be called for ClassC... > def debug_it(obj): # OK, trace *everything*. > # Or, at least, everything that nicely did a call_next_method, > # in case you wanted to wrap it this way. Objects that thought > # they were providing a complete concrete implementation will > # still sneak through Which is an excellent demonstration, by the way, of another reason why before/after methods are useful. They're all *always* called before and after the primary methods, regardless of how many of them were registered. > def wrap_the_generic(generic_name, debug_it): > orig = generic_name > def replacement( ...) # hope you get the .sig right > debug_it(...) > orig(...) > generic_name = replacement # hope you can monkeypatch > # uhh ... was the original supposed to have additional behavior, > # for more registrations, etc... 
I don't understand what this last example is supposed to be, but note that if you want to create special @debug methods with higher precedence than Around methods, it's relatively simple: class Debug(Around): """Like an Around, but with higher precedence""" debug = Debug.make_decorator('debug') always_overrides(Debug, Around) always_overrides(Debug, Method) always_overrides(Debug, Before) always_overrides(Debug, After) (It occurs to me that although the current prototype implementation requires you to explicitly declare method override relationships for all applicable types, I should probably make the transitive declaration(s) automatic, so that the above would require only 'always_overrides(Debug, Around)' to work.) >Unless I'm missing something, this only simplifies things when all >specific implementations not only drink the kool-ade, but avoid >kool-ade related bugs. I don't understand what you mean here. From greg.ewing at canterbury.ac.nz Thu May 10 03:24:44 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 10 May 2007 13:24:44 +1200 Subject: [Python-3000] the future of the GIL In-Reply-To: References: <463E4645.5000503@acm.org> <20070506222840.25B2.JCARLSON@uci.edu> Message-ID: <4642745C.1040702@canterbury.ac.nz> Giovanni Bajo wrote: > using multiple processes cause some > headaches with frozen distributions (PyInstaller, py2exe, etc.), like those > usually found on Windows, specifically because Windows does not have fork(). Isn't that just a problem with Windows generally? I don't see what the method of packaging has to do with it. Also, I've seen it suggested that there may actually be a way of doing something equivalent to a fork in Windows, even though it doesn't have a fork() system call as such. Does anyone know more about this? -- Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | Carpe post meridiem! | Christchurch, New Zealand | (I'm not a morning person.) 
| greg.ewing at canterbury.ac.nz +--------------------------------------+ From pje at telecommunity.com Thu May 10 03:30:22 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Wed, 09 May 2007 21:30:22 -0400 Subject: [Python-3000] ABC's, Roles, etc In-Reply-To: <46424718.20006@benjiyork.com> References: <88d0d31b0705081452w3082c1d0x7355e5eed7a3e093@mail.gmail.com> <20070509015553.9C6843A4061@sparrow.telecommunity.com> <88d0d31b0705082057l75ae6241gffa42545c25937d@mail.gmail.com> <20070509173942.B5A1B3A4061@sparrow.telecommunity.com> <46424718.20006@benjiyork.com> Message-ID: <20070510012836.16D0D3A4061@sparrow.telecommunity.com> At 06:11 PM 5/9/2007 -0400, Benji York wrote: >Phillip J. Eby wrote: >>If you don't count in-house "enterprise" operations and shops like >>Google, Yahoo, et al., the development of Zope is certainly one of >>the largest (if not the very largest) Python project. It's >>understandable that LBYL is desirable in that environment. On the >>other hand, such large projects in Python are pretty darn rare. > >By way of clarification: Even in the large Zope 3 projects I work on >(which obviously use zope.interface), we virtually never use >interfaces for LBYL (just as Zope 3 itself rarely does). Yet, this is precisely what Jeff is claiming zope.interface is useful/desirable *for*, and Jim Fulton has also been quite clear that LBYL is its very raison d'etre. But of course, as you point out below, that's not necessarily what most zope.interface users actually *do* with it. :) >Instead, we either assume something implements a (little "i") >interface and act as such (never invoking the interface machinery, >the way most people write Python), or we use adaptation to ask for >something that implements a particular (big "I") Interface (but even >there no verification is done). > >My point is, people generally use zope.interface Interfaces as >documentation and names for particular behavior/API, not as an LBYL >enforcement mechanism. 
And thus, for all of the use cases you just described, the minimal PEP 3124 Interface implementation should do just fine, yes? Indeed, ABCs would work for those use cases too, if you didn't need adaptation. Or am I missing something? From talin at acm.org Thu May 10 04:21:34 2007 From: talin at acm.org (Talin) Date: Wed, 09 May 2007 19:21:34 -0700 Subject: [Python-3000] the future of the GIL In-Reply-To: <4642745C.1040702@canterbury.ac.nz> References: <463E4645.5000503@acm.org> <20070506222840.25B2.JCARLSON@uci.edu> <4642745C.1040702@canterbury.ac.nz> Message-ID: <464281AE.7040903@acm.org> Greg Ewing wrote: > Giovanni Bajo wrote: >> using multiple processes cause some >> headaches with frozen distributions (PyInstaller, py2exe, etc.), like those >> usually found on Windows, specifically because Windows does not have fork(). > > Isn't that just a problem with Windows generally? I don't > see what the method of packaging has to do with it. > > Also, I've seen it suggested that there may actually be > a way of doing something equivalent to a fork in Windows, > even though it doesn't have a fork() system call as such. > Does anyone know more about this? I also wonder about embedded systems and game consoles. I don't know how many embedded microprocessors support fork(), but I know that modern consoles such as PS/3 and Xbox do not, since they have no support for virtual memory at all. Also remember that the PS/3 is supposed to be one of the poster children for multiprocessing -- the whole 'cell processor' thing. You can't write an efficient game on the PS/3 unless it uses multiple processors. Admittedly, not many current console-based games use Python, but that need not always be the case in the future, and a number of PC-based games are using it already. This much I agree: There's no point in talking about supporting multiple processors using threads as long as we're living in a refcounting world. 
Thought experiment: Suppose you were writing a brand-new dynamic language today, designed to work efficiently on multi-processor systems. Forget all of Python's legacy implementation details such as GILs and refcounts and such. What would it look like, and how well would it perform? (And I don't mean purely functional languages a la Erlang.) For example, in a language that is based on continuations at a very deep level, there need not be any "global interpreter" at all. Each separate flow of execution is merely a pointer to a call frame, the evaluation of which produces a pointer to another call frame (or perhaps the same one). Yes, there would still be some shared state that would have to be managed, but I wouldn't think that the performance penalty of managing that would be horrible. -- Talin From greg.ewing at canterbury.ac.nz Thu May 10 04:40:34 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 10 May 2007 14:40:34 +1200 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: <20070509205655.622A63A4061@sparrow.telecommunity.com> References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <740c3aec0705091254n7a406621qb5cc491cf3af9743@mail.gmail.com> <20070509205655.622A63A4061@sparrow.telecommunity.com> Message-ID: <46428622.7000204@canterbury.ac.nz> Phillip J. Eby wrote: > For > example, objects to be documented with pydoc currently have to > reverse engineer a bunch of inspection code, while in a GF-based > design they'd just add methods. There's a problem with this that I haven't seen a good answer to yet. To add a method to a generic function, you have to import the module that defines the base function. So any module that wants its objects documented in a custom way ends up depending on pydoc. This problem doesn't arise if a protocol-based approach is used, e.g. having pydoc look for a __document__ method or some such.
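[Editor's note: Greg's protocol-based alternative is easy to make concrete. The `__document__` hook below is hypothetical — it is his suggested protocol name, not an actual pydoc API — and `describe`, `Widget`, and `Plain` are invented for illustration. The documenting tool simply probes for the hook, so modules customizing their docs never import the tool.]

```python
# Sketch of the protocol-based approach: the documentation tool looks
# for a (hypothetical) __document__ method, so objects can customize
# their documentation without depending on pydoc at all.
def describe(obj):
    hook = getattr(type(obj), '__document__', None)
    if hook is not None:
        return hook(obj)                       # object-supplied docs
    # Generic fallback, roughly what an introspection-based tool does:
    return "%s: %s" % (type(obj).__name__,
                       type(obj).__doc__ or "(undocumented)")

class Widget:
    """A plain widget."""
    def __document__(self):
        return "Widget -- renders itself specially"

class Plain:
    """An ordinary class."""
```

The trade-off Greg identifies runs both ways: the protocol needs no imports, but a behavior that depends on *combinations* of argument types has no single object on which to hang its hook, which is where generic functions shine.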
There's also the possibility that other documentation systems could make use of the same protocol if it's designed appropriately, whereas extending pydoc-defined generic functions benefits pydoc and nothing else. -- Greg From steven.bethard at gmail.com Thu May 10 04:43:07 2007 From: steven.bethard at gmail.com (Steven Bethard) Date: Wed, 9 May 2007 20:43:07 -0600 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: <20070509213643.A37643A4061@sparrow.telecommunity.com> References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <20070509213643.A37643A4061@sparrow.telecommunity.com> Message-ID: On 5/9/07, Phillip J. Eby wrote: > At 02:41 PM 5/9/2007 -0600, Steven Bethard wrote: > >On 4/30/07, Phillip J. Eby wrote: > >>Proceeding to the "Next" Method > >>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > >[snip] > >>"Before" and "After" Methods > >>~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > >[snip] > >>"Around" Methods > >>~~~~~~~~~~~~~~~~ > >[snip] > >>Custom Combinations > >>~~~~~~~~~~~~~~~~~~~ > > > >I'd rather see all this left as a third-party library to start with. > >(Yes, even including __proceed__.) > > That'd be rather like adding new-style classes but not super(). Ok, then leave __proceed__ in. I'm not really particular about the details -- I'm just hoping you can cut things down to the absolute minimum you need, and provide the rest in a third party module. As it is, I think there's too much in the PEP for it to be comprehensible. And @before, @after, etc. seemed like good candidates for being supplied later. > Meanwhile, for the rest of the features, most of the implementation > would still have to be in the core module. That's fine. I'm not worried about the implementation. I trust you can handle that. ;-) I'm worried about trying to pack too much stuff into a PEP. 
> Meanwhile, leaving in the ability to have method combination later, > but removing the actual implementation of the @before/around/after > decorators in place would delete a total of less than 40 non-blank > lines of code. Sure, but it would also delete huge chunks of explanation about something which really isn't the core of the PEP. Python got decorators without the 6 lines of functools.update_wrapper -- I see this as being roughly the same. In particular, functools.update_wrapper was never mentioned in PEP 318. > >As others have mentioned, the current PEP is overwhelming. I'd rather > >see Py3K start with just the basics. When people are comfortable with > >the core, we can look into introducing the extras. > > Naturally, I don't consider any of these items "extras", or I > wouldn't have included them. I understand that. I'm just hoping you can find a way to cut the PEP down enough so that folks have a chance of wrapping their head around it. ;-) I really do think something along these lines (overloading/generic functions) is right for Python. I just think the current PEP is too overwhelming for people to see that. STeVe -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy From greg.ewing at canterbury.ac.nz Thu May 10 04:56:08 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 10 May 2007 14:56:08 +1200 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: <20070510012202.EB0A83A4061@sparrow.telecommunity.com> References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <740c3aec0705091254n7a406621qb5cc491cf3af9743@mail.gmail.com> <20070509205655.622A63A4061@sparrow.telecommunity.com> <20070510012202.EB0A83A4061@sparrow.telecommunity.com> Message-ID: <464289C8.4080004@canterbury.ac.nz> Phillip J. 
Eby wrote: > Which is an excellent demonstration, by the way, of another reason > why before/after methods are useful. They're all *always* called > before and after the primary methods, regardless of how many of them > were registered. But unless I'm mistaken, ClassC can still take over the whole show using a method that doesn't call the next method. > debug = Debug.make_decorator('debug') > always_overrides(Debug, Around) > always_overrides(Debug, Method) > always_overrides(Debug, Before) > always_overrides(Debug, After) This is getting seriously brain-twisting. Are you saying that this somehow overrides the subclass relationships, so that an @Debug method for ClassA always gets called before other methods, even ones for ClassC? If so, I think this is all getting way too deeply magical. Also, you still can't completely win, as someone could define an @UtterlySelfish decorator that takes precedence over your @Debug decorator. For that matter, what if there is simply another decorator @Foo that is defined to always_override @Around? The precedence between that and your @Debug decorator then appears to be undefined. -- Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | Carpe post meridiem! | Christchurch, New Zealand | (I'm not a morning person.) | greg.ewing at canterbury.ac.nz +--------------------------------------+ From greg.ewing at canterbury.ac.nz Thu May 10 05:18:05 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 10 May 2007 15:18:05 +1200 Subject: [Python-3000] the future of the GIL In-Reply-To: <3d2ce8cb0705091839w7b4fec56ud6a1ed9cb0ad264d@mail.gmail.com> References: <463E4645.5000503@acm.org> <20070506222840.25B2.JCARLSON@uci.edu> <4642745C.1040702@canterbury.ac.nz> <3d2ce8cb0705091839w7b4fec56ud6a1ed9cb0ad264d@mail.gmail.com> Message-ID: <46428EED.3060205@canterbury.ac.nz> Mike Klaas wrote: > NtCreateProcess with SectionHandler=NULL does a fork()-like > copy-on-write thing. 
But it is an internal kernel api. I just did some googling on this, and it seems to be described as "undocumented". Does this mean that it's possible to call it from userland, just that it's not guaranteed to exist in the future? If so, it looks like it might be possible to give Python a fork() that works on Windows, at least for the time being. -- Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | Carpe post meridiem! | Christchurch, New Zealand | (I'm not a morning person.) | greg.ewing at canterbury.ac.nz +--------------------------------------+ From greg.ewing at canterbury.ac.nz Thu May 10 05:27:34 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 10 May 2007 15:27:34 +1200 Subject: [Python-3000] the future of the GIL In-Reply-To: <464281AE.7040903@acm.org> References: <463E4645.5000503@acm.org> <20070506222840.25B2.JCARLSON@uci.edu> <4642745C.1040702@canterbury.ac.nz> <464281AE.7040903@acm.org> Message-ID: <46429126.5070801@canterbury.ac.nz> Talin wrote: > Thought experiment: Suppose you were writing a brand-new dynamic > language today, designed to work efficiently on multi-processor systems. > Forget all of Python's legacy implementation details such as GILs and > refcounts and such. What would it look like, and how well would it > perform? (And I don't mean purely functional languages a la Erlang.) Although I wouldn't make it purely functional, I think I'd take some ideas from things like Erlang and Occam. In particular, I'd keep the processes/threads/whatever as separated as possible, communicating only via well-defined channels having copy semantics for mutable objects. Anything directly shared between processes (code objects, classes, etc.) would be read-only, and probably exempt from refcounting to enable access without locking. Hmmm... guess I'll have to go away and design PyLang now.
:-) -- Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | Carpe post meridiem! | Christchurch, New Zealand | (I'm not a morning person.) | greg.ewing at canterbury.ac.nz +--------------------------------------+ From mike.klaas at gmail.com Thu May 10 05:16:05 2007 From: mike.klaas at gmail.com (Mike Klaas) Date: Wed, 9 May 2007 20:16:05 -0700 Subject: [Python-3000] the future of the GIL In-Reply-To: <464281AE.7040903@acm.org> References: <463E4645.5000503@acm.org> <20070506222840.25B2.JCARLSON@uci.edu> <4642745C.1040702@canterbury.ac.nz> <464281AE.7040903@acm.org> Message-ID: <3d2ce8cb0705092016n4ca01f45ld99ed8156aa7bf0f@mail.gmail.com> On 5/9/07, Talin wrote: <> > This much I agree: There's no point in talking about supporting multiple > processors using threads as long as we're living in a refcounting world. <> But python isn't--CPython, though, certainly is. The CPython interpreter has enormous stability, backward-compatibility, and speed expectations to live up to, which makes huge architectural upheavals an unlikely proposition. I build multi-machine distributed systems using python (and hence use multi-process parallelism all the time), but I would still like to have a GILless CPython. I don't buy the "multi-processor machines aren't common" argument (certainly has not been my experience), nor "threading is inferior to multiple processes as the former is too hard": neither of these arguments would carry the day if (for instance) a new python interpreter was created from scratch today. Instead, the real reason the GIL still lingers in CPython is that such an architectural change (while maintaining the same performance) is difficult and _not done_. No-one has solved this challenge, and until that happens, talking on mailing lists about how great it would be is pretty much pointless. It would probably be more fruitful to start a new python interpreter project based on a different architecture.
Perhaps you could even write it in python. I suggest that you call it "PyPy". -MIke From greg.ewing at canterbury.ac.nz Thu May 10 05:31:42 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 10 May 2007 15:31:42 +1200 Subject: [Python-3000] the future of the GIL In-Reply-To: <3d2ce8cb0705092016n4ca01f45ld99ed8156aa7bf0f@mail.gmail.com> References: <463E4645.5000503@acm.org> <20070506222840.25B2.JCARLSON@uci.edu> <4642745C.1040702@canterbury.ac.nz> <464281AE.7040903@acm.org> <3d2ce8cb0705092016n4ca01f45ld99ed8156aa7bf0f@mail.gmail.com> Message-ID: <4642921E.9070707@canterbury.ac.nz> Mike Klaas wrote: > It would probably be more fruitful to start a > new python interpreter project based on a different architecture. But it's not even clear what that different architecture should be... -- Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | Carpe post meridiem! | Christchurch, New Zealand | (I'm not a morning person.) | greg.ewing at canterbury.ac.nz +--------------------------------------+ From jcarlson at uci.edu Thu May 10 05:38:49 2007 From: jcarlson at uci.edu (Josiah Carlson) Date: Wed, 09 May 2007 20:38:49 -0700 Subject: [Python-3000] the future of the GIL In-Reply-To: <4642745C.1040702@canterbury.ac.nz> References: <4642745C.1040702@canterbury.ac.nz> Message-ID: <20070509203702.25EF.JCARLSON@uci.edu> Greg Ewing wrote: > Giovanni Bajo wrote: > > using multiple processes cause some > > headaches with frozen distributions (PyInstaller, py2exe, etc.), like those > > usually found on Windows, specifically because Windows does not have fork(). > > Isn't that just a problem with Windows generally? I don't > see what the method of packaging has to do with it. > > Also, I've seen it suggested that there may actually be > a way of doing something equivalent to a fork in Windows, > even though it doesn't have a fork() system call as such. > Does anyone know more about this? 
Cygwin emulates fork() by creating a shared mmap, creating a new child process, copying the contents of the parent process' memory to the child process (after performing the proper allocations), then hacks up the child process' call stack. - Josiah From rrr at ronadam.com Thu May 10 08:22:54 2007 From: rrr at ronadam.com (Ron Adam) Date: Thu, 10 May 2007 01:22:54 -0500 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: <46428622.7000204@canterbury.ac.nz> References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <740c3aec0705091254n7a406621qb5cc491cf3af9743@mail.gmail.com> <20070509205655.622A63A4061@sparrow.telecommunity.com> <46428622.7000204@canterbury.ac.nz> Message-ID: <4642BA3E.8080708@ronadam.com> Greg Ewing wrote: > Phillip J. Eby wrote: >> For >> example, objects to be documented with pydoc currently have to >> reverse engineer a bunch of inspection code, while in a GF-based >> design they'd just add methods. > > There's a problem with this that I haven't seen a good > answer to yet. To add a method to a generic function, > you have to import the module that defines the base > function. So any module that wants its objects documented > in a custom way ends up depending on pydoc. If you have everything at the same level then that may be true, but I don't think that is what Phillip is suggesting. There might be a group of generic functions for introspection that all return some consistent data format back. This might be in the inspect module. Then you might have another set of generic functions for combining different sources of information together into another data structure. This would be used to pre-format and order the information. These might be in docutils. Then you might have a third level of generic functions for outputting that data in different formats. Ie.. text, html, xml... reST. This might be part of a generic formatting package. 
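The three-layer split described above can be made concrete with a toy, dict-based generic function. All names here (describe, render, toy_pydoc) are invented for illustration and are not part of any real inspect or pydoc API:

```python
# A toy "generic function": dispatch on the argument's type via a registry.
def generic(default):
    registry = {}
    def call(ob):
        for cls in type(ob).__mro__:
            if cls in registry:
                return registry[cls](ob)
        return default(ob)
    def when(cls):
        def decorate(func):
            registry[cls] = func
            return func
        return decorate
    call.when = when  # hook for registering type-specific methods
    return call

# Layer 1 (inspect-level): return a consistent data format.
@generic
def describe(ob):
    return {"name": getattr(ob, "__name__", repr(ob)),
            "doc": (ob.__doc__ or "").strip()}

# Layer 3 (formatting-level): render that data as plain text.
@generic
def render(data):
    return "%s -- %s" % (data["name"], data["doc"])

# pydoc-level driver: a thin module tying the layers together.
def toy_pydoc(ob):
    return render(describe(ob))

def sample():
    """A sample function."""

print(toy_pydoc(sample))  # sample -- A sample function.

# A type that wants custom documentation just registers a method;
# toy_pydoc itself never changes.
class MyType:
    """Docs for MyType."""

@describe.when(MyType)
def describe_mytype(ob):
    return {"name": "MyType instance", "doc": "custom entry"}

print(toy_pydoc(MyType()))  # MyType instance -- custom entry
```

The point of the layering is in the last few lines: extending documentation for a new type means adding a method at the inspect-like layer, while the pydoc-like driver stays a thin shell.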
Then pydoc becomes a very lightweight module that ties these together to do what it does, but it can still extend each framework where it needs to. A lot of the apparent fear involved with ABC's and generic functions seems to be disregarding the notion that generally you know something about the data (and code) that is being used at particular points in a process. It is that implicit quality that allows us to not need to LBYL or put try-excepts around everything we do. I think ABC's and generic functions may allow us to extend that quality to more of our code. *My feeling about ABC's and inheritance is that they are very useful for easily creating new objects. *I think generic functions are very useful for doing operations on those objects when it doesn't make sense for those objects to do yet another type of operation on itself. Or to put it another way, not everything done to an object should be done by a method in that object. Cheers, Ron From paul.dubois at gmail.com Thu May 10 08:23:01 2007 From: paul.dubois at gmail.com (Paul Du Bois) Date: Wed, 9 May 2007 23:23:01 -0700 Subject: [Python-3000] the future of the GIL In-Reply-To: <464281AE.7040903@acm.org> References: <463E4645.5000503@acm.org> <20070506222840.25B2.JCARLSON@uci.edu> <4642745C.1040702@canterbury.ac.nz> <464281AE.7040903@acm.org> Message-ID: <85f6a31f0705092323h48c0130ayd48e1b6e03adb3a4@mail.gmail.com> I'll just chime in tersely since this really seems like -ideas and not -3000 territory. On 5/9/07, Talin wrote: > modern consoles such as PS/3 and Xbox do not, since they have no support for > virtual memory at all. Well, they have virtual memory as in virtual address spaces, but they don't swap. Lack of fork() is more of a control thing. > Also remember that the PS/3 is supposed to be one of the poster children > for multiprocessing -- the whole 'cell processor' thing. You can't write > an efficient game on the PS/3 unless it uses multiple processors.
The PS3 is a good argument _against_ having multiple threads in one interp. With the cell architecture you want to stay very far away from a shared memory threading model. The 8 "SPUs" in the cell run separate processes in their own address space (256K, code _and_ data!), so the cell works best with a "multiple processes with tightly managed intercommunication channels" program architecture. > Thought experiment: Suppose you were writing and brand-new dynamic > language today, designed to work efficiently on multi-processor systems. Let's see... it would make state sharing difficult and asynchronous communication easy! paul From theller at ctypes.org Thu May 10 08:49:38 2007 From: theller at ctypes.org (Thomas Heller) Date: Thu, 10 May 2007 08:49:38 +0200 Subject: [Python-3000] the future of the GIL In-Reply-To: <46427073.701@canterbury.ac.nz> References: <1d85506f0705050629k35ebdf6aj285e8f10489d21d5@mail.gmail.com> <9F3AFD78-7C73-4C9F-8CA6-3D10A1468939@fuhm.net> <46427073.701@canterbury.ac.nz> Message-ID: Greg Ewing schrieb: > James Y Knight wrote: > >> This just isn't true. Python can do an atomic increment in a fast >> platform specific way. > > The problem with this, from what I've heard, is that > atomic increment instructions tend to be on the order > of 100 times slower than normal memory accesses (I > guess because they have to bypass the cache or do extra > work to keep it consistent). > > If that's true, even a single-instruction atomic increment > could be much slower than the currently used instruction > sequence for a Py_INCREF or Py_DECREF. > >> It's quite possible the overhead of GIL-less INCREF/DECREF is still >> too high even with atomic increment/decrement primitives, but AFAICT >> nobody has actually tried it. > > I thought that's what the oft-cited previous attempt was > doing, but maybe not. If not, it could be worth trying > to see what happens. 
> I have recompiled Python from svn trunk on Windows, after replacing '(op)->ob_refcnt++' and '--(op)->ob_refcnt' with calls to InterlockedIncrement() and InterlockedDecrement(). The result was that the pystones/second went down from ~52000 to ~24500. Quite disappointing, I would say. Thomas From p.f.moore at gmail.com Thu May 10 10:26:38 2007 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 10 May 2007 09:26:38 +0100 Subject: [Python-3000] [Python-Dev] Byte literals (was Re: [Python-checkins] Changing string constants to byte arrays ( r55119 - in python/branches/py3k-struni/Lib: codecs.py test/test_codecs.py )) In-Reply-To: References: <20070505150035.GA16303@panix.com> <200705051334.45120.fdrake@acm.org> <20070505124008.648D.JCARLSON@uci.edu> Message-ID: <79990c6b0705100126o339e1371ue8f5349b5b9a8d8a@mail.gmail.com> On 09/05/07, Jim Jewett wrote:

> Maybe the tool has gotten smart enough to avoid constructions like:
>
>     for k, v in list(dict.items()):
>
>     for i in list(range(10)):
>
> but I can't help feeling there will always be a few cases where it
> makes the code longer and worse.

Why don't you (in these cases) change your 2.x code to

    for k, v in dict.iteritems():

    for i in xrange(10):

Then 2to3 will do the right thing, *and* your 2.x code is improved... > knowing it will run OK in both versions. I will not be happy if I > have to do this editing more than once. If you edit the 2.6 source, you only need to do that once. Paul.
In-Reply-To: References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <740c3aec0705091254n7a406621qb5cc491cf3af9743@mail.gmail.com> <20070509205655.622A63A4061@sparrow.telecommunity.com> Message-ID: <740c3aec0705100340w2f50ef4ex32a212a7949c8c7a@mail.gmail.com> On 5/10/07, Jim Jewett wrote: > On 5/9/07, Phillip J. Eby wrote: > > At 09:54 PM 5/9/2007 +0200, Björn Lindqvist wrote: > > > >What if they have defined a do_stuff that dispatches on ClassC that is a > > >subclass of ClassA? Good luck in figuring out what the code does. > > > >With the non-overloaded version you also have the ability to insert > > >debug print statements to figure out what happens.
> >
> >     @before(do_stuff)
> >     def debug_it(ob: ClassC):
> >         import pdb
> >         pdb.set_trace()
> >
> I think this may be backwards from his question. As I read it, you > know about class A, but have never heard about class C (which happens > to be a substitute for A). Someone added a different do_stuff > implementation for class C.

It is backwards; using the debugger solves a problem that should not have been there in the first place. Let's assume the original flatten example again:

    from overloading import overload
    from collections import Iterable

    def flatten(ob):
        """Flatten an object to its component iterables"""
        yield ob

    @overload
    def flatten(ob: Iterable):
        for o in ob:
            for ob in flatten(o):
                yield ob

    @overload
    def flatten(ob: basestring):
        yield ob

Let's also assume that:

1. The above code is stored in a file flatten.py.
2. There is a class MyString in file mystring.py which is an Iterable but which is not a basestring.
3. There is a third file foo.py which contains RuleSet(flatten).copy_rules((basestring,), (MyString,))
4. There is a fourth file, bar.py, which contains:

       ms = MyString('hello')
       print list(flatten(ms))

5. These four files are part of a moderately large Python project containing 80 modules.
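The surprise Björn describes can be reproduced without any overloading machinery at all. In the sketch below, a module-level ATOMIC tuple merely mimics the effect of foo.py's copy_rules call — this is not the PEAK RuleSet API, just a plain-Python stand-in for the dispatch rules:

```python
# Plain-Python rendition of the flatten scenario. ATOMIC stands in for
# the dispatch rules; reassigning it mimics RuleSet(...).copy_rules.
class MyString:
    def __init__(self, s):
        self.s = s
    def __iter__(self):
        return iter(self.s)
    def __repr__(self):
        return repr(self.s)

ATOMIC = (str,)  # flatten.py's rule: only real strings are atomic

def flatten(ob):
    if isinstance(ob, ATOMIC):
        yield ob
    elif hasattr(ob, '__iter__'):
        for o in ob:
            for x in flatten(o):
                yield x
    else:
        yield ob

ms = MyString('hello')
chars = list(flatten(ms))
print(chars)   # ['h', 'e', 'l', 'l', 'o'] -- MyString treated as Iterable

# What foo.py's copy_rules((basestring,), (MyString,)) effectively does:
ATOMIC = (str, MyString)
whole = list(flatten(ms))
print(whole)   # ['hello'] -- MyString is now atomic, surprising bar.py
```

The second print is the behavior bar.py sees; nothing in flatten.py, mystring.py, or bar.py hints that foo.py changed the rules, which is exactly the debugging problem being argued about.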
According to how Phillip has described PEAK-style generic functions, these assumptions are not at all unreasonable. I am a new programmer analyzing the code on that project. I have read the files flatten.py, mystring.py and bar.py but not foo.py. The snippet in bar.py is then very surprising to me because it will print ['hello'] instead of ['h', 'e', 'l', 'l', 'o']. Using a simple dispatch technique like the one in my handle_tok example, or in the non-generic version of flatten, I wouldn't have this problem. Now I do, so how do I troubleshoot it? I could use the debug_it @before-function, but I don't think I should have to just to see the control flow of a darn flatten function. The other approach would be to grep the whole source for "flatten." Then I should be able to figure out which dispatch rules are active when the snippet in bar.py is invoked. But it would require considerable work. -- mvh Björn From tomerfiliba at gmail.com Thu May 10 13:36:55 2007 From: tomerfiliba at gmail.com (tomer filiba) Date: Thu, 10 May 2007 13:36:55 +0200 Subject: [Python-3000] mixin class decorator Message-ID: <1d85506f0705100436j4ed5c2f7xe6bef98c3b86f5bf@mail.gmail.com> with the new class decorators of py3k, new use cases emerge. for example, now it is easy to have real mixin classes or even mixin modules, a la ruby. unlike inheritance, this mixin mechanism simply merges the namespace of the class or module into the namespace of the decorated class. it does not affect class hierarchies/MRO, and provides finer granularity as to what methods are merged, i.e., you explicitly mark which methods should be merged.
def mixinmethod(func):
    """marks a method as a mixin method"""
    func.is_mixin = True
    return func

def get_unbound(obj, name):
    if name in obj.__dict__:
        return obj.__dict__[name]
    else:
        for b in obj.mro():
            if name in b.__dict__:
                return b.__dict__[name]

def mixin(obj, override = False):
    """a class decorator that merges the attributes of 'obj' into the class"""
    def wrapper(cls):
        for name in dir(obj):
            attr = get_unbound(obj, name)
            if getattr(attr, "is_mixin", False):
                if override or not hasattr(cls, name):
                    setattr(cls, name, attr)
        return cls
    return wrapper

Example
==================

class DictMixin:
    @mixinmethod
    def __iter__(self):
        for k in self.keys():
            yield k

    @mixinmethod
    def has_key(self, key):
        try:
            value = self[key]
        except KeyError:
            return False
        return True

    @mixinmethod
    def clear(self):
        for key in self.keys():
            del self[key]

    ...

@mixin(DictMixin)
class MyDict:
    def keys(self):
        return range(10)

md = MyDict()
for k in md:
    print k

==================================

does it seem useful? should it be included in some stdlib? or at least mentioned as a use case for class decorators in PEP 3129? (not intended for 3.0a1) -tomer From benji at benjiyork.com Thu May 10 15:06:24 2007 From: benji at benjiyork.com (Benji York) Date: Thu, 10 May 2007 09:06:24 -0400 Subject: [Python-3000] ABC's, Roles, etc In-Reply-To: <20070510012836.16D0D3A4061@sparrow.telecommunity.com> References: <88d0d31b0705081452w3082c1d0x7355e5eed7a3e093@mail.gmail.com> <20070509015553.9C6843A4061@sparrow.telecommunity.com> <88d0d31b0705082057l75ae6241gffa42545c25937d@mail.gmail.com> <20070509173942.B5A1B3A4061@sparrow.telecommunity.com> <46424718.20006@benjiyork.com> <20070510012836.16D0D3A4061@sparrow.telecommunity.com> Message-ID: <464318D0.2000109@benjiyork.com> Phillip J.
Eby wrote: > At 06:11 PM 5/9/2007 -0400, Benji York wrote: >> By way of clarification: Even in the large Zope 3 projects I work on >> (which obviously use zope.interface), we virtually never use >> interfaces for LBYL (just as Zope 3 itself rarely does). > > Yet, this is precisely what Jeff is claiming zope.interface is > useful/desirable *for*, I'll let him speak for himself. > and Jim Fulton has also been quite clear that > LBYL is its very raison d'etre. I would let Jim speak for himself too, but I prefer to put words in his mouth. ;) While zope.interface has anemic facilities for "verifying" interfaces, few people use them, and even then rarely outside of very simple "does this object look right" when testing. It may have been believed verification would be a great thing, but it's all but deprecated at this point. > And thus, for all of the use cases you just described, the minimal > PEP 3124 Interface implementation should do just fine, yes? Could be, especially if it allows for adaptation. I don't have the time to pour over the PEP right now. My main intent in piping up was dispelling the LBYL dispersions about zope.interface. ;) If the PEP cooperates as well with zope.interface as you suggest, all will be good in the world. Personally I'd prefer sufficient hooks be added to the language and these types of things (interfaces, adaptation, generic functions, etc.) be left to third parties (like yourself) instead of being canonicalized unnecessarily. > Indeed, > ABCs would work for those use cases too, if you didn't need > adaptation. Or am I missing something? The main advantage I see to zope.interface is adaptation. Other than that, the fact that the inheritance and interface hierarchies aren't mixed. I would turn the argument around and assert that interfaces can be used for the rare LBYL uses that ABCs appear to be aimed at, as well as more interesting things. 
-- Benji York http://benjiyork.com From ncoghlan at gmail.com Thu May 10 16:08:44 2007 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 11 May 2007 00:08:44 +1000 Subject: [Python-3000] ABC's, Roles, etc In-Reply-To: <464318D0.2000109@benjiyork.com> References: <88d0d31b0705081452w3082c1d0x7355e5eed7a3e093@mail.gmail.com> <20070509015553.9C6843A4061@sparrow.telecommunity.com> <88d0d31b0705082057l75ae6241gffa42545c25937d@mail.gmail.com> <20070509173942.B5A1B3A4061@sparrow.telecommunity.com> <46424718.20006@benjiyork.com> <20070510012836.16D0D3A4061@sparrow.telecommunity.com> <464318D0.2000109@benjiyork.com> Message-ID: <4643276C.5040100@gmail.com> Benji York wrote: > If the PEP cooperates as well with zope.interface as you suggest, all > will be good in the world. Personally I'd prefer sufficient hooks be > added to the language and these types of things (interfaces, adaptation, > generic functions, etc.) be left to third parties (like yourself) > instead of being canonicalized unnecessarily. My understanding of PJE's PEP is that adding those hooks you mention is essentially what it is about :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From pje at telecommunity.com Thu May 10 17:23:00 2007 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Thu, 10 May 2007 11:23:00 -0400 Subject: [Python-3000] ABC's, Roles, etc In-Reply-To: <4643276C.5040100@gmail.com> References: <88d0d31b0705081452w3082c1d0x7355e5eed7a3e093@mail.gmail.com> <20070509015553.9C6843A4061@sparrow.telecommunity.com> <88d0d31b0705082057l75ae6241gffa42545c25937d@mail.gmail.com> <20070509173942.B5A1B3A4061@sparrow.telecommunity.com> <46424718.20006@benjiyork.com> <20070510012836.16D0D3A4061@sparrow.telecommunity.com> <464318D0.2000109@benjiyork.com> <4643276C.5040100@gmail.com> Message-ID: <20070510152120.407AB3A4061@sparrow.telecommunity.com> At 12:08 AM 5/11/2007 +1000, Nick Coghlan wrote: >Benji York wrote: > > If the PEP cooperates as well with zope.interface as you suggest, all > > will be good in the world. Personally I'd prefer sufficient hooks be > > added to the language and these types of things (interfaces, adaptation, > > generic functions, etc.) be left to third parties (like yourself) > > instead of being canonicalized unnecessarily. > >My understanding of PJE's PEP is that adding those hooks you mention is >essentially what it is about :) Yes, exactly -- plus a handful of useful default implementations that cover the most common use cases. However, because the hooks themselves are implemented using those default implementations, we can't separate out the implementations and just leave the hooks! There has to be *some* implementation of generic functions, method combination, interfaces, adaptation, and aspects, in order to implement the very hooks by which any replacement implementations would be installed. It would be like trying to provide the idea of metaclasses without having the "type" type implemented. That being the case, one might as well expose the basic functionality for people to use, until/unless their needs require an extended implementation. From pje at telecommunity.com Thu May 10 17:36:51 2007 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Thu, 10 May 2007 11:36:51 -0400 Subject: [Python-3000] ABC's, Roles, etc In-Reply-To: <464318D0.2000109@benjiyork.com> References: <88d0d31b0705081452w3082c1d0x7355e5eed7a3e093@mail.gmail.com> <20070509015553.9C6843A4061@sparrow.telecommunity.com> <88d0d31b0705082057l75ae6241gffa42545c25937d@mail.gmail.com> <20070509173942.B5A1B3A4061@sparrow.telecommunity.com> <46424718.20006@benjiyork.com> <20070510012836.16D0D3A4061@sparrow.telecommunity.com> <464318D0.2000109@benjiyork.com> Message-ID: <20070510153507.205EB3A4061@sparrow.telecommunity.com> At 09:06 AM 5/10/2007 -0400, Benji York wrote: >I would let Jim speak for himself too, but I prefer to put words in his >mouth. ;) While zope.interface has anemic facilities for "verifying" >interfaces, few people use them, and even then rarely outside of very >simple "does this object look right" when testing. It may have been >believed verification would be a great thing, but it's all but >deprecated at this point. Okay, but that's quite the opposite of what I understand Jeff to be saying in this thread, which is that not only is LBYL good, but that he does it all the time. >>And thus, for all of the use cases you just described, the minimal >>PEP 3124 Interface implementation should do just fine, yes? > >Could be, especially if it allows for adaptation. Yes, it does. In fact, adaptation is pretty much all they're good for, except for specifying argument types: http://python.org/dev/peps/pep-3124/#interfaces-and-adaptation http://python.org/dev/peps/pep-3124/#interfaces-as-type-specifiers >My main intent in piping up was >dispelling the LBYL dispersions about zope.interface. ;) Well, "back in the day", before PyProtocols was written, I discovered PEP 246 adaptation and began trying to convince Jim Fulton that adaptation beat the pants off of using if-then's to do "implements" testing. His argument then, IIRC, was that interface verification was more important. 
I then went off and wrote PyProtocols in large part (specifically the large documentation part!) to show him what could be done using adaptation as a core concept. >If the PEP cooperates as well with zope.interface as you suggest, all >will be good in the world. Personally I'd prefer sufficient hooks be >added to the language and these types of things (interfaces, adaptation, >generic functions, etc.) be left to third parties (like yourself) >instead of being canonicalized unnecessarily. Well, as Nick Coghlan's already pointed out, the PEP is mostly about creating a standard set of hooks, so that each framework doesn't have to reinvent decorators and syntax. >>Indeed, ABCs would work for those use cases too, if you didn't need >>adaptation. Or am I missing something? > >The main advantage I see to zope.interface is adaptation. Other >than that, the fact that the inheritance and interface hierarchies >aren't mixed. In which case, you might well be happy with PEP 3124 interfaces, unless you want to use instance-specific interfaces a lot. >I would turn the argument around and assert that interfaces can be used >for the rare LBYL uses that ABCs appear to be aimed at, as well as more >interesting things. Sure, which is another reason why PEP 3124 includes them. From pje at telecommunity.com Thu May 10 17:40:32 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Thu, 10 May 2007 11:40:32 -0400 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. 
In-Reply-To: <740c3aec0705100340w2f50ef4ex32a212a7949c8c7a@mail.gmail.co m> References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <740c3aec0705091254n7a406621qb5cc491cf3af9743@mail.gmail.com> <20070509205655.622A63A4061@sparrow.telecommunity.com> <740c3aec0705100340w2f50ef4ex32a212a7949c8c7a@mail.gmail.com> Message-ID: <20070510153851.691673A40A0@sparrow.telecommunity.com> At 12:40 PM 5/10/2007 +0200, BJ?rn Lindqvist wrote: >I could use the debug_it @before-function, but I don't think I should >have to just to see the control flow of a darn flatten function. The >other approach would be to grep the whole source for "flatten." Then I >should be able to figure out which dispatch rules are active when the >snippet in bar.py is invoked. But it would require considerable work. Or, you could simply print out the contents of RuleSet(flatten). Or perhaps just the subset of those rules that you're interested in. From pje at telecommunity.com Thu May 10 17:47:19 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Thu, 10 May 2007 11:47:19 -0400 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: <46428622.7000204@canterbury.ac.nz> References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <740c3aec0705091254n7a406621qb5cc491cf3af9743@mail.gmail.com> <20070509205655.622A63A4061@sparrow.telecommunity.com> <46428622.7000204@canterbury.ac.nz> Message-ID: <20070510154540.8CEC23A4061@sparrow.telecommunity.com> At 02:40 PM 5/10/2007 +1200, Greg Ewing wrote: >Phillip J. Eby wrote: > > For > > example, objects to be documented with pydoc currently have to > > reverse engineer a bunch of inspection code, while in a GF-based > > design they'd just add methods. > >There's a problem with this that I haven't seen a good >answer to yet. To add a method to a generic function, >you have to import the module that defines the base >function. 
So any module that wants its objects documented >in a custom way ends up depending on pydoc. Using the "Importing" package from the Cheeseshop:

    def register_pydoc(pydoc):
        @when(pydoc.get_signature)
        def signature_for_mytype(ob: MyType):
            # etc.

        @when(pydoc.get_contents)
        def contents_for_mytype(ob: MyType):
            # etc.

    from peak.util.imports import whenImported
    whenImported('pydoc', register_pydoc)

I certainly wouldn't object to making 'whenImported' and its friends a part of the stdlib. >There's also the possibility that other documentation >systems could make use of the same protocol if it's >designed appropriately, whereas extending pydoc-defined >generic functions benefits pydoc and nothing else. Of course; it's actually somewhat more likely that the basic GFs should actually live in "inspect" (or something like it) rather than in "pydoc" per se. From pje at telecommunity.com Thu May 10 17:59:41 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Thu, 10 May 2007 11:59:41 -0400 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <20070509213643.A37643A4061@sparrow.telecommunity.com> Message-ID: <20070510155756.2DEB73A4061@sparrow.telecommunity.com> At 08:43 PM 5/9/2007 -0600, Steven Bethard wrote: >>Meanwhile, leaving in the ability to have method combination later, >>but removing the actual implementation of the @before/around/after >>decorators in place would delete a total of less than 40 non-blank >>lines of code. > >Sure, but it would also delete huge chunks of explanation about >something which really isn't the core of the PEP. Python got >decorators without the 6 lines of functools.update_wrapper -- I see >this as being roughly the same. In particular, >functools.update_wrapper was never mentioned in PEP 318. I see this as being more analogous to contextlib.contextmanager and PEP 343, myself.
:) >I'm just hoping you can find a way to cut the PEP >down enough so that folks have a chance of wrapping their head around >it. Well, it's a bit like new-style types, in that there are a bunch of pieces that go together, i.e., descriptors, metaclasses, slots, and mro. I could certainly split the PEP into separate documents, but it might give the impression that the parts are more separable than they are. > ;-) I really do think something along these lines >(overloading/generic functions) is right for Python. I just think the >current PEP is too overwhelming for people to see that. Yeah, and the dilemma is that if I go back and add in all the examples and clarifications that have come up in these threads, it's going to be even bigger. Ditto for when I actually document the extension API part. The PEP is already 50% larger (in text line count) than the implementation of most of its features! (And the implementation already includes a bunch of the extension API.) I'm certainly open to suggestions as to how best to proceed; I just don't see how, for example, to explain the PEP's interfaces without reference to generic functions. So, even if it was split into different documents, you'd still have to read them in much the same order as in the one large document. By the way, I have gotten off-list notes of encouragement from a number of people who've said they hope the PEP makes it, so evidently it's not overwhelming to everyone. Unfortunately, it seems to be suffering a bit from Usenet Nod Syndrome among the people who are in favor of it. From pje at telecommunity.com Thu May 10 18:16:02 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Thu, 10 May 2007 12:16:02 -0400 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. 
In-Reply-To: <464289C8.4080004@canterbury.ac.nz> References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <740c3aec0705091254n7a406621qb5cc491cf3af9743@mail.gmail.com> <20070509205655.622A63A4061@sparrow.telecommunity.com> <20070510012202.EB0A83A4061@sparrow.telecommunity.com> <464289C8.4080004@canterbury.ac.nz> Message-ID: <20070510161417.192943A4061@sparrow.telecommunity.com> At 02:56 PM 5/10/2007 +1200, Greg Ewing wrote: >Phillip J. Eby wrote: > > > Which is an excellent demonstration, by the way, of another reason > > why before/after methods are useful. They're all *always* called > > before and after the primary methods, regardless of how many of them > > were registered. > >But unless I'm mistaken, ClassC can still take over the >whole show using a method that doesn't call the next >method. No, because you're still thinking of "before" and "after" as if they were syntax sugar for normal method chaining. As I said above (and in the PEP), *all* before and after methods are always called, unless an exception is raised somewhere along the way. This is one of the reasons they're useful to have, in addition to normal and "around" methods. > > debug = Debug.make_decorator('debug') > > always_overrides(Debug, Around) > > always_overrides(Debug, Method) > > always_overrides(Debug, Before) > > always_overrides(Debug, After) > >This is getting seriously brain-twisting. Are you saying >that this somehow overrides the subclass relationships, >so that an @Debug method for ClassA always gets called >before other methods, even ones for ClassC? Just like all the Around methods are always called before the before, after, and primary methods, and just like all the before methods are always called before the primary and after methods, etc. This was all explicitly spelled out in the PEP: ``@before`` and ``@after`` methods are invoked either before or after the main function body, and are *never considered ambiguous*. 
That is, it will not cause any errors to have multiple "before" or "after" methods with identical or overlapping signatures. Ambiguities are resolved using the order in which the methods were added to the target function. "Before" methods are invoked most-specific method first, with ambiguous methods being executed in the order they were added. All "before" methods are called before any of the function's "primary" methods (i.e. normal ``@overload`` methods) are executed. "After" methods are invoked in the *reverse* order, after all of the function's "primary" methods are executed. That is, they are executed least-specific methods first, with ambiguous methods being executed in the reverse of the order in which they were added. In particular, note the last sentence of the second paragraph, and the first sentence of the third paragraph. >Also, you still can't completely win, as someone could >define an @UtterlySelfish decorator that takes precedence >over your @Debug decorator. So? Maybe that's what they *want*. Sounds like "consenting adults" to me. >For that matter, what if there is simply another >decorator @Foo that is defined to always_override >@Around? The precedence between that and your >@Debug decorator then appears to be undefined. If so, then you'll get an AmbiguousMethods error (either when defining the function or calling it) and thus be informed that you need another override declaration. From jjb5 at cornell.edu Thu May 10 18:23:11 2007 From: jjb5 at cornell.edu (Joel Bender) Date: Thu, 10 May 2007 12:23:11 -0400 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. 
In-Reply-To: <20070509205655.622A63A4061@sparrow.telecommunity.com>
References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <740c3aec0705091254n7a406621qb5cc491cf3af9743@mail.gmail.com> <20070509205655.622A63A4061@sparrow.telecommunity.com>
Message-ID: <464346EF.1080408@cornell.edu>

> @before(do_stuff)
> def debug_it(ob: ClassC):
>     import pdb
>     pdb.set_trace()

This is probably far-fetched, but I would much rather see:

    before do_stuff(ob: ClassC):
        import pdb
        pdb.set_trace()

So the keywords 'before' and 'after' are just like 'def': they define functions with a particular signature that get inserted into the "which function to call" execution sequence. I would want to be able to reference functions within classes as well:

    >>> class A:
    ...     def f(self, x):
    ...         print 'A.f, x =', x
    ...
    >>> z = A()
    >>> z.f(1)
    A.f, x = 1
    >>>
    >>> before A.f(self, x):
    ...     print 'yo!'
    ...
    >>> z.f(2)
    yo!
    A.f, x = 2

Could the sequence of opcodes for the 'before f()' get mushed into the front of the existing code for f()? That would mean that changes to 'x' would be reflected in the original f():

    >>> before A.f(self, x):
    ...     print 'doubled'
    ...     x = x * 2
    >>> z.f(3)
    doubled
    yo!
    A.f, x = 6

And does a 'return' statement from a before short-circuit the call, or should it mean the same thing as falling off the end?

Joel

From tjreedy at udel.edu Thu May 10 20:11:56 2007
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 10 May 2007 14:11:56 -0400
Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc.
References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <20070509213643.A37643A4061@sparrow.telecommunity.com> <20070510155756.2DEB73A4061@sparrow.telecommunity.com>
Message-ID:

"Phillip J. Eby" wrote in message news:20070510155756.2DEB73A4061 at sparrow.telecommunity.com...
| At 08:43 PM 5/9/2007 -0600, Steven Bethard wrote:
| Yeah, and the dilemma is that if I go back and add in all the
| examples and clarifications that have come up in these threads, it's
| going to be even bigger. Ditto for when I actually document the
| extension API part. The PEP is already 50% larger (in text line
| count) than the implementation of most of its features! (And the
| implementation already includes a bunch of the extension API.)
|
| I'm certainly open to suggestions as to how best to proceed; I just
| don't see how, for example, to explain the PEP's interfaces without
| reference to generic functions. So, even if it was split into
| different documents, you'd still have to read them in much the same
| order as in the one large document.

Without having read the PEP itself (yet), as opposed to numerous posts here, I would note that many PEPs are shortened by referencing other docs. (The class decorator PEP being an extreme example.) This makes it easier to get an 'executive overview' of the proposal if one is not interested in the details. I will try to take a look in the next week.

| By the way, I have gotten off-list notes of encouragement from a
| number of people who've said they hope the PEP makes it, so evidently
| it's not overwhelming to everyone. Unfortunately, it seems to be
| suffering a bit from Usenet Nod Syndrome among the people who are in
| favor of it.

I don't see an immediate personal use for either ABCs or your generic function machinery. But to me, GFs seem at least as much in the spirit of Python as ABCs. So here is a public probably-yes nod from me ;-)

Terry Jan Reedy
In-Reply-To: <20070510155756.2DEB73A4061@sparrow.telecommunity.com> References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <20070509213643.A37643A4061@sparrow.telecommunity.com> <20070510155756.2DEB73A4061@sparrow.telecommunity.com> Message-ID: <79990c6b0705101150x4de890d6wc5d01c434a4e6d48@mail.gmail.com> On 10/05/07, Phillip J. Eby wrote: > By the way, I have gotten off-list notes of encouragement from a > number of people who've said they hope the PEP makes it, so evidently > it's not overwhelming to everyone. Unfortunately, it seems to be > suffering a bit from Usenet Nod Syndrome among the people who are in > favor of it. I'll add my public +1 here, then. OTOH, I do find the PEP to be too long and fairly hard to follow - as an example, the bit in the PEP saying """ (to avoid copying the implementation): from overloading import RuleSet RuleSet(flatten).copy_rules((basestring,), (MyString,)) """ adds nothing to the point, but adds complexity which I suspect will simply put people off. OK, it's only 3 lines, but I believe that removing them will substantially improve the impact of that section, and lose nothing of importance. I'm sure there's more of the same, as well. As I say, though, I am in favour of the idea - don't mistake criticism of the way it's presented as dislike of the concept! Paul. 
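[The "basic idea" Paul refers to -- dispatching a function on the type of its argument -- can be sketched in a few lines. The following is an invented, minimal illustration for readers of this thread, not the PEP's or PEAK's actual implementation (the real machinery handles multiple arguments, method combination, interfaces, and more):]

```python
# Minimal single-dispatch "generic function" sketch.  All names here
# are invented for illustration; the PEP's API is richer than this.

def overloadable(default):
    """Turn `default` into a generic function dispatching on the
    type of its first argument."""
    registry = {}

    def dispatch(arg, *rest):
        # Walk the argument's MRO to find the most specific overload.
        for cls in type(arg).__mro__:
            if cls in registry:
                return registry[cls](arg, *rest)
        return default(arg, *rest)

    def when(cls):
        def register(func):
            registry[cls] = func
            return func
        return register

    dispatch.when = when
    return dispatch

@overloadable
def flatten(ob):
    # Default case: treat the object as atomic.
    yield ob

@flatten.when(list)
def flatten_list(ob):
    for item in ob:
        yield from flatten(item)

# list(flatten([1, [2, 3], 4])) -> [1, 2, 3, 4]
```

The registry-plus-MRO walk is the whole trick; everything else in the PEP (before/after/around combinators, interfaces, overriding rules) layers policy on top of this core lookup.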
From greg.ewing at canterbury.ac.nz Thu May 10 23:39:56 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 11 May 2007 09:39:56 +1200 Subject: [Python-3000] ABC's, Roles, etc In-Reply-To: <20070510152120.407AB3A4061@sparrow.telecommunity.com> References: <88d0d31b0705081452w3082c1d0x7355e5eed7a3e093@mail.gmail.com> <20070509015553.9C6843A4061@sparrow.telecommunity.com> <88d0d31b0705082057l75ae6241gffa42545c25937d@mail.gmail.com> <20070509173942.B5A1B3A4061@sparrow.telecommunity.com> <46424718.20006@benjiyork.com> <20070510012836.16D0D3A4061@sparrow.telecommunity.com> <464318D0.2000109@benjiyork.com> <4643276C.5040100@gmail.com> <20070510152120.407AB3A4061@sparrow.telecommunity.com> Message-ID: <4643912C.1050208@canterbury.ac.nz> Phillip J. Eby wrote: > However, because the hooks themselves are implemented using those > default implementations, we can't separate out the implementations > and just leave the hooks! What this seems to mean is that the necessary hooks are already there. -- Greg From greg.ewing at canterbury.ac.nz Thu May 10 23:59:07 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 11 May 2007 09:59:07 +1200 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: <20070510161417.192943A4061@sparrow.telecommunity.com> References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <740c3aec0705091254n7a406621qb5cc491cf3af9743@mail.gmail.com> <20070509205655.622A63A4061@sparrow.telecommunity.com> <20070510012202.EB0A83A4061@sparrow.telecommunity.com> <464289C8.4080004@canterbury.ac.nz> <20070510161417.192943A4061@sparrow.telecommunity.com> Message-ID: <464395AB.6040505@canterbury.ac.nz> Phillip J. Eby wrote: > As I said above (and in > the PEP), *all* before and after methods are always called, unless an > exception is raised somewhere along the way. 
> "Before" methods are invoked most-specific method first, with
> ambiguous methods being executed in the order they were added. All
> "before" methods are called before any of the function's "primary"
> methods (i.e. normal ``@overload`` methods) are executed.

Well, it wasn't clear to me at all from the PEP that this is how it works. The above paragraph doesn't say anything about @around methods, for example, and it's not obvious whether they should be considered "normal" or "primary".

>> For that matter, what if there is simply another
>> decorator @Foo that is defined to always_override
>> @Around? The precedence between that and your
>> @Debug decorator then appears to be undefined.
>
> If so, then you'll get an AmbiguousMethods error (either when defining
> the function or calling it) and thus be informed that you need another
> override declaration.

I can see a problem with this. If Library1 defines a method that always overrides an @around method, and Library2 does the same thing, then if I try to use both libraries at the same time, I'll get an exception that I don't know the cause of and don't have any idea how to fix.

--
Greg

From guido at python.org Fri May 11 01:00:00 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 10 May 2007 16:00:00 -0700
Subject: [Python-3000] PEPs update
Message-ID:

I've accepted some PEPs:

SA 3120  Using UTF-8 as the default source encoding       von Löwis
SA 3121  Extension Module Initialization & Finalization   von Löwis
SA 3123  Making PyObject_HEAD conform to standard C       von Löwis
SA 3127  Integer Literal Support and Syntax               Maupin
SA 3129  Class Decorators                                 Winter

(3129 is listed for completeness, I think it was already approved a few days ago.)
and rejected some others: SR 3125 Remove Backslash Continuation Jewett SR 3126 Remove Implicit String Concatenation Jewett SR 3130 Access to Current Module/Class/Function Jewett I'm looking forward to seeing these implemented in the p3yk branch (and a few others that have been accepted for a while now, e.g. 3109, 3110, 3113). Other status updates: 3101 (string formatting) -- Talin will continue to shepherd this in cooperation with the authors of the sandbox implementation. 3116 (new I/O) -- I'm slowly chipping away at implementing this. The PEP is behind in tracking the actual implementation. 3119, 3124, 3141 (ABCs, GFs) -- I'm still thinking about this; 3119 and 3141 are still awaiting major rewrites and 3124 is under heavy discussion. 3118 (buffer protocol) -- This is long, but I trust Travis. Maybe he should just submit an implementation (hint, hint). 3128 (BList) -- I'll leave this for Raymond Hettinger to review. 3131 (non-ASCII identifiers) -- I'm leaning towards rejecting. 3132 (extended iterable unpacking) -- I'm leaning towards accepting. I'm still hoping Raymond will check his draft PEPs in by Sunday night. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From pje at telecommunity.com Fri May 11 01:20:26 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Thu, 10 May 2007 19:20:26 -0400 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: <464395AB.6040505@canterbury.ac.nz> References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <740c3aec0705091254n7a406621qb5cc491cf3af9743@mail.gmail.com> <20070509205655.622A63A4061@sparrow.telecommunity.com> <20070510012202.EB0A83A4061@sparrow.telecommunity.com> <464289C8.4080004@canterbury.ac.nz> <20070510161417.192943A4061@sparrow.telecommunity.com> <464395AB.6040505@canterbury.ac.nz> Message-ID: <20070510231845.9C98C3A4061@sparrow.telecommunity.com> At 09:59 AM 5/11/2007 +1200, Greg Ewing wrote: >Phillip J. 
Eby wrote: >>As I said above (and in the PEP), *all* before and after methods >>are always called, unless an exception is raised somewhere along the way. > >> "Before" methods are invoked most-specific method first, with >> ambiguous methods being executed in the order they were added. All >> "before" methods are called before any of the function's "primary" >> methods (i.e. normal ``@overload`` methods) are executed. > >Well, it wasn't clear to me at all from the PEP that >this is how it works. The above paragraph doesn't say >anything about @around methods, for example, That's because @around methods haven't been introduced at that point in the PEP; the following section introduces @around and explains that @arounds are called before the befores, etc. For example, part of the section describing @around methods says: The ``__proceed__`` given to an "around" method will either be the next applicable "around" method, a ``DispatchError`` instance, or a synthetic method object that will call all the "before" methods, followed by the primary method chain, followed by all the "after" methods, and return the result from the primary method chain. Of course, it's also stated in the PEP that it's basically copying CLOS's "standard method combination", but I really should add appropriate references for that. >>>For that matter, what if there is simply another >>>decorator @Foo that is defined to always_override >>>@Around? The precedence between that and your >>>@Debug decorator then appears to be undefined. >>If so, then you'll get an AmbiguousMethods error (either when >>defining the function or calling it) and thus be informed that you >>need another override declaration. > >I can see a problem with this. If Library1 defines a >method that always overrides an @around method, and >Library2 does the same thing, then if I try to use >both libraries at the same time, I'll get an exception >that I don't know the cause of and don't have any >idea how to fix. 
Actually, that would require that Library1 and Library2 both add methods to a generic function in Library3. Not only that, but *those methods would have to apply to the same classes*. So, it's actually a lot harder to create that situation than it sounds.

In particular, notice that if Library1 only uses its combinators for methods applying to its own types, and Library2 does the same, they *cannot* create any method ambiguity in the third library's generic functions!

Of course, outside of debug hooks, adding custom combinators to somebody else's generic function probably isn't a very good idea in the first place, at least for instances of *that library's types* or of *built-in types* -- which is the only way to produce a conflict between two libraries that don't otherwise know about each other. (That is, if L1 and L2 don't know each other, they can hardly be registering methods for each other's types with a common generic function.)

Meanwhile, in CLOS the set of allowed qualifiers for a generic function's methods is decided by the function itself, so there's no way for you to add foreign method types at all. I personally think that's a little too restrictive, though, as it not only goes against "consenting adults", but it also effectively rules out "true" aspect-oriented programming.

By the way, I feel I should mention that although I disagree with a lot of your arguments, I *do* appreciate your taking the time to find possible edge or corner failure conditions and "unintended consequences". So please don't let my poking of holes in your poking of holes in the PEP stop you from trying to poke more holes in it. :)

From python at rcn.com Fri May 11 01:27:57 2007
From: python at rcn.com (Raymond Hettinger)
Date: Thu, 10 May 2007 19:27:57 -0400 (EDT)
Subject: [Python-3000] PEPs update
Message-ID: <20070510192757.BIU44325@ms09.lnh.mail.rcn.net>

> and rejected some others:
> SR 3126 Remove Implicit String Concatenation Jewett

I had high hopes for this one. C'est la vie.
I did not see octal literals on your list. FWIW, I'm -1 on the proposal. The current situation is only a minor nuisance. While I prefer to see octal literal support dropped entirely, I would rather live with the 0123 form than add more complexity with a non-standard format and a set of warnings for decimals with leading zeros: date(2007, 05, 09).

> 3128 (BList) -- I'll leave this for Raymond Hettinger to review.

After looking at the source, I think this has almost zero chance of replacing list(). There is too much value in a simple C API, low space overhead for small lists, good performance in common use cases, and having performance that is easily understood. The BList implementation lacks these virtues and trades off a little performance in common cases for much better performance in uncommon cases. As a Py3.0 PEP, I think it can be rejected.

Depending on its success as a third-party module, it still has a chance for inclusion in the collections module. The essential criterion for that is whether it is a superior choice for some real-world use cases. I've scanned my own code and found no instances where BList would have been preferable to a regular list. However, that scan has a selection bias because it doesn't reflect what I would have written had BList been available. So, after a few months, I intend to poll comp.lang.python for BList success stories. If they exist, then I have no problem with inclusion in the collections module.
The PEP for eliminating __del__ seemed straight-forward at the outset, but the use case you presented doesn't seem to have a clean substitute (as it requires the object to be alive to finalize it). Other use cases do have a clean solution. So, I'll go forward with the PEP but am a bit disheartened that it is going to have to advise try/finally or somesuch for the harder cases. The information attributes idea is going well and is sticking close to the original presentation except that I've now seen the wisdom of modifying isinstance() to let some objects fake or proxy a type that they don't inherit from. Raymond From steven.bethard at gmail.com Fri May 11 01:38:33 2007 From: steven.bethard at gmail.com (Steven Bethard) Date: Thu, 10 May 2007 17:38:33 -0600 Subject: [Python-3000] PEPs update In-Reply-To: <20070510192757.BIU44325@ms09.lnh.mail.rcn.net> References: <20070510192757.BIU44325@ms09.lnh.mail.rcn.net> Message-ID: On 5/10/07, Raymond Hettinger wrote: > The PEP for eliminating __del__ seemed straight-forward > at the outset, but the use case you presented doesn't > seem to have a clean substitute (as it requires the object > to be alive to finalize it). Other use cases do have a clean > solution. So, I'll go forward with the PEP but am a bit > disheartened that it is going to have to advise try/finally > or somesuch for the harder cases. You've probably already seen these, but just in case you haven't, there have been two alternatives to __del__ posted recently to the cookbook: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/519621 http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/519610 Of course, they have their own sets of problems, but maybe there's something worthwhile in there for the PEP. STeVe -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. 
--- Bucky Katt, Get Fuzzy

From guido at python.org Fri May 11 01:39:11 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 10 May 2007 16:39:11 -0700
Subject: [Python-3000] PEPs update
In-Reply-To: <20070510192757.BIU44325@ms09.lnh.mail.rcn.net>
References: <20070510192757.BIU44325@ms09.lnh.mail.rcn.net>
Message-ID:

On 5/10/07, Raymond Hettinger wrote:
> I did not see octal literals on your list. FWIW, I'm -1 on the proposal. The current situation is only a minor nuisance. While I prefer to see octal literal support dropped entirely, I would rather live with the 0123 form than add more complexity with a non-standard format and a set of warnings for decimals with leading zeros: date(2007, 05, 09).

It was on the list, accepted:

> SA 3127 Integer Literal Support and Syntax Maupin

> > 3128 (BList) -- I'll leave this for Raymond Hettinger to review.
>
> After looking at the source, I think this has almost zero chance of replacing list(). There is too much value in a simple C API, low space overhead for small lists, good performance in common use cases, and having performance that is easily understood. The BList implementation lacks these virtues and trades off a little performance in common cases for much better performance in uncommon cases. As a Py3.0 PEP, I think it can be rejected.

OK, will do. I'll quote you in the rejection notice.

> Depending on its success as a third-party module, it still has a chance for inclusion in the collections module. The essential criterion for that is whether it is a superior choice for some real-world use cases. I've scanned my own code and found no instances where BList would have been preferable to a regular list. However, that scan has a selection bias because it doesn't reflect what I would have written had BList been available. So, after a few months, I intend to poll comp.lang.python for BList success stories. If they exist, then I have no problem with inclusion in the collections module.
After all, its learning curve is near zero -- the only cost is the clutter factor stemming from indecision about the most appropriate data structure for a given task. > > > I'm still hoping Raymond will check his draft PEPs in by Sunday night. > > Sorry for the delay, I've been fully task saturated and the PEP writing has been slowed by the need to explore the ideas more fully. > > The PEP for eliminating __del__ seemed straight-forward at the outset, but the use case you presented doesn't seem to have a clean substitute (as it requires the object to be alive to finalize it). Other use cases do have a clean solution. So, I'll go forward with the PEP but am a bit disheartened that it is going to have to advise try/finally or somesuch for the harder cases. > > The information attributes idea is going well and is sticking close to the original presentation except that I've now seen the wisdom of modifying isinstance() to let some objects fake or proxy a type that they don't inherit from. Cool. FWIW, the rewrite of PEP 3119 will focus mostly on overloading isinstance() and issubclass() and a few examples of what can be done with these. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From pje at telecommunity.com Fri May 11 01:54:58 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Thu, 10 May 2007 19:54:58 -0400 Subject: [Python-3000] __del__ (was Re: PEPs update) In-Reply-To: <20070510192757.BIU44325@ms09.lnh.mail.rcn.net> References: <20070510192757.BIU44325@ms09.lnh.mail.rcn.net> Message-ID: <20070510235312.21F463A4061@sparrow.telecommunity.com> At 07:27 PM 5/10/2007 -0400, Raymond Hettinger wrote: >The PEP for eliminating __del__ seemed straight-forward at the >outset, but the use case you presented doesn't seem to have a clean >substitute (as it requires the object to be alive to finalize >it). Other use cases do have a clean solution. 
So, I'll go forward >with the PEP but am a bit disheartened that it is going to have to >advise try/finally or somesuch for the harder cases. By the way - another issue with removing __del__ is that try/finally in generators (PEP 342) is implemented using it. Which means that if you took away __del__ from the Python level, you could still simulate it by saving a reference to a running generator with a finally clause. Of course, that would have at least as many problems as using __del__ directly, but there you go. :) From greg.ewing at canterbury.ac.nz Fri May 11 03:20:52 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 11 May 2007 13:20:52 +1200 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: <20070510231845.9C98C3A4061@sparrow.telecommunity.com> References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <740c3aec0705091254n7a406621qb5cc491cf3af9743@mail.gmail.com> <20070509205655.622A63A4061@sparrow.telecommunity.com> <20070510012202.EB0A83A4061@sparrow.telecommunity.com> <464289C8.4080004@canterbury.ac.nz> <20070510161417.192943A4061@sparrow.telecommunity.com> <464395AB.6040505@canterbury.ac.nz> <20070510231845.9C98C3A4061@sparrow.telecommunity.com> Message-ID: <4643C4F4.30708@canterbury.ac.nz> Phillip J. Eby wrote: > That's because @around methods haven't been introduced at that point in > the PEP; the following section introduces @around and explains that > @arounds are called before the befores, etc. Hmm, so it's not the case that @before methods are called "before" all other methods. That makes it even more confusing. I'm now even more of the opinion that this is too complicated for Python's first generic function system. "If it's hard to explain, it's probably a bad idea." > Of course, it's also stated in the PEP that it's basically copying > CLOS's "standard method combination", but I really should add > appropriate references for that. 
Relying on people knowing about CLOS in order to follow this stuff doesn't seem like a good idea. -- Greg From jimjjewett at gmail.com Fri May 11 03:35:37 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Thu, 10 May 2007 21:35:37 -0400 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: <20070510154540.8CEC23A4061@sparrow.telecommunity.com> References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <740c3aec0705091254n7a406621qb5cc491cf3af9743@mail.gmail.com> <20070509205655.622A63A4061@sparrow.telecommunity.com> <46428622.7000204@canterbury.ac.nz> <20070510154540.8CEC23A4061@sparrow.telecommunity.com> Message-ID: On 5/10/07, Phillip J. Eby wrote: > Using the "Importing" package from the Cheeseshop: ... > from peak.util.imports import whenImported > whenImported('pydoc', register_pydoc) > I certainly wouldn't object to making 'whenImported' and its friends > a part of the stdlib. Adding whenImported would be useful, even outside of ABCs and generic functions. But please don't go overboard with the "and its friends" part. That 15K zip file boiled down to a 370 line python module. Over 200 of those lines were to support things like module inheritance or returning a sequence with strings replaced by the result of running import/getattr on them. Those uses are probably too obscure for the stdlib. -jJ From pje at telecommunity.com Fri May 11 04:05:22 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Thu, 10 May 2007 22:05:22 -0400 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. 
In-Reply-To: References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <740c3aec0705091254n7a406621qb5cc491cf3af9743@mail.gmail.com> <20070509205655.622A63A4061@sparrow.telecommunity.com> <46428622.7000204@canterbury.ac.nz> <20070510154540.8CEC23A4061@sparrow.telecommunity.com> Message-ID: <20070511020339.5F4313A4061@sparrow.telecommunity.com> At 09:35 PM 5/10/2007 -0400, Jim Jewett wrote: >On 5/10/07, Phillip J. Eby wrote: > >>Using the "Importing" package from the Cheeseshop: >... >>from peak.util.imports import whenImported >>whenImported('pydoc', register_pydoc) > >>I certainly wouldn't object to making 'whenImported' and its friends >>a part of the stdlib. > >Adding whenImported would be useful, even outside of ABCs and generic >functions. > >But please don't go overboard with the "and its friends" part. That >15K zip file boiled down to a 370 line python module. Over 200 of >those lines were to support things like module inheritance Actually, the part that deals with module inheritance is 16 lines, unless you count the documentation for it, which is another 22 lines. But I have no problem leaving that out of the stdlib; module inheritance is deprecated even in PEAK. >or >returning a sequence with strings replaced by the result of running >import/getattr on them. Those uses are probably too obscure for the >stdlib. If you mean importObject, importSequence, and importSuite, I agree with you. Really, by "and friends" I mean importString and lazyModule, and I'm fine with relocating and renaming them, as well as stripping out the relative path bit. From jimjjewett at gmail.com Fri May 11 06:05:55 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Fri, 11 May 2007 00:05:55 -0400 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. 
In-Reply-To: <20070511020339.5F4313A4061@sparrow.telecommunity.com> References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <740c3aec0705091254n7a406621qb5cc491cf3af9743@mail.gmail.com> <20070509205655.622A63A4061@sparrow.telecommunity.com> <46428622.7000204@canterbury.ac.nz> <20070510154540.8CEC23A4061@sparrow.telecommunity.com> <20070511020339.5F4313A4061@sparrow.telecommunity.com> Message-ID: On 5/10/07, Phillip J. Eby wrote: > At 09:35 PM 5/10/2007 -0400, Jim Jewett wrote: > >Adding whenImported would be useful, even outside of ABCs and > >generic functions. > >But please don't go overboard with the "and its friends" part. > If you mean importObject, importSequence, and importSuite, I agree > with you. > Really, by "and friends" I mean importString and lazyModule, and I'm > fine with relocating and renaming them, as well as stripping out the > relative path bit. So we're mostly in agreement, but I had also wanted to leave out importString. I know it can seem simpler to treat everything as an object, and not worry about where the type switches from package to module to instance to attribute. I see it used in Twisted. But I'm not sure it is *really* simpler for someone who isn't familiar with your codebase, and I don't see why it is needed for whenImported. -jJ From p.f.moore at gmail.com Fri May 11 10:40:27 2007 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 11 May 2007 09:40:27 +0100 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. 
In-Reply-To: <4643C4F4.30708@canterbury.ac.nz> References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <740c3aec0705091254n7a406621qb5cc491cf3af9743@mail.gmail.com> <20070509205655.622A63A4061@sparrow.telecommunity.com> <20070510012202.EB0A83A4061@sparrow.telecommunity.com> <464289C8.4080004@canterbury.ac.nz> <20070510161417.192943A4061@sparrow.telecommunity.com> <464395AB.6040505@canterbury.ac.nz> <20070510231845.9C98C3A4061@sparrow.telecommunity.com> <4643C4F4.30708@canterbury.ac.nz> Message-ID: <79990c6b0705110140u59c6d46euddc59b919f55b4e8@mail.gmail.com> On 11/05/07, Greg Ewing wrote: > I'm now even more of the opinion that this is too > complicated for Python's first generic function system. > "If it's hard to explain, it's probably a bad idea." Hmm. My view is that it *is* simple to explain, but unfortunately Phillip's explanation in the PEP is not that simple explanation :-( In my view, too much of the PEP is taken up with edge cases, relatively obscure specialist uses, and unnecessary explanations of implementation details. However, I haven't had any time recently to review it in enough detail to offer a concrete proposal on how to simplify it, so I've kept quiet so far. I would argue that the PEP could be *very* simple if it restricted itself to the basic idea. Much of what is being discussed is, in my view, implementation detail - which Phillip finds compelling because it shows the power of the basic approach, but which is turning others off because it's more complex and subtle than a basic use case. There are many features in Python which are powerful and simple on the surface, but get quite gory when you delve beneath the covers (new-style classes, decorators, generators, for example). That doesn't mean they shouldn't be there. Paul. 
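[Editor's note: the "basic idea" being debated above can be sketched in a few lines of Python. This is a hedged illustration only -- hypothetical helper names (`generic`, `when`), dispatch on the type of a single argument, and none of the method combination, interfaces, or adaptation that PEP 3124 actually specifies:]

```python
# Minimal sketch of a generic function: dispatch on the type of the
# first argument, with registration via a decorator.  Hypothetical
# names; NOT the PEP 3124 implementation.

def generic(default):
    """Turn `default` into a single-argument generic function."""
    registry = {}

    def dispatch(arg, *rest):
        # Walk the MRO so subclasses fall back to base-class methods.
        for cls in type(arg).__mro__:
            if cls in registry:
                return registry[cls](arg, *rest)
        return default(arg, *rest)

    def when(cls):
        def register(func):
            registry[cls] = func
            return func
        return register

    dispatch.when = when
    return dispatch

@generic
def describe(obj):
    return "some object"

@describe.when(int)
def describe_int(obj):
    return "the integer %d" % obj

print(describe(3))    # -> the integer 3
print(describe("x"))  # -> some object
```

The full proposal layers much more on top (before/after/around methods, predicate dispatch, interfaces), which is exactly the split between "simple core" and "gory details" Paul describes.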
From jimjjewett at gmail.com Fri May 11 15:46:19 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Fri, 11 May 2007 09:46:19 -0400 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: <20070510231845.9C98C3A4061@sparrow.telecommunity.com> References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <740c3aec0705091254n7a406621qb5cc491cf3af9743@mail.gmail.com> <20070509205655.622A63A4061@sparrow.telecommunity.com> <20070510012202.EB0A83A4061@sparrow.telecommunity.com> <464289C8.4080004@canterbury.ac.nz> <20070510161417.192943A4061@sparrow.telecommunity.com> <464395AB.6040505@canterbury.ac.nz> <20070510231845.9C98C3A4061@sparrow.telecommunity.com> Message-ID: On 5/10/07, Phillip J. Eby wrote: > At 09:59 AM 5/11/2007 +1200, Greg Ewing wrote: > >Phillip J. Eby wrote: > >>As I said above (and in the PEP), *all* before and after methods > >>are always called, unless an exception is raised somewhere along the way. > >> "Before" methods are invoked most-specific method first, with > >> ambiguous methods being executed in the order they were added. All > >> "before" methods are called before any of the function's "primary" > >> methods (i.e. normal ``@overload`` methods) are executed. As much as it seems clear once you understand ... it isn't, if only because it is so unexpected. I think it needs an example, such as class A: ... class B(A): ... Then register before/after/around/normal methods for each, and show the execution path for a B(). As I understand it now (without rereading the PEP) AroundB part 1 AroundA part 1 BeforeA BeforeB NormalB # NormalA gets skipped, unless NormalB calls it explicitly AfterA AfterB AroundA part 2 AroundB part 2 But maybe it would just be AroundB, because an Around is really a replacement? > >I can see a problem with this. 
If Library1 defines a > >method that always overrides an @around method, and > >Library2 does the same thing, then if I try to use > >both libraries at the same time, I'll get an exception > >that I don't know the cause of and don't have any > >idea how to fix. > Actually, that would require that Library1 and Library2 both add > methods to a generic function in Library3. Not only that, but *those > methods would have to apply to the same classes*. So, it's actually > a lot harder to create that situation than it sounds. > In particular, notice that if Library1 only uses its combinators for > methods applying to its own types, and Library2 does the same, they > *cannot* create any method ambiguity in the third library's generic > functions! Library 1 and Library 2 both register Sage classes with Numpy, or vice versa. Library 1 and 2 don't know about each other. Library 1 and 2 also go through some extra version skew pains when Sage starts registering its types itself. hmm... if Library 2 is slightly buggy, or makes a slightly different mapping than library 1, then my getting correct results will depend on which of Library 1/Library 2 gets imported first -- or, rather, first got to the registration stage of their being imported. -jJ From daniel at stutzbachenterprises.com Fri May 11 17:00:46 2007 From: daniel at stutzbachenterprises.com (Daniel Stutzbach) Date: Fri, 11 May 2007 10:00:46 -0500 Subject: [Python-3000] PEPs update In-Reply-To: <20070510192757.BIU44325@ms09.lnh.mail.rcn.net> References: <20070510192757.BIU44325@ms09.lnh.mail.rcn.net> Message-ID: On 5/10/07, Raymond Hettinger wrote: > > 3128 (BList) -- I'll leave this for Raymond Hettinger to review. > > After looking at the source, I think this has almost zero chance for replacing > list(). There is too much value in a simple C API, low space overhead for small lists, Thanks for taking time to review my code. Did you look through the PEP as well? Both of these issues were specifically addressed. 
In fact, I am half way done with implementing the change so that small BLists are memory efficient. > good performance in common use cases, This is also addressed, to some extent, in the PEP. > and having performance that is easily understood. I am not sure what aspect of the performance might be misunderstood. Just about everything is O(log n). Could you clarify your concern? > The BList implementation lacks these virtues and trades-off a little performance > in common cases for much better performance in uncommon cases. As a Py3.0 > PEP, I think it can be rejected. Would it be useful if I created an experimental fork of 2.5 that replaces array-based lists with BLists, so that the performance penalty (if any) on existing code can be measured? > Depending on its success as a third-party module, it still has a chance for > inclusion in the collections module. The essential criterion for that is whether > it is a superior choice for some real-world use cases. I've scanned my own > code and found no instances where BList would have been preferable to a > regular list. However, that scan has a selection bias because it doesn't reflect > what I would have written had BList been available. Indeed, I wrote the BList because there were idioms that I wanted to use that were just not practical with an array-based list. -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises LLC From python at rcn.com Fri May 11 18:22:57 2007 From: python at rcn.com (Raymond Hettinger) Date: Fri, 11 May 2007 12:22:57 -0400 (EDT) Subject: [Python-3000] PEPs update Message-ID: <20070511122257.BIW67893@ms09.lnh.mail.rcn.net> > Thanks for taking time to review my code. You're welcome. And thanks for the continuing development effort. > Did you look through the PEP as well? Yes. > In fact, I am half way done with implementing the change so > that small BLists are memory efficient. As the code continues to evolve, I'll continue to look at it. I look forward to seeing how far you can take this. 
Newly developed code always faces an uphill battle when compared to mature open-source. > I am not sure what aspect of the performance might > be misunderstood. Just about everything is O(log n). > Could you clarify your concern? End-users (everyday Python programmers) need to understand the performance intuitively and have a clear understanding of what is going on under-the-hood. Our existing data structures have the virtue of having a simple mental model (except for aspects of re-sizing and over-allocation which are a bit obscure). > Would it be useful if I created an experimental fork of 2.5 > that replaces array-based lists with BLists, > so that the performance penalty (if any) on existing code > can be measured? That would likely be an informative exercise and would assure that your code is truly interchangeable with regular lists. It would also highlight the under-the-hood difficulties you'll encounter with the C-API. That being said, it is a labor intensive exercise and the time might be better spent on tweaking the third-party module code and building a happy user-base. > Indeed, I wrote the BList because there were idioms that I > wanted to use that were just not practical with an array-based list. We ought to set up a page on the wiki for success stories with blist as a third-party module. In time, the Right Answer (tm) will become self-evident. Raymond From g.brandl at gmx.net Fri May 11 17:49:14 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Fri, 11 May 2007 17:49:14 +0200 Subject: [Python-3000] PEP 3132: Extended Iterable Unpacking In-Reply-To: References: Message-ID: Georg Brandl schrieb: > This is a bit late, but it was in my queue by April 30, I swear! ;) > Comments are appreciated, especially some phrasing sounds very clumsy > to me, but I couldn't find a better one. This was now accepted by Guido and checked in. Thanks for all the comments! 
Georg From pje at telecommunity.com Fri May 11 18:29:48 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Fri, 11 May 2007 12:29:48 -0400 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <740c3aec0705091254n7a406621qb5cc491cf3af9743@mail.gmail.com> <20070509205655.622A63A4061@sparrow.telecommunity.com> <20070510012202.EB0A83A4061@sparrow.telecommunity.com> <464289C8.4080004@canterbury.ac.nz> <20070510161417.192943A4061@sparrow.telecommunity.com> <464395AB.6040505@canterbury.ac.nz> <20070510231845.9C98C3A4061@sparrow.telecommunity.com> Message-ID: <20070511162803.5E70F3A4061@sparrow.telecommunity.com> At 09:46 AM 5/11/2007 -0400, Jim Jewett wrote: >As much as it seems clear once you understand ... it isn't, if only >because it is so unexpected. I think it needs an example, such as > > class A: ... > class B(A): ... > >Then register before/after/around/normal methods for each, and show >the execution path for a B(). As I understand it now (without >rereading the PEP) > > AroundB part 1 > AroundA part 1 > BeforeA > BeforeB > NormalB > # NormalA gets skipped, unless NormalB calls it explicitly > AfterA > AfterB > AroundA part 2 > AroundB part 2 The above is correct, except that either AroundB or AroundA *may* choose to skip calling the parts they enclose. >But maybe it would just be AroundB, because an Around is really a replacement? If AroundB didn't call its next-method, it would indeed be a replacement. > > >I can see a problem with this. If Library1 defines a > > >method that always overrides an @around method, and > > >Library2 does the same thing, then if I try to use > > >both libraries at the same time, I'll get an exception > > >that I don't know the cause of and don't have any > > >idea how to fix. > > > Actually, that would require that Library1 and Library2 both add > > methods to a generic function in Library3. 
Not only that, but *those > > methods would have to apply to the same classes*. So, it's actually > > a lot harder to create that situation than it sounds. > > > In particular, notice that if Library1 only uses its combinators for > > methods applying to its own types, and Library2 does the same, they > > *cannot* create any method ambiguity in the third library's generic > > functions! > >Library 1 and Library 2 both register Sage classes with Numpy, or vice >versa. Library 1 and 2 don't know about each other. Library 1 and 2 >also go through some extra version skew pains when Sage starts >registering its types itself. Well, all the more reason to have this in place for 3.0 where everybody is starting over anyway. ;-) Seriously though, it seems to me that registering third-party types in fourth-party generic functions, from *library* code (as opposed to application code) is unwise. I mean, you're already talking about FOUR people there, *not* counting Library 2! (i.e., Sage, Numpy, Library 1, and the user). However, the simple solution is that L1 and L2 should subclass the relevant Sage types and only register their subclasses. Then, they each effectively "own" the types, and if Sage registers useful stuff later, they can just drop their subclasses. That doesn't eliminate the issue of what type(s) the user of L1 and L2 should use, unless of course the use of Sage in at least one of L1 and L2 is embedded and not user-visible. However, it's not like such questions of choice and compatibility don't come up all the time anyway, and the user could, if he/she had to, use multiple inheritance plus some additional registrations of their own to work things out. Also, remember that the user can always resolve ambiguities between libraries by making additional registrations that more specifically apply to the situation. So, can you write a library that messes things up for other people? Sure! But you can already do that; this ain't Java, and we're all consenting adults. 
If you write libraries that mess stuff up, you're going to get complaints. The best practice here reminds me of a joke my coworkers used to tell when I was in the real estate software business. One of our salespeople was talking to a real estate broker and explaining the menu of our program: "See, if your company lists it, and another company sells it, or if you sell it, that's an "Inside Listing Sold". But if another company lists it, and *you* sell it, then we call that an "Outside Listing Sold"." The broker nodded. "But what if another company lists *and* sells it?" The salesperson thought a moment, then smiled. "Well, we call that, "None of your business!"" In the same way here, you can register your types with other people's generic functions, or other people's types with your generic functions, or even your own types with your own generic functions. But registering other people's types with other people's generic functions is what we would politely call, "none of your business". :) >hmm... if Library 2 is slightly buggy, or makes a slightly different >mapping than library 1, then my getting correct results will depend on >which of Library 1/Library 2 gets imported first -- or, rather, first >got to the registration stage of their being imported. Note that for "@around" and "@when/@overload", import order *does not resolve ambiguity*. If the registrations are for the same types, calling the function with those types will raise an AmbiguousMethods error that lists the conflicting methods. But, as I pointed out above, it's a bad idea for those two libraries to directly register another library's types without subclassing them first, per the NOYB rule. :) From pje at telecommunity.com Fri May 11 18:37:37 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Fri, 11 May 2007 12:37:37 -0400 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. 
In-Reply-To: References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <740c3aec0705091254n7a406621qb5cc491cf3af9743@mail.gmail.com> <20070509205655.622A63A4061@sparrow.telecommunity.com> <46428622.7000204@canterbury.ac.nz> <20070510154540.8CEC23A4061@sparrow.telecommunity.com> <20070511020339.5F4313A4061@sparrow.telecommunity.com> Message-ID: <20070511163551.C04D83A4061@sparrow.telecommunity.com> At 12:05 AM 5/11/2007 -0400, Jim Jewett wrote: >So we're mostly in agreement, but I had also wanted to leave out importString. > >I know it can seem simpler to treat everything as an object, and not >worry about where the type switches from package to module to instance >to attribute. I see it used in Twisted. > >But I'm not sure it is *really* simpler for someone who isn't familiar >with your codebase, The use case is to be able to have a string that refers to an importable object. The unittest module has something similar, egg entry points do, and so does mod_python. (I wouldn't be surprised if mod_wsgi has something like that also.) Chandler's repository (object database) also had code to "load classes" by using a string import, before I got there. The thing is, string-import code is tricky to get just right; it therefore seems like a natural for "batteries included" if you're creating a stdlib module that's already doing stuff with strings and importing. >and I don't see why it is needed for whenImported. It isn't. I'm just saying if we were going to add it to the stdlib, importString (perhaps with a name change) just seems like a no-brainer to include. (vs. importObject, importSequence, and importSuite, which are just boilerplate over importString.) Anyway, perhaps this should piggyback on the coming discussion of moving the full import code to Python; it might be that lazy imports and callbacks could be more cleanly implemented as part of that machinery, than by being tacked on afterwards. 
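[Editor's note: the "string that refers to an importable object" idea, which Phillip calls "tricky to get just right", can be sketched as follows. This is a hypothetical minimal version, not PEAK's importString; `importlib` here stands in for import plumbing that the stdlib of the time lacked:]

```python
# Hypothetical sketch of string-to-object import resolution: try the
# longest importable prefix as a module, then walk the remaining
# dotted parts with getattr.
import importlib

def import_string(name):
    """Resolve a dotted string like 'os.path.join' to an object."""
    parts = name.split('.')
    for i in range(len(parts), 0, -1):
        try:
            obj = importlib.import_module('.'.join(parts[:i]))
        except ImportError:
            continue
        for attr in parts[i:]:
            obj = getattr(obj, attr)  # may raise AttributeError
        return obj
    raise ImportError("cannot resolve %r" % name)
```

The prefix search is the subtle part: 'os.path.join' must import os.path and then getattr join, and a naive split at the last dot gets 'package.module' versus 'module.attribute' cases wrong.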
From steven.bethard at gmail.com Fri May 11 18:57:21 2007 From: steven.bethard at gmail.com (Steven Bethard) Date: Fri, 11 May 2007 10:57:21 -0600 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> Message-ID: On 4/30/07, Phillip J. Eby wrote: > PEP: 3124 > Title: Overloading, Generic Functions, Interfaces, and Adaptation Ok, one more try at simplifying things. How about you just drop the sections: "Before" and "After" Methods "Around" Methods Custom Combinations Aspects Yes, I know that 90% of the machinery to support these will already be in the module, but I still think it would make a clearer PEP if that remaining 10% was factored out into third-party code. STeVe -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy From pje at telecommunity.com Fri May 11 19:11:53 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Fri, 11 May 2007 13:11:53 -0400 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> Message-ID: <20070511171007.D27C23A4061@sparrow.telecommunity.com> At 10:57 AM 5/11/2007 -0600, Steven Bethard wrote: >On 4/30/07, Phillip J. Eby wrote: > > PEP: 3124 > > Title: Overloading, Generic Functions, Interfaces, and Adaptation > >Ok, one more try at simplifying things. How about you just drop the sections: > > "Before" and "After" Methods > "Around" Methods > Custom Combinations > Aspects > >Yes, I know that 90% of the machinery to support these will already be >in the module, but I still think it would make a clearer PEP if that >remaining 10% was factored out into third-party code. 
ISTM that your statement is still true if you replace the phrase "third-party code" with "second PEP". :) From steven.bethard at gmail.com Fri May 11 19:16:31 2007 From: steven.bethard at gmail.com (Steven Bethard) Date: Fri, 11 May 2007 11:16:31 -0600 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: <20070511171007.D27C23A4061@sparrow.telecommunity.com> References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <20070511171007.D27C23A4061@sparrow.telecommunity.com> Message-ID: On 5/11/07, Phillip J. Eby wrote: > At 10:57 AM 5/11/2007 -0600, Steven Bethard wrote: > >On 4/30/07, Phillip J. Eby wrote: > > > PEP: 3124 > > > Title: Overloading, Generic Functions, Interfaces, and Adaptation > > > >Ok, one more try at simplifying things. How about you just drop the sections: > > > > "Before" and "After" Methods > > "Around" Methods > > Custom Combinations > > Aspects > > > >Yes, I know that 90% of the machinery to support these will already be > >in the module, but I still think it would make a clearer PEP if that > >remaining 10% was factored out into third-party code. > > ISTM that your statement is still true if you replace the phrase > "third-party code" with "second PEP". :) That's fine too. =) STeVe -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy From jimjjewett at gmail.com Fri May 11 19:27:19 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Fri, 11 May 2007 13:27:19 -0400 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. 
In-Reply-To: <20070511162803.5E70F3A4061@sparrow.telecommunity.com> References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <20070509205655.622A63A4061@sparrow.telecommunity.com> <20070510012202.EB0A83A4061@sparrow.telecommunity.com> <464289C8.4080004@canterbury.ac.nz> <20070510161417.192943A4061@sparrow.telecommunity.com> <464395AB.6040505@canterbury.ac.nz> <20070510231845.9C98C3A4061@sparrow.telecommunity.com> <20070511162803.5E70F3A4061@sparrow.telecommunity.com> Message-ID: On 5/11/07, Phillip J. Eby wrote: > At 09:46 AM 5/11/2007 -0400, Jim Jewett wrote: > >As much as it seems clear once you understand ... it isn't, if only > >because it is so unexpected. I think it needs an example, such as > > class A: ... > > class B(A): ... > >Then register before/after/around/normal methods for each, and show > >the execution path for a B(). As I understand it now (without > >rereading the PEP) > > AroundB part 1 > > AroundA part 1 > > BeforeA > > BeforeB > > NormalB > > # NormalA gets skipped, unless NormalB calls it explicitly > > AfterA > > AfterB > > AroundA part 2 > > AroundB part 2 > The above is correct, except that either AroundB or AroundA *may* > choose to skip calling the parts they enclose. So how is an Around method any different than a full concrete implementation? Just because it has higher precedence, so it can win without being the most specific? Could you drop the precedence stuff from the core library, and just have "here is how to register a concrete implementation" "here is the equivalent of super -- a way to call whatever would have been called without your replacement" I understand that the full version offers more functionality, but it is also more complicated. Maybe use that fuller version as a test case, and mention in the module docs that it is possible to create more powerful dispatch rules, and that there is an example (test\test_generic_reg ?), with even more powerful extensions available as PEAK.rules (http:// ...) 
and Zope.Interfaces (http://...) > >Library 1 and Library 2 both register Sage classes with Numpy, or vice > >versa. Library 1 and 2 don't know about each other. Library 1 and 2 > >also go through some extra version skew pains when Sage starts > >registering its types itself. > Seriously though, it seems to me that registering third-party types > in fourth-party generic functions, from *library* code (as opposed to > application code) is unwise. I mean, you're already talking about > FOUR people there, *not* counting Library 2! (i.e., Sage, Numpy, > Library 1, and the user). Those are all math libraries; Library 1 and Library 2 *should* both work well with both NumPy and Sage, and can reasonably be considered extensions of both. Saying "You can do this with most numbers, but not NumPy numbers" is ugly. Saying "You can do this, but sometimes it will break because the extensions I work with don't know about each other, and I won't translate, as a matter of policy" is ... probably not going to happen. Ideally, NumPy and Sage would make the introductions directly, or there would at least be a canonical mapping somewhere that Libraries 1 and 2 could agree on ... but that won't happen at any specific time. Saying "You need to upgrade to at least Package A version 2.3.4 and Package B version 4.3 to use my code" is unlikely to happen; you yourself still support Python 2.2 in your own packages. > anyway, and the user could, if he/she had to, use multiple > inheritance plus some additional registrations of their own to work things out. If there are two registrations for the same selection criteria, how can the user resolve things? Either the first one registered wins, or the second, or the user sees some sort of import failure, and can't fix it without modifying somebody else's code to avoid one of those registrations. 
-jJ From nnorwitz at gmail.com Fri May 11 20:29:46 2007 From: nnorwitz at gmail.com (Neal Norwitz) Date: Fri, 11 May 2007 11:29:46 -0700 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: <79990c6b0705110140u59c6d46euddc59b919f55b4e8@mail.gmail.com> References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <20070509205655.622A63A4061@sparrow.telecommunity.com> <20070510012202.EB0A83A4061@sparrow.telecommunity.com> <464289C8.4080004@canterbury.ac.nz> <20070510161417.192943A4061@sparrow.telecommunity.com> <464395AB.6040505@canterbury.ac.nz> <20070510231845.9C98C3A4061@sparrow.telecommunity.com> <4643C4F4.30708@canterbury.ac.nz> <79990c6b0705110140u59c6d46euddc59b919f55b4e8@mail.gmail.com> Message-ID: On 5/11/07, Paul Moore wrote: > On 11/05/07, Greg Ewing wrote: > > I'm now even more of the opinion that this is too > > complicated for Python's first generic function system. > > "If it's hard to explain, it's probably a bad idea." > > Hmm. My view is that it *is* simple to explain, but unfortunately > Phillip's explanation in the PEP is not that simple explanation :-( [snip] > I would argue that the PEP could be *very* simple if it restricted > itself to the basic idea. Paul, Could you write up the simple version that you would use instead? n From pje at telecommunity.com Fri May 11 20:51:12 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Fri, 11 May 2007 14:51:12 -0400 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. 
In-Reply-To: References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <20070509205655.622A63A4061@sparrow.telecommunity.com> <20070510012202.EB0A83A4061@sparrow.telecommunity.com> <464289C8.4080004@canterbury.ac.nz> <20070510161417.192943A4061@sparrow.telecommunity.com> <464395AB.6040505@canterbury.ac.nz> <20070510231845.9C98C3A4061@sparrow.telecommunity.com> <20070511162803.5E70F3A4061@sparrow.telecommunity.com> Message-ID: <20070511184927.7A2043A4061@sparrow.telecommunity.com> At 01:27 PM 5/11/2007 -0400, Jim Jewett wrote: >So how is an Around method any different than a full concrete >implementation? Just because it has higher precedence, so it can win >without being the most specific? Yep. >Could you drop the precedence stuff from the core library, and just have > >"here is how to register a concrete implementation" >"here is the equivalent of super -- a way to call whatever would have >been called without your replacement" > >I understand that the full version offers more functionality, but it >is also more complicated. Maybe use that fuller version as a test >case, and mention in the module docs that it is possible to create >more powerful dispatch rules, and that there is an example >(test\test_generic_reg ?), with even more powerful extensions >available as PEAK.rules (http:// ...) and Zope.Interfaces >(http://...) I don't have a problem with moving method combination (other than the super()-analogue) and aspects to a separate PEP for ease of understanding, but the implementation is pretty much the smallest indivisible collection of features that still allows features like those to be added. (See the other PEP 3124 threads for that discussion.) I think this is analogous to PEP 252 and 253, in that their implementation is interdependent, but could be considered separate topics and thus easier to read when separated. > > >Library 1 and Library 2 both register Sage classes with Numpy, or vice > > >versa. 
Library 1 and 2 don't know about each other. Library 1 and 2 > > >also go through some extra version skew pains when Sage starts > > >registering its types itself. > > > Seriously though, it seems to me that registering third-party types > > in fourth-party generic functions, from *library* code (as opposed to > > application code) is unwise. I mean, you're already talking about > > FOUR people there, *not* counting Library 2! (i.e., Sage, Numpy, > > Library 1, and the user). > >Those are all math libraries; Library 1 and Library 2 *should* both >work well with both NumPy and Sage, and can reasonably be considered >extensions of both. > >Saying "You can do this with most numbers, but not NumPy numbers" is ugly. > >Saying "You can do this, but sometimes it will break because the >extensions I work with don't know about each other, and I won't >translate, as a matter of policy" is ... probably not going to happen. If L1 defines a generic function, it's fine for it to register Sage and NumPy types for it. But if NumPy defines a generic function and Sage defines the type, how is it any of L1's business? >Ideally, NumPy and Sage would make the introductions directly, or >there would at least be a canonical mapping somewhere that Libraries 1 >and 2 could agree on ... but that won't happen at any specific time. Again, nothing stops L1 and L2 from subclassing those types, and registering only those subtypes. Or from offering *optional* registration support modules, so an application can *choose* to import them. NOYB ("none of your business") registration should only be done by applications, if they're done at all. >Saying "You need to upgrade to at least version Package A version >2.3.4 and Package B version 4.3 to use my code" is unlikely to happen; >you yourself still support Python 2.2 in your own packages. 2.3, actually, as it's the Python used by the most widely-available/supported Linux distros at the moment. 
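[Editor's note: the subclass-and-register rule discussed here can be made concrete with a toy type registry standing in for a generic function. Every name below is hypothetical:]

```python
# Toy single-dispatch registry standing in for a PEP 3124-style
# generic function.  Library 1 registers its OWN subclass of a
# third-party type, never the third-party type itself, so its
# registration cannot clash with anyone else's.
registry = {}

def register(cls, impl):
    registry[cls] = impl

def pretty(obj):
    for cls in type(obj).__mro__:   # subclasses inherit entries
        if cls in registry:
            return registry[cls](obj)
    raise TypeError(type(obj).__name__)

class SageInteger:                  # stands in for a Sage type
    def __init__(self, n):
        self.n = n

class L1SageInteger(SageInteger):   # Library 1's private subclass
    pass

register(L1SageInteger, lambda obj: "L1<%d>" % obj.n)

# If Sage later registers its own type, there is no ambiguity;
# Library 1 can simply drop its subclass when convenient.
register(SageInteger, lambda obj: "Sage<%d>" % obj.n)

print(pretty(L1SageInteger(3)))  # -> L1<3>
print(pretty(SageInteger(3)))    # -> Sage<3>
```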
> > anyway, and the user could, if he/she had to, use multiple > > inheritance plus some additional registrations of their own to > work things out. > >If there are two registrations for the same selection criteria, how >can the user resolve things? With an @around method, or by creating and using subclasses. >Either the first one registered wins, or >the second, or the user sees some sort of import failure, and can't >fix it without modifying somebody else's code to avoid one of those >registrations. AmbiguousMethods is a call-time error, not a definition-time error, unless you are using custom combinators. L1 and L2 would have to define their own @lib1 and @lib2 combinators, and register them both with the same generic function *and* the same types, before you could get a definition-time error. And I could probably change the implementation to avoid this by always deferring method combination until the function is invoked at least once, but I'm not convinced it's worth it, especially since it could make other errors harder to find when writing combinators. In my experience, most combinators are defined by the library that defines the generic function using them, or else are general-purpose AOP-ish combinators like @before/@after/@around. From daniel at stutzbachenterprises.com Fri May 11 22:20:28 2007 From: daniel at stutzbachenterprises.com (Daniel Stutzbach) Date: Fri, 11 May 2007 15:20:28 -0500 Subject: [Python-3000] PEPs update In-Reply-To: <20070511122257.BIW67893@ms09.lnh.mail.rcn.net> References: <20070511122257.BIW67893@ms09.lnh.mail.rcn.net> Message-ID: On 5/11/07, Raymond Hettinger wrote: > Newly developed code always faces an uphill battle when compared to > mature open-source. As it should. :-) > End-users (everyday Python programmers) need to understand the > performance intuitively and have a clear understanding of what is going > on under-the-hood. Our existing data structures have the virtue of having a
Our existing data structures have the virtue of having a > simple mental model (except for aspects of re-sizing and over-allocation which > are a bit obscure). I guess I have a different perspective. One advantage of the BList is that the user doesn't *need* to understand what's going on under-the-hood. They can rely on it to have good performance for any operation. One of my motivations in creating it was so I could be more lazy in the future. With a BList, I don't have to wonder whether Python code I write will ever be called with a really big list, and, if so, whether I need to rewrite my algorithm to avoid O(n^2) behavior. > That would likely be an informative exercise and would assure that your code > is truly interchangeable with regular lists. It would also highlight the > under-the-hood difficulties you'll encounter with the C-API. > > That being said, it is a labor intensive exercise and the time might be better > spent on tweaking the third-party module code and building a happy user-base. I actually don't think it will be that bad, since list operations go through one thin API. I just need to redirect the API in listobject.h and I'm mostly done. I think. Maybe I'll take a quick pass at it, and if it turns into a nightmare, I'll reconsider. > > Indeed, I wrote the BList because there were idioms that I > > wanted to use that were just not practical with an array-based list. > > We ought to set up a page on the wiki for success stories with blist as a > third-party module. In time, the Right Answer (tm) will become self-evident. I haven't used the python.org wiki before. If you point me to the right place to put a link to a BList page, I'd be happy to create one. Somewhere under UsefulModules? -- Daniel Stutzbach, Ph.D. 
President, Stutzbach Enterprises LLC From python at rcn.com Fri May 11 22:53:06 2007 From: python at rcn.com (Raymond Hettinger) Date: Fri, 11 May 2007 16:53:06 -0400 (EDT) Subject: [Python-3000] PEPs update Message-ID: <20070511165306.BIX50379@ms09.lnh.mail.rcn.net> > I haven't used the python.org wiki before. If you point me to the > right place put a link to a BList page, I'd be happy to create one. > Somewhere under UsefulModules? That would be a good place: http://wiki.python.org/moin/UsefulModules Raymond From benji at benjiyork.com Fri May 11 23:28:24 2007 From: benji at benjiyork.com (Benji York) Date: Fri, 11 May 2007 17:28:24 -0400 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: <20070511163551.C04D83A4061@sparrow.telecommunity.com> References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <740c3aec0705091254n7a406621qb5cc491cf3af9743@mail.gmail.com> <20070509205655.622A63A4061@sparrow.telecommunity.com> <46428622.7000204@canterbury.ac.nz> <20070510154540.8CEC23A4061@sparrow.telecommunity.com> <20070511020339.5F4313A4061@sparrow.telecommunity.com> <20070511163551.C04D83A4061@sparrow.telecommunity.com> Message-ID: <4644DFF8.8030609@benjiyork.com> Phillip J. Eby wrote: > At 12:05 AM 5/11/2007 -0400, Jim Jewett wrote: >> So we're mostly in agreement, but I had also wanted to leave out importString. >> >> I know it can seem simpler to treat everything as an object, and not >> worry about where the type switches from package to module to instance >> to attribute. I see it used in Twisted. >> >> But I'm not sure it is *really* simpler for someone who isn't familiar >> with your codebase, > > The use case is to be able to have a string that refers to an > importable object. The unittest module has something similar, egg > entry points do, and so does mod_python. (I wouldn't be surprised if > mod_wsgi has something like that also.) 
Chandler's repository > (object database) also had code to "load classes" by using a string > import, before I got there. zope.interface also allows "lazy" imports using string versions of module names in specific circumstances where circular dependencies are common. -- Benji York http://benjiyork.com From greg.ewing at canterbury.ac.nz Sat May 12 02:54:30 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 12 May 2007 12:54:30 +1200 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: <20070511162803.5E70F3A4061@sparrow.telecommunity.com> References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <740c3aec0705091254n7a406621qb5cc491cf3af9743@mail.gmail.com> <20070509205655.622A63A4061@sparrow.telecommunity.com> <20070510012202.EB0A83A4061@sparrow.telecommunity.com> <464289C8.4080004@canterbury.ac.nz> <20070510161417.192943A4061@sparrow.telecommunity.com> <464395AB.6040505@canterbury.ac.nz> <20070510231845.9C98C3A4061@sparrow.telecommunity.com> <20070511162803.5E70F3A4061@sparrow.telecommunity.com> Message-ID: <46451046.4000109@canterbury.ac.nz> Phillip J. Eby wrote: > you can register your types with other people's > generic functions, or other people's types with your generic > functions, There's still a possibility of conflict even then. Fred registers one of Mary's types with his generic function, which he feels entitled to do because he owns the function. Meanwhile, Mary registers the same type with the same function, which she feels entitled to do because she owns the type. The problem is that nobody entirely owns the (type, function) pair, which is what's required to be unique. 
-- Greg From rasky at develer.com Sat May 12 02:58:58 2007 From: rasky at develer.com (Giovanni Bajo) Date: Sat, 12 May 2007 02:58:58 +0200 Subject: [Python-3000] the future of the GIL In-Reply-To: <4642745C.1040702@canterbury.ac.nz> References: <463E4645.5000503@acm.org> <20070506222840.25B2.JCARLSON@uci.edu> <4642745C.1040702@canterbury.ac.nz> Message-ID: On 10/05/2007 3.24, Greg Ewing wrote: >> using multiple processes cause some >> headaches with frozen distributions (PyInstaller, py2exe, etc.), like those >> usually found on Windows, specifically because Windows does not have fork(). > > Isn't that just a problem with Windows generally? I don't > see what the method of packaging has to do with it. The processing module has two ways of creating a new process which executes the same program of the current process: - fork - the moral equivalent of popen(sys.executable sys.argv[0]) + some magic values passed on the command line which is a pickled state. The second method doesn't work out-of-the-box when the program is packaged, and it is the only one available in Windows. -- Giovanni Bajo Develer S.r.l. http://www.develer.com From rasky at develer.com Sat May 12 03:00:36 2007 From: rasky at develer.com (Giovanni Bajo) Date: Sat, 12 May 2007 03:00:36 +0200 Subject: [Python-3000] the future of the GIL In-Reply-To: <20070509203702.25EF.JCARLSON@uci.edu> References: <4642745C.1040702@canterbury.ac.nz> <20070509203702.25EF.JCARLSON@uci.edu> Message-ID: On 10/05/2007 5.38, Josiah Carlson wrote: >>> using multiple processes cause some >>> headaches with frozen distributions (PyInstaller, py2exe, etc.), like those >>> usually found on Windows, specifically because Windows does not have fork(). >> Isn't that just a problem with Windows generally? I don't >> see what the method of packaging has to do with it. 
>> >> Also, I've seen it suggested that there may actually be >> a way of doing something equivalent to a fork in Windows, >> even though it doesn't have a fork() system call as such. >> Does anyone know more about this? > Cygwin emulates fork() by creating a shared mmap, creating a new child > process, copying the contents of the parent process' memory to the child > process (after performing the proper allocations), then hacks up the > child process' call stack. Yes, that's the theory. If you look at the implementation, it's full of complexities, corner cases, undocumented glitches and whatnot. cygwin's fork() is mature, but I don't think it's easy to extract from cygwin. Moreover, there would be license issues since fork() is GPL. Doing another implementation from scratch is going to be hard. -- Giovanni Bajo From benji at benjiyork.com Sat May 12 03:07:24 2007 From: benji at benjiyork.com (Benji York) Date: Fri, 11 May 2007 21:07:24 -0400 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: <4644E9AB.6080603@trueblade.com> References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <740c3aec0705091254n7a406621qb5cc491cf3af9743@mail.gmail.com> <20070509205655.622A63A4061@sparrow.telecommunity.com> <46428622.7000204@canterbury.ac.nz> <20070510154540.8CEC23A4061@sparrow.telecommunity.com> <20070511020339.5F4313A4061@sparrow.telecommunity.com> <20070511163551.C04D83A4061@sparrow.telecommunity.com> <4644DFF8.8030609@benjiyork.com> <4644E9AB.6080603@trueblade.com> Message-ID: <4645134C.2030509@benjiyork.com> Eric V. Smith wrote: > Benji York wrote: >> zope.interface also allows "lazy" imports using string versions of >> module names in specific circumstances where circular dependencies are >> common. > > Could you give an example of that? I'm familiar with zope.interface, > but not with this feature. 
I was mistaken, it's actually zope.dottedname that does this, which is then used by zope.app.container.constraints. My confusion stems from the fact that zope.app.container.constraints is often used when defining interfaces. My only reason for bringing it up was to reinforce the idea that it's a popular thing to reinvent, so it would be worth either adding this to the stdlib or (preferably) creating a small, solid module as a stand-alone project. zope.dottedname documentation: http://svn.zope.org/zope.dottedname/trunk/src/zope/dottedname/resolve.txt?rev=75116&view=markup zope.app.container documentation: http://svn.zope.org/zope.app.container/trunk/src/zope/app/container/constraints.txt?rev=75262&view=markup -- Benji York http://benjiyork.com From greg.ewing at canterbury.ac.nz Sat May 12 03:16:45 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 12 May 2007 13:16:45 +1200 Subject: [Python-3000] PEPs update In-Reply-To: References: <20070511122257.BIW67893@ms09.lnh.mail.rcn.net> Message-ID: <4645157D.8050404@canterbury.ac.nz> Daniel Stutzbach wrote: > I actually don't think it will be that bad, since list operations go > through one thin API. I just need to redirect the API in listobject.h > and I'm mostly done. Some of that API consists of macros that index directly into the list. Currently those are O(1) and inlined. You would have to replace them with function calls that would be O(log n) and not inlined. The performance implications of that could be unpleasant. -- Greg From greg.ewing at canterbury.ac.nz Sat May 12 03:42:42 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 12 May 2007 13:42:42 +1200 Subject: [Python-3000] the future of the GIL In-Reply-To: References: <4642745C.1040702@canterbury.ac.nz> <20070509203702.25EF.JCARLSON@uci.edu> Message-ID: <46451B92.7010706@canterbury.ac.nz> Giovanni Bajo wrote: > cygwin's fork() is mature, but I don't think it's easy to extract from cygwin. 
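The string-to-object import pattern discussed in this thread (unittest, egg entry points, mod_python, zope.dottedname) boils down to a few lines. This sketch uses the modern ``importlib`` module and an invented function name; zope.dottedname's actual API differs in details:

```python
import importlib

def resolve(dotted_name):
    """Resolve 'pkg.mod.attr' to the named object: import the longest
    importable prefix, then walk the remaining names with getattr."""
    parts = dotted_name.split(".")
    for i in range(len(parts), 0, -1):
        try:
            obj = importlib.import_module(".".join(parts[:i]))
        except ImportError:
            continue  # prefix isn't a module; try a shorter one
        for attr in parts[i:]:
            obj = getattr(obj, attr)
        return obj
    raise ImportError("cannot resolve %r" % dotted_name)
```

The subtlety that makes this worth sharing rather than reinventing is exactly the loop above: you cannot tell from the string alone where the module path ends and the attribute path begins.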
> Moreover, there would be license issues since fork() is GPL. Doing another > implementation from scratch is going to be hard. Also it doesn't sound very efficient compared to a real unix fork, if it has to copy the whole address space instead of using copy-on-write. -- Greg From greg.ewing at canterbury.ac.nz Sat May 12 03:43:19 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 12 May 2007 13:43:19 +1200 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: <20070511184927.7A2043A4061@sparrow.telecommunity.com> References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <20070509205655.622A63A4061@sparrow.telecommunity.com> <20070510012202.EB0A83A4061@sparrow.telecommunity.com> <464289C8.4080004@canterbury.ac.nz> <20070510161417.192943A4061@sparrow.telecommunity.com> <464395AB.6040505@canterbury.ac.nz> <20070510231845.9C98C3A4061@sparrow.telecommunity.com> <20070511162803.5E70F3A4061@sparrow.telecommunity.com> <20070511184927.7A2043A4061@sparrow.telecommunity.com> Message-ID: <46451BB7.9030703@canterbury.ac.nz> Phillip J. Eby wrote: > At 01:27 PM 5/11/2007 -0400, Jim Jewett wrote: >>If there are two registrations for the same selection criteria, how >>can the user resolve things? But what if there's *already* an @around method being used? Then you need an @even_more_around method. Etc ad infinitum? -- Greg From ncoghlan at gmail.com Sat May 12 10:13:33 2007 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 12 May 2007 18:13:33 +1000 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. 
In-Reply-To: <46451046.4000109@canterbury.ac.nz> References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <740c3aec0705091254n7a406621qb5cc491cf3af9743@mail.gmail.com> <20070509205655.622A63A4061@sparrow.telecommunity.com> <20070510012202.EB0A83A4061@sparrow.telecommunity.com> <464289C8.4080004@canterbury.ac.nz> <20070510161417.192943A4061@sparrow.telecommunity.com> <464395AB.6040505@canterbury.ac.nz> <20070510231845.9C98C3A4061@sparrow.telecommunity.com> <20070511162803.5E70F3A4061@sparrow.telecommunity.com> <46451046.4000109@canterbury.ac.nz> Message-ID: <4645772D.5070206@gmail.com> Greg Ewing wrote: > Phillip J. Eby wrote: >> you can register your types with other people's >> generic functions, or other people's types with your generic >> functions, > > There's still a possibility of conflict even then. Fred > registers one of Mary's types with his generic function, > which he feels entitled to do because he owns the function. > Meanwhile, Mary registers the same type with the same > function, which she feels entitled to do because she > owns the type. > > The problem is that nobody entirely owns the (type, > function) pair, which is what's required to be unique. However, even if it *does* happen, the application programmer can still resolve the conflict by picking one of the two implementations and registering it as an override. At the moment, if you don't like the way a particular library handles another library's or your application's types, your ability to do anything about it is pretty close to nonexistent (unless the library employs some kind of interface or generic function mechanism). Generic functions don't magically make library compatibility problems go away, particularly when the libraries involved are interacting directly rather than going through the main application. 
What they *do* provide is a standard toolkit for reducing the likelihood of incompatibility occurring in the first place, and providing the means for resolving whatever conflicts do arise. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From guido at python.org Sat May 12 01:47:03 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 11 May 2007 16:47:03 -0700 Subject: [Python-3000] Revised PEP 3119 (Abstract Base Classes) Message-ID: Here's a new version of the ABC PEP. A lot has changed; a lot remains. I can't give a detailed overview of all the changes, and a diff would show too many spurious changes, but some of the highlights are: - Overloading isinstance and issubclass is now a key mechanism rather than an afterthought; it is also the only change to C code required [12]. - No built-in types need to be modified, and @abstractmethod is once again imported from abc.py, which defines this and a new metaclass, ABCMeta. - Built-in (and user-defined) types can be registered as "virtual subclasses" (not related to virtual base classes in C++) of the standard ABCs, e.g. Sequence.register(tuple) makes issubclass(tuple, Sequence) true (but Sequence won't show up in __bases__ or __mro__). You can define your own ABCs and register standard ABCs or built-in types as their virtual subclasses. - The number of pre-defined ABCs is greatly reduced. Apart from the one-trick ponies, which are mostly unchanged, we now have: Set, MutableSet, Mapping, MutableMapping, Sequence, MutableSequence. That's it. 
Enjoy, PEP: 3119 Title: Introducing Abstract Base Classes Version: $Revision: 55276 $ Last-Modified: $Date: 2007-05-11 13:49:12 -0700 (Fri, 11 May 2007) $ Author: Guido van Rossum , Talin Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 18-Apr-2007 Post-History: 26-Apr-2007, 11-May-2007 Abstract ======== This is a proposal to add Abstract Base Class (ABC) support to Python 3000. It proposes: * A way to overload ``isinstance()`` and ``issubclass()``. * A new module ``abc`` which serves as an "ABC support framework". It defines a metaclass for use with ABCs and a decorator that can be used to define abstract methods. * Specific ABCs for containers and iterators, to be added to the collections module. Much of the thinking that went into the proposal is not about the specific mechanism of ABCs, as contrasted with Interfaces or Generic Functions (GFs), but about clarifying philosophical issues like "what makes a set", "what makes a mapping" and "what makes a sequence". There's also a companion PEP 3141, which defines ABCs for numeric types. Acknowledgements ---------------- Talin wrote the Rationale below [1]_ as well as most of the section on ABCs vs. Interfaces. For that alone he deserves co-authorship. The rest of the PEP uses "I" referring to the first author. Rationale ========= In the domain of object-oriented programming, the usage patterns for interacting with an object can be divided into two basic categories, which are 'invocation' and 'inspection'. Invocation means interacting with an object by invoking its methods. Usually this is combined with polymorphism, so that invoking a given method may run different code depending on the type of an object. Inspection means the ability for external code (outside of the object's methods) to examine the type or properties of that object, and make decisions on how to treat that object based on that information. 
Both usage patterns serve the same general end, which is to be able to support the processing of diverse and potentially novel objects in a uniform way, but at the same time allowing processing decisions to be customized for each different type of object. In classical OOP theory, invocation is the preferred usage pattern, and inspection is actively discouraged, being considered a relic of an earlier, procedural programming style. However, in practice this view is simply too dogmatic and inflexible, and leads to a kind of design rigidity that is very much at odds with the dynamic nature of a language like Python. In particular, there is often a need to process objects in a way that wasn't anticipated by the creator of the object class. It is not always the best solution to build in to every object methods that satisfy the needs of every possible user of that object. Moreover, there are many powerful dispatch philosophies that are in direct contrast to the classic OOP requirement of behavior being strictly encapsulated within an object, examples being rule or pattern-match driven logic. On the other hand, one of the criticisms of inspection by classic OOP theorists is the lack of formalisms and the ad hoc nature of what is being inspected. In a language such as Python, in which almost any aspect of an object can be reflected and directly accessed by external code, there are many different ways to test whether an object conforms to a particular protocol or not. For example, if asking 'is this object a mutable sequence container?', one can look for a base class of 'list', or one can look for a method named '__getitem__'. But note that although these tests may seem obvious, neither of them are correct, as one generates false negatives, and the other false positives. The generally agreed-upon remedy is to standardize the tests, and group them into a formal arrangement. 
This is most easily done by associating with each class a set of standard testable properties, either via the inheritance mechanism or some other means. Each test carries with it a set of promises: it contains a promise about the general behavior of the class, and a promise as to what other class methods will be available. This PEP proposes a particular strategy for organizing these tests known as Abstract Base Classes, or ABC. ABCs are simply Python classes that are added into an object's inheritance tree to signal certain features of that object to an external inspector. Tests are done using ``isinstance()``, and the presence of a particular ABC means that the test has passed. In addition, the ABCs define a minimal set of methods that establish the characteristic behavior of the type. Code that discriminates objects based on their ABC type can trust that those methods will always be present. Each of these methods is accompanied by a generalized abstract semantic definition that is described in the documentation for the ABC. These standard semantic definitions are not enforced, but are strongly recommended. Like all other things in Python, these promises are in the nature of a gentlemen's agreement, which in this case means that while the language does enforce some of the promises made in the ABC, it is up to the implementer of the concrete class to ensure that the remaining ones are kept. Specification ============= The specification follows the categories listed in the abstract: * A way to overload ``isinstance()`` and ``issubclass()``. * A new module ``abc`` which serves as an "ABC support framework". It defines a metaclass for use with ABCs and a decorator that can be used to define abstract methods. * Specific ABCs for containers and iterators, to be added to the collections module. 
Overloading ``isinstance()`` and ``issubclass()``
-------------------------------------------------

During the development of this PEP and of its companion, PEP 3141, we repeatedly faced the choice between standardizing more, fine-grained ABCs or fewer, coarse-grained ones. For example, at one stage, PEP 3141 introduced the following stack of base classes used for complex numbers: MonoidUnderPlus, AdditiveGroup, Ring, Field, Complex (each derived from the previous). And the discussion mentioned several other algebraic categorizations that were left out: Algebraic, Transcendental, IntegralDomain, and PrincipalIdealDomain. In earlier versions of the current PEP, we considered the use cases for separate classes like Set, ComposableSet, MutableSet, HashableSet, MutableComposableSet, HashableComposableSet. The dilemma here is that we'd rather have fewer ABCs, but then what should a user do who needs a less refined ABC? Consider e.g. the plight of a mathematician who wants to define his own kind of Transcendental numbers, but also wants float and int to be considered Transcendental. PEP 3141 originally proposed to patch float.__bases__ for that purpose, but there are some good reasons to keep the built-in types immutable (for one, they are shared between all Python interpreters running in the same address space, as is used by mod_python). Another example would be someone who wants to define a generic function (PEP 3124) for any sequence that has an ``append()`` method. The ``Sequence`` ABC (see below) doesn't promise the ``append()`` method, while ``MutableSequence`` requires not only ``append()`` but also various other mutating methods. To solve these and similar dilemmas, the next section will propose a metaclass for use with ABCs that will allow us to add an ABC as a "virtual base class" (not the same concept as in C++) to any class, including to another ABC. 
This allows the standard library to define ABCs ``Sequence`` and ``MutableSequence`` and register these as virtual base classes for built-in types like ``basestring``, ``tuple`` and ``list``, so that for example the following conditions are all true::

    isinstance([], Sequence)
    issubclass(list, Sequence)
    issubclass(list, MutableSequence)
    isinstance((), Sequence)
    not issubclass(tuple, MutableSequence)
    isinstance("", Sequence)
    issubclass(bytes, MutableSequence)

The primary mechanism proposed here is to allow overloading the built-in functions ``isinstance()`` and ``issubclass()``. The overloading works as follows: The call ``isinstance(x, C)`` first checks whether ``C.__instancecheck__`` exists, and if so, calls ``C.__instancecheck__(x)`` instead of its normal implementation. Similarly, the call ``issubclass(D, C)`` first checks whether ``C.__subclasscheck__`` exists, and if so, calls ``C.__subclasscheck__(D)`` instead of its normal implementation. Note that the magic names are not ``__isinstance__`` and ``__issubclass__``; this is because the reversal of the arguments could cause confusion, especially for the ``issubclass()`` overloader. A prototype implementation of this is given in [12]_. 
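Independent of the abc framework itself, the hook is easy to exercise; in current Python the methods must live on the metaclass, consistent with the PEP's prototype (the class names here are invented for the demonstration):

```python
class AlwaysMeta(type):
    """Metaclass whose instances claim every object and every class."""

    def __instancecheck__(cls, inst):
        # Consulted by isinstance(inst, cls) before the default logic.
        return True

    def __subclasscheck__(cls, sub):
        # Consulted by issubclass(sub, cls) before the default logic.
        return True

class Anything(metaclass=AlwaysMeta):
    pass

assert isinstance(42, Anything)
assert issubclass(str, Anything)
```

This also illustrates why the hooks live on ``C``'s type rather than on ``C`` itself: an ordinary method named ``__instancecheck__`` on ``C`` would describe instances *of* ``C``, not ``C``'s behavior as the second argument of ``isinstance()``.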
Here is an example with (naively simple) implementations of ``__instancecheck__`` and ``__subclasscheck__``::

    class ABCMeta(type):

        def __instancecheck__(cls, inst):
            """Implement isinstance(inst, cls)."""
            return any(cls.__subclasscheck__(c)
                       for c in {type(inst), inst.__class__})

        def __subclasscheck__(cls, sub):
            """Implement issubclass(sub, cls)."""
            candidates = cls.__dict__.get("__subclass__", set()) | {cls}
            return any(c in candidates for c in sub.mro())

    class Sequence(metaclass=ABCMeta):
        __subclass__ = {list, tuple}

    assert issubclass(list, Sequence)
    assert issubclass(tuple, Sequence)

    class AppendableSequence(Sequence):
        __subclass__ = {list}

    assert issubclass(list, AppendableSequence)
    assert isinstance([], AppendableSequence)
    assert not issubclass(tuple, AppendableSequence)
    assert not isinstance((), AppendableSequence)

The next section proposes a full-fledged implementation.

The ``abc`` Module: an ABC Support Framework
--------------------------------------------

The new standard library module ``abc``, written in pure Python, serves as an ABC support framework. It defines a metaclass ``ABCMeta`` and a decorator ``@abstractmethod``. A sample implementation is given by [13]_. The ``ABCMeta`` class overrides ``__instancecheck__`` and ``__subclasscheck__`` and defines a ``register`` method. The ``register`` method takes one argument, which must be a class; after the call ``B.register(C)``, the call ``issubclass(C, B)`` will return True, by virtue of ``B.__subclasscheck__(C)`` returning True. Also, ``isinstance(x, B)`` is equivalent to ``issubclass(x.__class__, B) or issubclass(type(x), B)``. (It is possible ``type(x)`` and ``x.__class__`` are not the same object, e.g. when x is a proxy object.) 
These methods are intended to be called on classes whose metaclass is (derived from) ``ABCMeta``; for example::

    from abc import ABCMeta

    class MyABC(metaclass=ABCMeta):
        pass

    MyABC.register(tuple)
    assert issubclass(tuple, MyABC)
    assert isinstance((), MyABC)

The last two asserts are equivalent to the following two::

    assert MyABC.__subclasscheck__(tuple)
    assert MyABC.__instancecheck__(())

Of course, you can also directly subclass MyABC::

    class MyClass(MyABC):
        pass

    assert issubclass(MyClass, MyABC)
    assert isinstance(MyClass(), MyABC)

Also, of course, a tuple is not a ``MyClass``::

    assert not issubclass(tuple, MyClass)
    assert not isinstance((), MyClass)

You can register another class as a subclass of ``MyClass``::

    MyClass.register(list)
    assert issubclass(list, MyClass)
    assert issubclass(list, MyABC)

You can also register another ABC::

    class AnotherClass(metaclass=ABCMeta):
        pass

    AnotherClass.register(basestring)
    MyClass.register(AnotherClass)
    assert isinstance(str, MyABC)

That last assert requires tracing the following superclass-subclass relationships::

    MyABC -> MyClass (using regular subclassing)
    MyClass -> AnotherClass (using registration)
    AnotherClass -> basestring (using registration)
    basestring -> str (using regular subclassing)

The ``abc`` module also defines a new decorator, ``@abstractmethod``, to be used to declare abstract methods. A class containing at least one method declared with this decorator that hasn't been overridden yet cannot be instantiated. Such methods may be called from the overriding method in the subclass (using ``super`` or direct invocation). For example::

    from abc import ABCMeta, abstractmethod

    class A(metaclass=ABCMeta):
        @abstractmethod
        def foo(self): pass

    A()  # raises TypeError

    class B(A):
        pass

    B()  # raises TypeError

    class C(A):
        def foo(self): print(42)

    C()  # works

**Note:** The ``@abstractmethod`` decorator should only be used inside a class body, and only for classes whose metaclass is (derived from) ``ABCMeta``. 
Dynamically adding abstract methods to a class, or attempting to modify the abstraction status of a method or class once it is created, are not supported. The ``@abstractmethod`` decorator only affects subclasses derived using regular inheritance; "virtual subclasses" registered with the ``register()`` method are not affected. It has been suggested that we should also provide a way to define abstract data attributes. As it is easy to add these in a later stage, and as the use case is considerably less common (apart from pure documentation), we punt on this for now. **Implementation:** The ``@abstractmethod`` decorator sets the function attribute ``__isabstractmethod__`` to the value ``True``. The ``ABCMeta.__new__`` method computes the type attribute ``__abstractmethods__`` as the set of all method names that have an ``__isabstractmethod__`` attribute whose value is true. It does this by combining the ``__abstractmethods__`` attributes of the base classes, adding the names of all methods in the new class dict that have a true ``__isabstractmethod__`` attribute, and removing the names of all methods in the new class dict that don't have a true ``__isabstractmethod__`` attribute. If the resulting ``__abstractmethods__`` set is non-empty, the class is considered abstract, and attempts to instantiate it will raise ``TypeError``. (If this were implemented in CPython, an internal flag ``Py_TPFLAGS_ABSTRACT`` could be used to speed up this check [6]_.) **Discussion:** Unlike C++ or Java, abstract methods as defined here may have an implementation. This implementation can be called via the ``super`` mechanism from the class that overrides it. This could be useful as an end-point for a super-call in a framework using cooperative multiple inheritance [7]_, [8]_. 
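The ``__abstractmethods__`` computation described above can be approximated in pure Python. This is an illustrative sketch, not CPython's implementation, and the helper name is invented:

```python
def compute_abstractmethods(bases, namespace):
    """Approximate the algorithm described above: keep a base class's
    abstract names unless the new class dict overrides them concretely,
    then add names declared abstract in the new class dict."""
    abstracts = set()
    # Names abstract in a base stay abstract unless the new class
    # supplies a concrete (non-abstract) override for them.
    for base in bases:
        for name in getattr(base, "__abstractmethods__", set()):
            value = namespace.get(name, getattr(base, name, None))
            if getattr(value, "__isabstractmethod__", False):
                abstracts.add(name)
    # Names declared abstract directly in the new class dict.
    for name, value in namespace.items():
        if getattr(value, "__isabstractmethod__", False):
            abstracts.add(name)
    return frozenset(abstracts)
```

A metaclass would run this in ``__new__`` and refuse instantiation whenever the resulting set is non-empty, which is precisely the ``TypeError`` behavior the PEP specifies.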
ABCs for Containers and Iterators
---------------------------------

The ``collections`` module will define ABCs necessary and sufficient to work with sets, mappings, sequences, and some helper types such as iterators and dictionary views. All ABCs have the above-mentioned ``ABCMeta`` as their metaclass. The ABCs provide implementations of their abstract methods that are technically valid but fairly useless; e.g. ``__hash__`` returns 0, and ``__iter__`` returns an empty iterator. In general, the abstract methods represent the behavior of an empty container of the indicated type. Some ABCs also provide concrete (i.e. non-abstract) methods; for example, the ``Iterator`` class has an ``__iter__`` method returning itself, fulfilling an important invariant of iterators (which in Python 2 has to be implemented anew by each iterator class). These ABCs can be considered "mix-in" classes. No ABCs defined in the PEP override ``__init__``, ``__new__``, ``__str__`` or ``__repr__``. Defining a standard constructor signature would unnecessarily constrain custom container types, for example Patricia trees or gdbm files. Defining a specific string representation for a collection is similarly left up to individual implementations. **Note:** There are no ABCs for ordering operations (``__lt__``, ``__le__``, ``__ge__``, ``__gt__``). Defining these in a base class (abstract or not) runs into problems with the accepted type for the second operand. For example, if class ``Ordering`` defined ``__lt__``, one would assume that for any ``Ordering`` instances ``x`` and ``y``, ``x < y`` would be defined (even if it just defines a partial ordering). But this cannot be the case: If both ``list`` and ``str`` derived from ``Ordering``, this would imply that ``[1, 2] < (1, 2)`` should be defined (and presumably return False), while in fact (in Python 3000!) such "mixed-mode" comparisons are explicitly forbidden and raise ``TypeError``. See PEP 3100 and [14]_ for more information. 
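The mix-in idea can be shown with a simplified sketch (without ``ABCMeta`` or ``@abstractmethod``; ``CountDown`` is an invented example class): a subclass supplies ``__next__``, and the concrete ``__iter__`` comes for free:

```python
class Iterator:
    """Mix-in sketch: provides the concrete __iter__ that every
    iterator must have, so subclasses only write __next__."""
    def __iter__(self):
        return self

class CountDown(Iterator):
    def __init__(self, n):
        self.n = n
    def __next__(self):
        if self.n == 0:
            raise StopIteration
        self.n -= 1
        return self.n + 1

assert list(CountDown(3)) == [3, 2, 1]
```

This is the invariant that, as noted above, every Python 2 iterator class had to re-implement by hand.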

(This is a special case of a more general issue with operations that
take another argument of the same type.)

One Trick Ponies
''''''''''''''''

These abstract classes represent single methods like ``__iter__`` or
``__len__``.

``Hashable``
The base class for classes defining ``__hash__``.  The ``__hash__``
method should return an integer.  The abstract ``__hash__`` method
always returns 0, which is a valid (albeit inefficient) implementation.
**Invariant:** If classes ``C1`` and ``C2`` both derive from
``Hashable``, the condition ``o1 == o2`` must imply ``hash(o1) ==
hash(o2)`` for all instances ``o1`` of ``C1`` and all instances ``o2``
of ``C2``.  IOW, two objects should never compare equal but have
different hash values.

Another constraint is that hashable objects, once created, should never
change their value (as compared by ``==``) or their hash value.  If a
class cannot guarantee this, it should not derive from ``Hashable``; if
it cannot guarantee this for certain instances, ``__hash__`` for those
instances should raise a ``TypeError`` exception.

**Note:** being an instance of this class does not imply that an object
is immutable; e.g. a tuple containing a list as a member is not
immutable; its ``__hash__`` method raises ``TypeError``.  (This is
because it recursively tries to compute the hash of each member; if a
member is unhashable it raises ``TypeError``.)

``Iterable``
The base class for classes defining ``__iter__``.  The ``__iter__``
method should always return an instance of ``Iterator`` (see below).
The abstract ``__iter__`` method returns an empty iterator.

``Iterator``
The base class for classes defining ``__next__``.  This derives from
``Iterable``.  The abstract ``__next__`` method raises
``StopIteration``.  The concrete ``__iter__`` method returns ``self``.
Note the distinction between ``Iterable`` and ``Iterator``: an
``Iterable`` can be iterated over, i.e.
supports the ``__iter__`` method; an ``Iterator`` is what the built-in
function ``iter()`` returns, i.e. supports the ``__next__`` method.

``Sized``
The base class for classes defining ``__len__``.  The ``__len__``
method should return an ``Integer`` (see "Numbers" below) >= 0.  The
abstract ``__len__`` method returns 0.  **Invariant:** If a class ``C``
derives from ``Sized`` as well as from ``Iterable``, the invariant
``sum(1 for x in o) == len(o)`` should hold for any instance ``o`` of
``C``.

``Container``
The base class for classes defining ``__contains__``.  The
``__contains__`` method should return a ``bool``.  The abstract
``__contains__`` method returns ``False``.  **Invariant:** If a class
``C`` derives from ``Container`` as well as from ``Iterable``, then
``(x in o for x in o)`` should be a generator yielding only ``True``
values for any instance ``o`` of ``C``.

**Open issues:** Conceivably, instead of using the ABCMeta metaclass,
these classes could override ``__instancecheck__`` and
``__subclasscheck__`` to check for the presence of the applicable
special method; for example::

    class Sized(metaclass=ABCMeta):
        @abstractmethod
        def __len__(self):
            return 0
        @classmethod
        def __instancecheck__(cls, x):
            return hasattr(x, "__len__")
        @classmethod
        def __subclasscheck__(cls, C):
            return hasattr(C, "__bases__") and hasattr(C, "__len__")

This has the advantage of not requiring explicit registration.
However, the semantics are hard to get exactly right given the
confusing semantics of instance attributes vs. class attributes, and
the fact that a class is an instance of its metaclass; the check for
``__bases__`` is only an approximation of the desired semantics.
**Strawman:** Let's do it, but let's arrange it in such a way that the
registration API also works.

Sets
''''

These abstract classes represent read-only sets and mutable sets.  The
most fundamental set operation is the membership test, written as ``x
in s`` and implemented by ``s.__contains__(x)``.
This operation is already defined by the ``Container`` class defined
above.  Therefore, we define a set as a sized, iterable container for
which certain invariants from mathematical set theory hold.

The built-in type ``set`` derives from ``MutableSet``.  The built-in
type ``frozenset`` derives from ``Set`` and ``Hashable``.

``Set``
This is a sized, iterable container, i.e., a subclass of ``Sized``,
``Iterable`` and ``Container``.  Not every subclass of those three
classes is a set though!  Sets have the additional invariant that each
element occurs only once (as can be determined by iteration), and in
addition sets define concrete operators that implement the inequality
operations as subclass/superclass tests.  In general, the invariants
for finite sets in mathematics hold. [11]_

Sets with different implementations can be compared safely, (usually)
efficiently and correctly using the mathematical definitions of the
subclass/superclass operations for finite sets.  The ordering
operations have concrete implementations; subclasses may override these
for speed but should maintain the semantics.  Because ``Set`` derives
from ``Sized``, ``__eq__`` may take a shortcut and return ``False``
immediately if two sets of unequal length are compared.  Similarly,
``__le__`` may return ``False`` immediately if the first set has more
members than the second set.  Note that set inclusion implements only a
partial ordering; e.g. ``{1, 2}`` and ``{1, 3}`` are not ordered (all
three of ``<``, ``==`` and ``>`` return ``False`` for these arguments).
Sets cannot be ordered relative to mappings or sequences, but they can
be compared to those for equality (and then they always compare
unequal).

This class also defines concrete operators to compute union,
intersection, symmetric and asymmetric difference, respectively
``__or__``, ``__and__``, ``__xor__`` and ``__sub__``.  These operators
should return instances of ``Set``.
The default implementations call the overridable class method
``_from_iterable()`` with an iterable argument.  This factory method's
default implementation returns a ``frozenset`` instance; it may be
overridden to return another appropriate ``Set`` subclass.

Finally, this class defines a concrete method ``_hash`` which computes
the hash value from the elements.  Hashable subclasses of ``Set`` can
implement ``__hash__`` by calling ``_hash`` or they can reimplement the
same algorithm more efficiently; but the algorithm implemented should
be the same.  Currently the algorithm is fully specified only by the
source code [15]_.

**Note:** the ``issubset`` and ``issuperset`` methods found on the set
type in Python 2 are not supported, as these are mostly just aliases
for ``__le__`` and ``__ge__``.

``MutableSet``
This is a subclass of ``Set`` implementing additional operations to add
and remove elements.  The supported methods have the semantics known
from the ``set`` type in Python 2 (except for ``discard``, which is
modeled after Java):

``.add(x)``
Abstract method returning a ``bool`` that adds the element ``x`` if it
isn't already in the set.  It should return ``True`` if ``x`` was
added, ``False`` if it was already there.  The abstract implementation
raises ``NotImplementedError``.

``.discard(x)``
Abstract method returning a ``bool`` that removes the element ``x`` if
present.  It should return ``True`` if the element was present and
``False`` if it wasn't.  The abstract implementation raises
``NotImplementedError``.

``.pop()``
Concrete method that removes and returns an arbitrary item.  If the set
is empty, it raises ``KeyError``.  The default implementation removes
the first item returned by the set's iterator.

``.toggle(x)``
Concrete method returning a ``bool`` that adds ``x`` to the set if it
wasn't there, but removes it if it was there.  It should return
``True`` if ``x`` was added, ``False`` if it was removed.

``.clear()``
Concrete method that empties the set.
The default implementation repeatedly calls ``self.pop()`` until
``KeyError`` is caught.  (**Note:** this is likely much slower than
simply creating a new set, even if an implementation overrides it with
a faster approach; but in some cases object identity is important.)

This also supports the in-place mutating operations ``|=``, ``&=``,
``^=``, ``-=``.  These are concrete methods whose right operand can be
an arbitrary ``Iterable``, except for ``&=``, whose right operand must
be a ``Container``.  This ABC does not support the named methods
present on the built-in concrete ``set`` type that perform (almost) the
same operations.

Mappings
''''''''

These abstract classes represent read-only mappings and mutable
mappings.  The ``Mapping`` class represents the most common read-only
mapping API.  The built-in type ``dict`` derives from
``MutableMapping``.

``Mapping``
A subclass of ``Container``, ``Iterable`` and ``Sized``.  The keys of a
mapping naturally form a set.  The (key, value) pairs (which must be
tuples) are also referred to as items.  The items also form a set.
Methods:

``.__getitem__(key)``
Abstract method that returns the value corresponding to ``key``, or
raises ``KeyError``.  The implementation always raises ``KeyError``.

``.get(key, default=None)``
Concrete method returning ``self[key]`` if this does not raise
``KeyError``, and the ``default`` value if it does.

``.__contains__(key)``
Concrete method returning ``True`` if ``self[key]`` does not raise
``KeyError``, and ``False`` if it does.

``.__len__()``
Abstract method returning the number of distinct keys (i.e., the length
of the key set).

``.__iter__()``
Abstract method returning each key in the key set exactly once.

``.keys()``
Concrete method returning the key set as a ``Set``.  The default
concrete implementation returns a "view" on the key set (meaning if the
underlying mapping is modified, the view's value changes
correspondingly); subclasses are not required to return a view but they
should return a ``Set``.

``.items()``
Concrete method returning the items as a ``Set``.  The default concrete
implementation returns a "view" on the item set; subclasses are not
required to return a view but they should return a ``Set``.

``.values()``
Concrete method returning the values as a sized, iterable container
(not a set!).  The default concrete implementation returns a "view" on
the values of the mapping; subclasses are not required to return a view
but they should return a sized, iterable container.

The following invariants should hold for any mapping ``m``::

    len(m.values()) == len(m.keys()) == len(m.items()) == len(m)
    [value for value in m.values()] == [m[key] for key in m.keys()]
    [item for item in m.items()] == [(key, m[key]) for key in m.keys()]

i.e. iterating over the items, keys and values should return results in
the same order.

``MutableMapping``
A subclass of ``Mapping`` that also implements some standard mutating
methods.  Abstract methods include ``__setitem__``, ``__delitem__``.
Concrete methods include ``pop``, ``popitem``, ``clear``, ``update``.
**Note:** ``setdefault`` is *not* included.  **Open issues:** Write out
the specs for the methods.

Sequences
'''''''''

These abstract classes represent read-only sequences and mutable
sequences.  The built-in ``list`` and ``bytes`` types derive from
``MutableSequence``.  The built-in ``tuple`` and ``str`` types derive
from ``Sequence`` and ``Hashable``.

``Sequence``
A subclass of ``Iterable``, ``Sized``, ``Container``.  It defines a new
abstract method ``__getitem__`` that has a somewhat complicated
signature: when called with an integer, it returns an element of the
sequence or raises ``IndexError``; when called with a ``slice`` object,
it returns another ``Sequence``.  The concrete ``__iter__`` method
iterates over the elements using ``__getitem__`` with integer arguments
0, 1, and so on, until ``IndexError`` is raised.  The length should be
equal to the number of values returned by the iterator.
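
A minimal sketch of such a read-only sequence, using today's
``collections.abc.Sequence`` (the eventual home of this ABC).  The
``Squares`` class is invented for illustration, and for brevity its
slice handling returns a plain list rather than another ``Sequence`` as
prescribed above::

```python
from collections.abc import Sequence

class Squares(Sequence):
    """The first n squares; only __len__ and __getitem__ are written.
    The Sequence ABC supplies __iter__, __contains__, index and count.
    """
    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, i):
        if isinstance(i, slice):
            # A plain list for brevity; the PEP would have this
            # return another Sequence.
            return [self[j] for j in range(*i.indices(self.n))]
        if i < 0:
            i += self.n
        if not 0 <= i < self.n:
            raise IndexError(i)
        return i * i

s = Squares(5)
print(list(s))      # [0, 1, 4, 9, 16]
print(9 in s)       # True  (concrete __contains__ via iteration)
print(s.index(16))  # 4
```
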

**Open issues:** Other candidate methods, which can all have default
concrete implementations that only depend on ``__len__`` and
``__getitem__`` with an integer argument: ``__reversed__``, ``index``,
``count``, ``__add__``, ``__mul__``.

``MutableSequence``
A subclass of ``Sequence`` adding some standard mutating methods.
Abstract mutating methods: ``__setitem__`` (for integer indices as well
as slices), ``__delitem__`` (ditto), ``insert``, ``append``,
``reverse``.  Concrete mutating methods: ``extend``, ``pop``,
``remove``.  Concrete mutating operators: ``+=``, ``*=`` (these mutate
the object in place).  **Note:** this does not define ``sort()`` --
that is only required to exist on genuine ``list`` instances.

Strings
-------

Python 3000 will likely have at least two built-in string types: byte
strings (``bytes``), deriving from ``MutableSequence``, and (Unicode)
character strings (``str``), deriving from ``Sequence`` and
``Hashable``.

**Open issues:** define the base interfaces for these so alternative
implementations and subclasses know what they are in for.  This may be
the subject of a new PEP or PEPs (PEP 358 should be co-opted for the
``bytes`` type).

ABCs vs. Alternatives
=====================

In this section I will attempt to compare and contrast ABCs to other
approaches that have been proposed.

ABCs vs. Duck Typing
--------------------

Does the introduction of ABCs mean the end of Duck Typing?  I don't
think so.  Python will not require that a class derives from
``BasicMapping`` or ``Sequence`` when it defines a ``__getitem__``
method, nor will the ``x[y]`` syntax require that ``x`` is an instance
of either ABC.  You will still be able to assign any "file-like" object
to ``sys.stdout``, as long as it has a ``write`` method.

Of course, there will be some carrots to encourage users to derive from
the appropriate base classes; these vary from default implementations
for certain functionality to an improved ability to distinguish between
mappings and sequences.
But there are no sticks.  If ``hasattr(x, "__len__")`` works for you,
great!  ABCs are intended to solve problems that don't have a good
solution at all in Python 2, such as distinguishing between mappings
and sequences.

ABCs vs. Generic Functions
--------------------------

ABCs are compatible with Generic Functions (GFs).  For example, my own
Generic Functions implementation [4]_ uses the classes (types) of the
arguments as the dispatch key, allowing derived classes to override
base classes.  Since (from Python's perspective) ABCs are quite
ordinary classes, using an ABC in the default implementation for a GF
can be quite appropriate.  For example, if I have an overloaded
``prettyprint`` function, it would make total sense to define
pretty-printing of sets like this::

    @prettyprint.register(Set)
    def pp_set(s):
        return "{" + ... + "}"  # Details left as an exercise

and implementations for specific subclasses of ``Set`` could be added
easily.

I believe ABCs also won't present any problems for RuleDispatch,
Phillip Eby's GF implementation in PEAK [5]_.

Of course, GF proponents might claim that GFs (and concrete, or
implementation, classes) are all you need.  But even they will not deny
the usefulness of inheritance; and one can easily consider the ABCs
proposed in this PEP as optional implementation base classes; there is
no requirement that all user-defined mappings derive from
``BasicMapping``.

ABCs vs. Interfaces
-------------------

ABCs are not intrinsically incompatible with Interfaces, but there is
considerable overlap.  For now, I'll leave it to proponents of
Interfaces to explain why Interfaces are better.  I expect that much of
the work that went into e.g. defining the various shades of
"mapping-ness" and the nomenclature could easily be adapted for a
proposal to use Interfaces instead of ABCs.
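
For readers who want to see the kind of dispatch mechanism the
``prettyprint`` example presupposes, here is a deliberately tiny
sketch.  The ``generic`` class and its ``register`` method are invented
for illustration and are not the sandbox implementation [4]_ (today one
would reach for ``functools.singledispatch``); note how an ABC such as
``collections.abc.Set`` works as a dispatch key for ``frozenset``
because ``isinstance`` honors ABC membership::

```python
from collections.abc import Set

class generic:
    """Invented, deliberately tiny class-based dispatcher."""
    def __init__(self, default):
        self.default = default
        self.registry = []  # list of (cls, func) pairs

    def register(self, cls):
        def decorator(func):
            self.registry.append((cls, func))
            return func
        return decorator

    def __call__(self, arg):
        # Pick the most specific registered class the argument matches;
        # fall back to the default implementation.
        best = None
        for cls, func in self.registry:
            if isinstance(arg, cls):
                if best is None or issubclass(cls, best[0]):
                    best = (cls, func)
        return best[1](arg) if best else self.default(arg)

@generic
def prettyprint(obj):
    return repr(obj)

@prettyprint.register(Set)
def pp_set(s):
    return "{" + ", ".join(map(repr, sorted(s))) + "}"

print(prettyprint(frozenset({2, 1})))  # {1, 2}
print(prettyprint(42))                 # 42
```
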

"Interfaces" in this context refers to a set of proposals for
additional metadata elements attached to a class which are not part of
the regular class hierarchy, but do allow for certain types of
inheritance testing.  Such metadata would be designed, at least in some
proposals, so as to be easily mutable by an application, allowing
application writers to override the normal classification of an object.

The drawback to this idea of attaching mutable metadata to a class is
that classes are shared state, and mutating them may lead to conflicts
of intent.  Additionally, overriding the classification of an object
can be done more cleanly using generic functions: in the simplest case,
one can define a "category membership" generic function that simply
returns ``False`` in the base implementation, and then provide
overrides that return ``True`` for any classes of interest.

References
==========

.. [1] An Introduction to ABC's, by Talin
   (http://mail.python.org/pipermail/python-3000/2007-April/006614.html)

.. [2] Incomplete implementation prototype, by GvR
   (http://svn.python.org/view/sandbox/trunk/abc/)

.. [3] Possible Python 3K Class Tree?, wiki page created by Bill Janssen
   (http://wiki.python.org/moin/AbstractBaseClasses)

.. [4] Generic Functions implementation, by GvR
   (http://svn.python.org/view/sandbox/trunk/overload/)

.. [5] Charming Python: Scaling a new PEAK, by David Mertz
   (http://www-128.ibm.com/developerworks/library/l-cppeak2/)

.. [6] Implementation of @abstractmethod
   (http://python.org/sf/1706989)

.. [7] Unifying types and classes in Python 2.2, by GvR
   (http://www.python.org/download/releases/2.2.3/descrintro/)

.. [8] Putting Metaclasses to Work: A New Dimension in Object-Oriented
   Programming, by Ira R. Forman and Scott H. Danforth
   (http://www.amazon.com/gp/product/0201433052)

.. [9] Partial order, in Wikipedia
   (http://en.wikipedia.org/wiki/Partial_order)

.. [10] Total order, in Wikipedia
   (http://en.wikipedia.org/wiki/Total_order)

.. [11] Finite set, in Wikipedia
   (http://en.wikipedia.org/wiki/Finite_set)

.. [12] Make isinstance/issubclass overloadable
   (http://python.org/sf/1708353)

.. [13] ABCMeta sample implementation
   (http://svn.python.org/view/sandbox/trunk/abc/xyz.py)

.. [14] python-dev email ("Comparing heterogeneous types")
   (http://mail.python.org/pipermail/python-dev/2004-June/045111.html)

.. [15] Function ``frozenset_hash()`` in Objects/setobject.c
   (http://svn.python.org/view/python/trunk/Objects/setobject.c)

Copyright
=========

This document has been placed in the public domain.

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From gproux+py3000 at gmail.com Sat May 12 17:27:09 2007
From: gproux+py3000 at gmail.com (Guillaume Proux)
Date: Sun, 13 May 2007 00:27:09 +0900
Subject: [Python-3000] Support for PEP 3131
In-Reply-To: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com>
References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com>
Message-ID: <19dd68ba0705120827s5415c4dcx12e5862f32cc3e06@mail.gmail.com>

Dear all,

Pleased to meet you.  I just subscribed to the list because I wanted to
join the discussion regarding a specific PEP (for all the rest, you are
all much more expert than me).

Guido:
> 3131 (non-ASCII identifiers) -- I'm leaning towards rejecting.

I would like to voice my opposition to the rejection at this stage, and
to request that more time be spent gathering and analysing the opinions
of more people, especially those who have to deal with non-roman
languages on a daily basis, and especially people in the education
field (along with other interesting people like the OLPC folks).

French but living in Japan and essentially trilingual, I have
experience in localization/internationalization (as an i18n engineer
for Symbian Ltd.), and I am a very ardent Python supporter.  I have
tried (and sometimes managed) to teach Python to a number of younger or
less young people, male and female, in both French and Japanese
environments.

In this respect, I strongly believe that support for non-ASCII
identifiers as proposed by PEP 3131 would improve a number of things:
- discussion and uptake of Python in "non-ASCII" countries
- the ability for children to learn programming in their own language
  (I started programming at 7 years old and would have been very
  disturbed if I could not have used my own language to type in
  programs)
- an increase in the number of new "interesting" packages from
  non-ASCII countries
- the ability for local programmers and local companies to provide
  "bridges" between international (English) APIs and local APIs
- an increase in the number of Python users (from 7 to 77 years old)

In my humble opinion, now that UTF-8 is accepted as the standard source
code encoding, it is very difficult to understand why we should start
putting restrictions on the kind of identifiers that are used (which
would force people to comment line by line as they do now!).  When I am
programming in Python, I am VERY DISTURBED when the code I write
contains many comments.  It needs to be readable just by glancing at
it.

However, for most of the people who are core Python developers, you
should ask what the typical reading speed for "ASCII" characters is
for, e.g., a standard Japanese pupil.  You would be very surprised how
slow that is.
In my opinion (after living in Japan for quite a bit), people are very
slow to read ASCII characters, and this definitely restrains their
programming productivity and expressiveness.

Of course, for things like standard libraries, I think that
self-regulation and project-based regulation will impose ASCII charsets
for the base libraries and APIs, but I really believe that letting
people use their own charset to express themselves will REALLY give
them the productivity boost they deserve from Python.

Let me know if you have any question.

Regards,

Guillaume

From steven.bethard at gmail.com Sat May 12 19:03:45 2007
From: steven.bethard at gmail.com (Steven Bethard)
Date: Sat, 12 May 2007 11:03:45 -0600
Subject: [Python-3000] PEP: Eliminate __del__
In-Reply-To: <1178551661.8251.16.camel@antoine-ubuntu>
References: <1178551661.8251.16.camel@antoine-ubuntu>
Message-ID: 

On 5/7/07, Antoine Pitrou wrote:
> FWIW and in light of the thread on removing __del__ from the language,
> I just posted Yet Another Recipe for automatic finalization:
> http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/519621
>
> It allows writing a finalizer as a single __finalize__ method, at the
> cost of explicitly calling an enable_finalizer() method with the list
> of attributes to keep alive on the "ghost object".

And here's a version that doesn't lose updates to the finalizer
attributes:
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/519635

It replaces enable_finalizer() with a class attribute __finalattrs__.
From __finalize__, all class attributes and methods are accessible, as
are any instance attributes specified by __finalattrs__.  Guido's
BufferedWriter example looks like::

    class BufferedWriter(Finalized):
        __finalattrs__ = 'buffer', 'raw'
        ...
        def flush(self):
            self.raw.write(self.buffer)
            self.buffer = b""
        def __finalize__(self):
            self.flush()

STeVe
-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
--- Bucky Katt, Get Fuzzy

From eucci.group at gmail.com Sat May 12 19:58:10 2007
From: eucci.group at gmail.com (Jeff Shell)
Date: Sat, 12 May 2007 11:58:10 -0600
Subject: [Python-3000] ABC's, Roles, etc
In-Reply-To: <20070510153507.205EB3A4061@sparrow.telecommunity.com>
References: <88d0d31b0705081452w3082c1d0x7355e5eed7a3e093@mail.gmail.com>
	<20070509015553.9C6843A4061@sparrow.telecommunity.com>
	<88d0d31b0705082057l75ae6241gffa42545c25937d@mail.gmail.com>
	<20070509173942.B5A1B3A4061@sparrow.telecommunity.com>
	<46424718.20006@benjiyork.com>
	<20070510012836.16D0D3A4061@sparrow.telecommunity.com>
	<464318D0.2000109@benjiyork.com>
	<20070510153507.205EB3A4061@sparrow.telecommunity.com>
Message-ID: <88d0d31b0705121058n7a92af7dn178f220dba922a91@mail.gmail.com>

On 5/10/07, Phillip J. Eby wrote:
> At 09:06 AM 5/10/2007 -0400, Benji York wrote:
> >I would let Jim speak for himself too, but I prefer to put words in
> >his mouth. ;)  While zope.interface has anemic facilities for
> >"verifying" interfaces, few people use them, and even then rarely
> >outside of very simple "does this object look right" when testing.
> >It may have been believed verification would be a great thing, but
> >it's all but deprecated at this point.
>
> Okay, but that's quite the opposite of what I understand Jeff to be
> saying in this thread, which is that not only is LBYL good, but that
> he does it all the time.

Actually, I don't know what LBYL and EFTP (or whatever that other one
is) mean in this context.  This is the first time I've heard, or at
least paid attention to, those acronyms.  In this context anyway.

If you could explain what this really means, and KTATAM (Keep The
Acronyms To A Minimum), I would appreciate it.  I recognize the
arguments I've made seem to go behind LBYL, but that was mostly chosen
because that's what you said zope.interface did or was.  And
gul-darnit, I like zope.interface.

> >My main intent in piping up was
> >dispelling the LBYL dispersions about zope.interface.
;)
>
> Well, "back in the day", before PyProtocols was written, I discovered
> PEP 246 adaptation and began trying to convince Jim Fulton that
> adaptation beat the pants off of using if-then's to do "implements"
> testing.  His argument then, IIRC, was that interface verification
> was more important.  I then went off and wrote PyProtocols in large
> part (specifically the large documentation part!) to show him what
> could be done using adaptation as a core concept.

I think it's beneficial to have both.  But I agree, it's usually better
to program against adaptation.  It provides more flexibility.  I think
the 'hasattr()' syndrome still hangs over many of us, however.  We're
used to looking at the piece of the duck we're interested in more than
trying to see if we can put something into a duck suit (or better yet -
Duck Soup!).

But the 'provides / provided by' piece is still important to me.
Adaptation isn't *always* needed or useful.  I like that the interface
hierarchy is different from an implementation hierarchy.  I like that
it's easier to test for interface provision than it is to use
isinstance() - `IFoo.providedBy(obj)` often works regardless of whether
'obj' is a proxy or wrapper, and without tampering with
`isinstance()`.

I know that there's been talk of having ``__isinstance()__`` and
``__issubclass()__``, which could be used to take care of the
proxy/wrapper problem.  But I haven't formed an opinion about how I
feel about that.

I like the Roles/Traits side of zope.interface because I can declare
that information about third-party products.  For example, I was able
to add some 'implements' directives to a SQLAlchemy 'InstrumentedList'
class - basically I said that it supported the common `ISequence`
interface.  I recognize that in this particular scenario, if that
role/trait or abstract base class were built in, then I wouldn't have
had to do that (since it is based on a common Python type spec).

Still, though - it doesn't matter whether `InstrumentedList` derives
from `list` or `UserList` or implements the entire sequence API
directly.  The Trait could be assigned independent of implementation,
and could be done in another product without affecting any internals of
SQLAlchemy: I didn't have to make a subclass that SQLAlchemy wouldn't
know to instantiate.  I didn't have to write an adapter.  I just had to
say "I happen to know that instances of this class will have this
trait."

I don't know if that's LBYL, EYV (Eat Your Vegetables), LBWBCTS (Look
Both Ways Before Crossing The Street), or what.  I think it's just a
way of saying "I happen to know that this thing smells like a duck.  It
doesn't say that it smells like a duck, but I know it smells like a
duck.  And everywhere that I expect to find the fine fragrance of duck,
this thing should be allowed."

No adapters, no changing the base classes, no meddling with method
resolution order - just adding a trait.  The trait in this case is like
an access pass - just an extra thing worn around the neck that will
grant you access to certain doors and pathways.  It doesn't change who
you are or how you accomplish your job.

MTAAFT (Maybe There's Another Acronym For This)?

-- 
Jeff Shell

From pje at telecommunity.com Sat May 12 19:55:12 2007
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sat, 12 May 2007 13:55:12 -0400
Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions,
	Interfaces, etc.
In-Reply-To: <46451BB7.9030703@canterbury.ac.nz>
References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com>
	<20070509205655.622A63A4061@sparrow.telecommunity.com>
	<20070510012202.EB0A83A4061@sparrow.telecommunity.com>
	<464289C8.4080004@canterbury.ac.nz>
	<20070510161417.192943A4061@sparrow.telecommunity.com>
	<464395AB.6040505@canterbury.ac.nz>
	<20070510231845.9C98C3A4061@sparrow.telecommunity.com>
	<20070511162803.5E70F3A4061@sparrow.telecommunity.com>
	<20070511184927.7A2043A4061@sparrow.telecommunity.com>
	<46451BB7.9030703@canterbury.ac.nz>
Message-ID: <20070512180213.EEDC93A4088@sparrow.telecommunity.com>

At 01:43 PM 5/12/2007 +1200, Greg Ewing wrote:
>Phillip J. Eby wrote:
> > At 01:27 PM 5/11/2007 -0400, Jim Jewett wrote:
> >>If there are two registrations for the same selection criteria, how
> >>can the user resolve things?
>
>But what if there's *already* an @around method being
>used?  Then you need an @even_more_around method.  Etc
>ad infinitum?

Yep, so simple things are simple, and complex things are possible.
That's the Python Way(tm).  :)

To put your comment in another perspective, "but what if somebody
already defined a method in their class that I want to change?  Then I
need to subclass it.  And if somebody wants to change that they have to
subclass *that*, etc. ad infinitum?  Clearly classes are too
complicated!"  :)

In practice, @around is mostly used for application-defined special
cases, and there is no higher authority than the application who needs
to override things.  If a library needs special combinators internally,
it's better off making them lower-than-@around precedence.  Normal,
before, and after methods are usually adequate for libraries.  (Aside
from special-purpose combinators like the @discount example.)
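
[The before/after/around layering under discussion can be sketched in a
few lines.  All names here (``with_hooks``, ``greet``) are invented for
illustration and are not PEP 3124's actual decorators; the point is
only the ordering: befores first, afters last, each around wrapping
everything registered before it.]

```python
def with_hooks(primary):
    """Toy method combination; invented API, not PEP 3124's."""
    befores, afters, arounds = [], [], []

    def call(*args):
        def core(*a):
            for f in befores:       # "before" methods run first
                f(*a)
            result = primary(*a)
            for f in afters:        # "after" methods run last
                f(*a)
            return result
        invoke = core
        for wrap in arounds:        # each "around" wraps everything so far
            invoke = (lambda w, inner: lambda *a: w(inner, *a))(wrap, invoke)
        return invoke(*args)

    call.before = befores.append
    call.after = afters.append
    call.around = arounds.append
    return call

@with_hooks
def greet(name):
    return "hello " + name

log = []
greet.before(lambda name: log.append("before"))
greet.around(lambda inner, name: inner(name).upper())
print(greet("bob"))  # HELLO BOB
```
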

From stargaming at gmail.com Sat May 12 20:12:38 2007
From: stargaming at gmail.com (Stargaming)
Date: Sat, 12 May 2007 20:12:38 +0200
Subject: [Python-3000] Support for PEP 3131
In-Reply-To: <19dd68ba0705120827s5415c4dcx12e5862f32cc3e06@mail.gmail.com>
References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com>
	<19dd68ba0705120827s5415c4dcx12e5862f32cc3e06@mail.gmail.com>
Message-ID: 

Guillaume Proux wrote:
> Dear all,
> [snip]
>
> In this respect, I strongly believe that support non-ASCII identifiers
> as proposed by PEP3131 would improve a number of things:
> - discussion and uptake of python in "non-ascii" countries

While still separating them from ASCII countries.  They would start
writing programs that expose foreign-phrased APIs, but we would be
unable to use them because we couldn't even type a single word!

> - ability for children to learn programming in their own language (I
> started programming at 7 years old and would have been very disturbed
> if I could not use my own language to type in programs)

AFAIK, allowing non-ASCII identifiers would still *not* translate
Python.  They would still have to struggle with every part of Python
that is built in, i.e. builtins (you could let non-ASCII identifiers
reference them, though) and keywords.  Better to come up with some
proposal to translate Python (perhaps PyPy could do something here?) or
all Python scripts (I think a translator could do its job here) to
improve the situation.

> - increase of the number of new "interesting" packages from non-ascii
> countries

As stated above, we could not use them, though.  Bad deal, if you ask
me!

> - ability for local programmers and local companies to provide
> "bridges" between international (english) APIs and local APIs.

I don't get the improvement offered by this one.  We should *allow*
non-ASCII identifiers to **require** wrappers?

> - Increase the number of python users (from 7 to 77 years old)

Works in English, too.
> > In my humble opinion, now that UTF8 is accepted as the standard source > code encoding, it is very difficult to understand why we should start > putting restrictions on the kind of identifiers that are used (which > would force people to comment line by line as they do now!). No, we do not restrict them, we simply do not allow them (which is a huge difference here). UTF-8 will be allowed (*and* enforced by default) as a file encoding, i.e. strings and comments will be affected. I don't see the real restriction here. Correct me please, if I'm wrong. > > When I am programming in Python, I am VERY DISTURBED when the code I > write contains much comment. It needs to be readable just by glancing > at it. OTOH, I cannot glance at japanese code and know what it means. So, better the japanese developer named it badly but explained it than requiring me to consult a dictionary. > > However, for most of the people who are core python developers, you > should ask what is the typical reading speed for "ascii" characters > for e.g. a standard Japanese pupil. You would be very surprised how > slow that is. In my opinion (after living in Japan for quite a bit), > people are very slow to read ASCII characters and this definitely > restrains their programming productivity and expressiveness. See above, at least *my* reading speed for japanese text tends to zero (if not less!). > > Of course, for things like "standard libraries", I think that > self-regulation and project based regulation will impose ASCII > charsets for the base libraries and APIs but i really believe that > letting people use their own charset to express themselves will REALLY > give them the productivity boost they would deserve from python. They're free to express their thoughts in comments, today, still separating them from ascii-developers. > > Let me know if you have any question. > > Regards, > > Guillaume I do not think allowing people to program in *their* language would enhance integration.
It would just split the python community *even* more. I like communicating with non-native English speakers much more than not communicating with them at all because they got their own language in there. Additionally, I think the reason for rejection of this PEP is the same one that applied to all those "Let the user extend Python's grammar at runtime" -- one developer would have to learn a completely new language for understanding a program. To communicate, we just have to find (or agree on) a common point between devs. Python is English, that's a matter of fact IMO. It is the common language that makes us a community and *one* language. I'm, well, -1 on this (even though I don't know if I got a voice here). -- Greetings, Stargaming From pje at telecommunity.com Sat May 12 20:29:17 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Sat, 12 May 2007 14:29:17 -0400 Subject: [Python-3000] ABC's, Roles, etc In-Reply-To: <88d0d31b0705121058n7a92af7dn178f220dba922a91@mail.gmail.co m> References: <88d0d31b0705081452w3082c1d0x7355e5eed7a3e093@mail.gmail.com> <20070509015553.9C6843A4061@sparrow.telecommunity.com> <88d0d31b0705082057l75ae6241gffa42545c25937d@mail.gmail.com> <20070509173942.B5A1B3A4061@sparrow.telecommunity.com> <46424718.20006@benjiyork.com> <20070510012836.16D0D3A4061@sparrow.telecommunity.com> <464318D0.2000109@benjiyork.com> <20070510153507.205EB3A4061@sparrow.telecommunity.com> <88d0d31b0705121058n7a92af7dn178f220dba922a91@mail.gmail.com> Message-ID: <20070512182731.977973A4088@sparrow.telecommunity.com> At 11:58 AM 5/12/2007 -0600, Jeff Shell wrote: >Actually, I don't know what LBYL and EFTP (or whatever that other one >is) mean in this context. This is the first time I've heard, or at >least paid attention to, those acronyms. In this context anyways. > >If you could explain what this really means, and KTATAM (Keep The >Acronyms To A Minimum), I would appreciate it. 
I recognize the >arguments I've made seem to go behind LBYL, but that was mostly chosen >because that's what you said zope.interface did or was. And >gul-darnit, I like zope.interface. Checking whether an object provides an interface is LBYL. Simply proceeding as if it does (or adapting it to a desired interface), is EAFP. zope.interface can certainly be used in either style, but when it was first created, LBYL is *all* it did. Adaptation was added later. > > Well, "back in the day", before PyProtocols was written, I discovered > > PEP 246 adaptation and began trying to convince Jim Fulton that > > adaptation beat the pants off of using if-then's to do "implements" > > testing. His argument then, IIRC, was that interface verification > > was more important. I then went off and wrote PyProtocols in large > > part (specifically the large documentation part!) to show him what > > could be done using adaptation as a core concept. > >I think it's beneficial to have both. But I agree, it's usually better >to program against adaptation. It provides more flexibility. I think >the 'hasattr()' syndrome still hangs over many of us, however. We're >used to looking at the piece of the duck we're interested in more than >trying to see if we can put something into a duck suit (or better yet >- Duck Soup!) > >But the 'provides / provided by' piece is still important to me. >Adaptation isn't *always* needed or useful. That's actually an illusion created by the economic impact of using interfaces and adapters instead of generic functions. In languages with generic functions, nobody bothers creating separate "trait" systems, apart from designating groups of GFs that "go together". (Haskell typeclasses and Dylan modules, for example), because GFs are so easy and elementary that it seems like part of the normal development flow. 
Interfaces+Adaptation are such a clumsy way of doing the same thing, that it often seems easier to get by *checking* for an existing interface, instead of defining a new one and adapting to it. (Or just using an overload.) But in the early days of PyProtocols, I soon realized that checking for an interface was *always* an antipattern, no matter how temptingly convenient it might appear to be to rationalize an interface check at the time. You can get away with it sometimes... but never for long, if your code is being reused. >I don't know if that's LBYL, EYV (Eat Your Vegetables), LBWBCTS (Look >Both Ways Before Crossing The Street), or what. I think it's just a >way of saying "I happen to know that this thing smells like a duck. It >doesn't say that it smells like a duck, but I know it smells like a >duck. And for everywhere that I expect to find the fine fragrance of >duck, this thing should be allowed." Note that this is still one level of abstraction away from your goal: to get some behavior. Instead of checking for duckness or quackability, *just perform the "quack" operation*. If you want to know about quackability because you intend to do something *else* with the object, then just do that "something else". The point of generic functions is that the only reason it's worth knowing something about a "trait" is to select *how* you will accomplish something. So just accomplish the something, instead of micromanaging. Remember the bad old days before OO? The big step forward was to get rid of all those switch/cases in your functions, replacing them with method dispatching. The second big step forward is to get rid of the type/hasattr/interface/role/trait testing, and replace it with generic functions.
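Phillip's "just perform the operation" advice is what the stdlib's functools.singledispatch (added much later, in Python 3.4) eventually provided for the single-argument case. A minimal sketch -- `describe` and its implementations are made-up names -- replacing a hasattr/isinstance ladder with dispatch:

```python
# Instead of testing for "duckness" and branching, declare one generic
# function and register implementations per type; dispatch picks one.
from functools import singledispatch

@singledispatch
def describe(obj):  # default, used when nothing more specific matches
    return "something opaque"

@describe.register(int)
def _(obj):
    return "the integer %d" % obj

@describe.register(list)
def _(obj):
    return "a list of %d items" % len(obj)

print(describe(3))         # the integer 3
print(describe([1, 2]))    # a list of 2 items
print(describe(object()))  # something opaque
```

Adding support for a new type means registering one more implementation, rather than editing a chain of type tests inside `describe`.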
From guido at python.org Sat May 12 20:53:58 2007 From: guido at python.org (Guido van Rossum) Date: Sat, 12 May 2007 11:53:58 -0700 Subject: [Python-3000] PEP: Eliminate __del__ In-Reply-To: References: <1178551661.8251.16.camel@antoine-ubuntu> Message-ID: On 5/12/07, Steven Bethard wrote: > And here's a version that doesn't lose updates to the finalizer attributes: > > http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/519635 > > It replaces enable_finalizer() with a class attribute __finalattrs__. > >From __finalize__, all class attributes and methods are accessible, as > are any instance attributes specified by __finalattrs__. Guido's > BufferedWriter example looks like:: > > class BufferedWriter(Finalized): > __finalattrs__ = 'buffer', 'raw' > ... > def flush(self): > self.raw.write(self.buffer) > self.buffer = b"" > > def __finalize__(self): > self.flush() But can I subclass it and in the subclass override (extend) flush()? E.g. class MyWriter(BufferedWriter): def flush(self): super(MyWriter, self).flush() # Or super.flush() once PEP xxx is accepted print("Feel free to unplug the disk now") -- --Guido van Rossum (home page: http://www.python.org/~guido/) From jimjjewett at gmail.com Sat May 12 21:03:05 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Sat, 12 May 2007 15:03:05 -0400 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: <20070512180213.EEDC93A4088@sparrow.telecommunity.com> References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <20070510161417.192943A4061@sparrow.telecommunity.com> <464395AB.6040505@canterbury.ac.nz> <20070510231845.9C98C3A4061@sparrow.telecommunity.com> <20070511162803.5E70F3A4061@sparrow.telecommunity.com> <20070511184927.7A2043A4061@sparrow.telecommunity.com> <46451BB7.9030703@canterbury.ac.nz> <20070512180213.EEDC93A4088@sparrow.telecommunity.com> Message-ID: On 5/12/07, Phillip J. 
Eby wrote: > At 01:43 PM 5/12/2007 +1200, Greg Ewing wrote: > In practice, @around is mostly used for application-defined special > cases, and there is no higher authority than the application who > needs to override things. If a library needs special combinators > internally, it's better off making them lower-than-@around > precedence. Normal, before, and after methods are usually adequate > for libraries. (Aside from special-purpose combinators like the > @discount example.) (1) Would it be reasonable to say this in the PEP? (2) Would it be reasonable to leave out (or at least, leave for another PEP) the extension methods like discount? -jJ From talin at acm.org Sat May 12 21:07:24 2007 From: talin at acm.org (Talin) Date: Sat, 12 May 2007 12:07:24 -0700 Subject: [Python-3000] Revised PEP 3119 (Abstract Base Classes) In-Reply-To: References: Message-ID: <4646106C.1090608@acm.org> Guido van Rossum wrote: > Here's a new version of the ABC PEP. A lot has changed; a lot remains. > I can't give a detailed overview of all the changes, and a diff would > show too many spurious changes, but some of the highlights are: Some general comments on the PEP: Compared to the previous version, this version of the PEP is closer in spirit to the various other competing proposals for 'post-hoc object taxonomies', although some important differences remain. I'd like to point out both the similarities and the differences, especially the latter as they form the basis for further discussion and possibly evolution. First, the ways in which the new PEP more closely resembles its competitors: The new version of the PEP is more strongly oriented towards post-hoc classification of objects, in other words, putting classes into categories that may not have existed when the classes were created. It also means that there is no longer a requirement that categories for built-in objects have an official Python "seal of approval".
Anyone can come along and re-categorize the built-ins however they like; and they can do so in a way that doesn't interfere with any previously existing categories. There will of course be certain 'standard' categories (as outlined in the PEP), but these standard categories do not have any privileged status, unlike the ones in the earlier versions of the PEP. It means that if we make a mistake defining the categories (or more likely, if we fail to address someone's needs), it is possible for someone else to come along and repair that mistake by defining a competing taxonomy. The categorization relationships are now stored preferentially in a map which is external to the objects being categorized, allowing objects to be recategorized without mutating them. This is similar to the behavior of Collin Winter's 'roles' proposal and some others. (For the remainder of this document, I am going to use the term "dynamic inheritance" to describe specifying inheritance via Guido's special methods, as opposed to "traditional inheritance", what we have now.) Now, on to the differences: The key differentiator between Guido's proposal and the others can be summarized by the following question: "Should the mechanism which defines the hierarchy of classes be the same as the mechanism that defines the hierarchy of categories?" To put it another way, is a "category" (or "interface" or "role" or whatever term you want to use) a "class" in the normal sense, or is it some other thing? In the terminology of Java and C# and other languages which support interfaces, the term 'interface' is explicitly defined as something that is 'not a class'. A class is a unit of implementation, and interfaces contain no implementation. [1] In these object classification systems, there are three different relationships we care about: -- The normal inheritance relationship between classes. -- The specification of which classes belong to which categories. -- The relationship between the categories themselves.
(Note that in some systems, such as Raymond Hettinger's attribute-based proposal, the third type of relationship doesn't exist - each category is standalone, although you can simulate the effects of a category hierarchy by putting objects in multiple categories. Thus, there's no MutableSequence category, but you can place an object in both Mutable and Sequence and infer from there.) Given these different types of relationships, the question to be asked is, should all of these various things use the same mechanism and the same testing predicate (isinstance), or should they be separate mechanisms? I'll try to summarize some of the pros and cons, although this should not be considered a comprehensive list: Arguments in favor of reusing 'isinstance': -- It's familiar and easy to remember. -- Not everyone considers interfaces and implementations to be distinct things, at least not in Python where there are no clear boundaries enforced by the language (as can be seen in Guido's desire to have some partial implementation in the ABCs.) -- Declaring overloads in PJE's generic function proposal is cleaner if we only have to worry about annotating arguments for types rather than types + interfaces. In other words, we would need two different kinds of annotations for a given method signature, and a way to discriminate between them. If categories are just base classes, then we only have one dispatch type to worry about. [2] Arguments in favor of a different mechanism: -- Mixing different kinds of inheritance mechanisms within a single object might lead to some strange inconsistencies. For example, if you have two classes, one which derives from an ABC using traditional inheritance, and one which derives using dynamic inheritance, they may behave differently. (For example, the @abstractmethod decorator only affects classes that derive from the ABC using traditional inheritance, not dynamic inheritance. Some folks may find this inconsistency objectionable.)
-- For some people, an interface is not the same thing as a class, and should not be treated as such. In particular, there is a desire by some people to enforce a stricter separation between interface and implementation. -- Forcing them to be separate allows you to make certain simplifying assumptions about the class hierarchy. If categories can relate to each other via traditional inheritance, and if I want to trace upwards from a given class to find all interfaces that it implements, then I may have to trace both traditional and dynamic inheritance links. If categories can only relate via some special scheme, however, then I can simply do my tracing in two passes: First find all base classes using traditional inheritance, and then given that set, find all categories using dynamic inheritance. In other words, I don't have to keep switching inheritance types as I trace. --- [1] On the other hand, both C# and Java allow interfaces to be tested by their equivalent of "isinstance" so there is some conflation of the two. On the gripping hand, however, C# and Java are both statically typed language, where things like "isinstance" really means "istype", whereas in Python "isinstance" really means something more like "isimplementation". So there is no exact equivalent to what Python does here. [2] I should mention that one of my personal criteria for evaluating these proposals is the level of synergy achieved with PJE's PEP. Now, PJE may claim that he doesn't need interfaces or ABCs or anything, but I believe that his PEP benefits considerably by the existence of ABCs, because it means that you need far fewer overloads in an ABC world. Thus, I can overload based on "Sequence" rather than having to have separate overloads for list, tuple, and various user-created sequence types. (Although I can if I really need to.) 
I would go further, and say that these object taxonomies should only go so far as to provide what is needed to obtain that synergy; any features beyond that are mostly superfluous. But that's just my personal opinion. -- Talin From talin at acm.org Sat May 12 21:09:52 2007 From: talin at acm.org (Talin) Date: Sat, 12 May 2007 12:09:52 -0700 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <20070510161417.192943A4061@sparrow.telecommunity.com> <464395AB.6040505@canterbury.ac.nz> <20070510231845.9C98C3A4061@sparrow.telecommunity.com> <20070511162803.5E70F3A4061@sparrow.telecommunity.com> <20070511184927.7A2043A4061@sparrow.telecommunity.com> <46451BB7.9030703@canterbury.ac.nz> <20070512180213.EEDC93A4088@sparrow.telecommunity.com> Message-ID: <46461100.9080609@acm.org> Jim Jewett wrote: > On 5/12/07, Phillip J. Eby wrote: >> At 01:43 PM 5/12/2007 +1200, Greg Ewing wrote: > >> In practice, @around is mostly used for application-defined special >> cases, and there is no higher authority than the application who >> needs to override things. If a library needs special combinators >> internally, it's better off making them lower-than-@around >> precedence. Normal, before, and after methods are usually adequate >> for libraries. (Aside from special-purpose combinators like the >> @discount example.) > > (1) Would it be reasonable to say this in the PEP? > > (2) Would it be reasonable to leave out (or at least, leave for > another PEP) the extension methods like discount? There ought to be a way to preserve with each PEP a separate document containing a more lengthy discussion of the rationales and consequences. Similar to the way that the _Federalist Papers_ is often used to interpret the meaning of the U.S. Constitution.
> -jJ > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/talin%40acm.org From guido at python.org Sat May 12 21:19:58 2007 From: guido at python.org (Guido van Rossum) Date: Sat, 12 May 2007 12:19:58 -0700 Subject: [Python-3000] ABC's, Roles, etc In-Reply-To: <88d0d31b0705121058n7a92af7dn178f220dba922a91@mail.gmail.com> References: <88d0d31b0705081452w3082c1d0x7355e5eed7a3e093@mail.gmail.com> <20070509015553.9C6843A4061@sparrow.telecommunity.com> <88d0d31b0705082057l75ae6241gffa42545c25937d@mail.gmail.com> <20070509173942.B5A1B3A4061@sparrow.telecommunity.com> <46424718.20006@benjiyork.com> <20070510012836.16D0D3A4061@sparrow.telecommunity.com> <464318D0.2000109@benjiyork.com> <20070510153507.205EB3A4061@sparrow.telecommunity.com> <88d0d31b0705121058n7a92af7dn178f220dba922a91@mail.gmail.com> Message-ID: On 5/12/07, Jeff Shell wrote: > I like that the interface hierarchy is different than an > implementation hierarchy. I like that it's easier to test for > interface provision than it is to use isinstance() - > `IFoo.providedBy(obj)` often works regardless of whether 'obj' is a > proxy or wrapper, and without tampering with `isinstance()`. I know > that there's been talk of having ``__isinstance()__`` and > ``__issubclass()__``, which could be used to take care of the > proxy/wrapper problem. But I haven't formed an opinion about how I > feel about that. > > I like the Roles/Traits side of zope.interface because I can declare > that information about third party products. For example, I was able > to add some 'implements' directives to a SQLAlchemy 'InstrumentedList' > class - basically I said that it supported the common `ISequence` > interface. 
Which I recognize that in this particular scenario, if that > role/trait or abstract base class was built in, then I wouldn't have > had to do that (since it is based on a common Python type spec). Still > though - it doesn't matter whether `InstrumentedList` derives from > `list` or `UserList` or implements the entire sequence API directly. > The Trait could be assigned independent of implementation, and could > be done in another product without affecting any internals of > SQLAlchemy: I didn't have to make a subclass that SQLAlchemy wouldn't > know to instantiate. I didn't have to write an adapter. I just had to > say "I happen to know that instances of this class will have this > trait". > > I don't know if that's LBYL, EYV (Eat Your Vegetables), LBWBCTS (Look > Both Ways Before Crossing The Street), or what. I think it's just a > way of saying "I happen to know that this thing smells like a duck. It > doesn't say that it smells like a duck, but I know it smells like a > duck. And for everywhere that I expect to find the fine fragrance of > duck, this thing should be allowed." No adapters, no changing the base > classes, no meddling with method resolution order, just adding a > trait. The trait in this case is like an access pass - just an extra > thing worn around the neck that will grant you access to certain doors > and pathways. It doesn't change who you are or how you accomplish your > job. Please have a look at the latest version (updated yesterday) of PEP 3119. Using the classes there, you can say from collections import Sequence Sequence.register(InstrumentedList) From this point, issubclass(InstrumentedList, Sequence) will be true (and likewise for instances of it and isinstance(x, Sequence)). But InstrumentedList's __mro__ and __bases__ are unchanged. This is pretty close to what you expect from a Zope interface, except you can also subclass Sequence if you want to, and a later version of SQLAlchemy could subclass InstrumentedList from Sequence.
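Guido's register() example runs as described with the ABC machinery that eventually shipped; in current Python the ABCs live in collections.abc rather than collections. The InstrumentedList below is just a minimal stand-in for SQLAlchemy's class:

```python
from collections.abc import Sequence

class InstrumentedList:  # stand-in for SQLAlchemy's InstrumentedList
    def __init__(self, items):
        self._items = list(items)
    def __len__(self):
        return len(self._items)
    def __getitem__(self, index):
        return self._items[index]

# Post-hoc registration: categorizes the class without mutating it.
Sequence.register(InstrumentedList)

il = InstrumentedList([1, 2, 3])
print(issubclass(InstrumentedList, Sequence))  # True
print(isinstance(il, Sequence))                # True
print(Sequence in InstrumentedList.__mro__)    # False -- bases unchanged
```

The last line is the point of the thread: the isinstance/issubclass answers change, but the class's __mro__ and __bases__ do not.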
(The register() call would then be redundant, but you won't have to remove it -- it will act as a no-op if the given subclass relationship already holds.) A subclass of Sequence can behave either as an implementation class (when it provides implementations of all required methods) or as another interface (if it adds one or more new abstract methods). You can think of Sequence and its brethren as mix-ins -- they provide some default implementations of certain methods, and abstract definitions of others (the "essential" ones; e.g. Sequence makes __len__ and __getitem__ abstract but __iter__ has a concrete default implementation). Phillip has told me that this is transparent to his GF machinery -- if you overload a GF on Sequence, and InstrumentedList is a subclass of Sequence (whether through registration or subclassing), then that version of the GF will be used for InstrumentedList (unless there's a more specific overloaded version of course). -- --Guido van Rossum (home page: http://www.python.org/~guido/) From pje at telecommunity.com Sat May 12 21:26:10 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Sat, 12 May 2007 15:26:10 -0400 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <20070510161417.192943A4061@sparrow.telecommunity.com> <464395AB.6040505@canterbury.ac.nz> <20070510231845.9C98C3A4061@sparrow.telecommunity.com> <20070511162803.5E70F3A4061@sparrow.telecommunity.com> <20070511184927.7A2043A4061@sparrow.telecommunity.com> <46451BB7.9030703@canterbury.ac.nz> <20070512180213.EEDC93A4088@sparrow.telecommunity.com> Message-ID: <20070512192425.84E3C3A4088@sparrow.telecommunity.com> At 03:03 PM 5/12/2007 -0400, Jim Jewett wrote: >On 5/12/07, Phillip J.
Eby wrote: >>At 01:43 PM 5/12/2007 +1200, Greg Ewing wrote: > >>In practice, @around is mostly used for application-defined special >>cases, and there is no higher authority than the application who >>needs to override things. If a library needs special combinators >>internally, it's better off making them lower-than-@around >>precedence. Normal, before, and after methods are usually adequate >>for libraries. (Aside from special-purpose combinators like the >>@discount example.) > >(1) Would it be reasonable to say this in the PEP? Sure. >(2) Would it be reasonable to leave out (or at least, leave for >another PEP) the extension methods like discount? The emerging consensus appears to be that everything relating to method combination and Aspects should be a second PEP, much like the Python 2.2 type system overhaul was separated into a mro/metaclass-oriented PEP and a descriptor-oriented PEP, even though the two were quite interrelated. So, examples for custom method combination, as well as best-practices for the standard combinators' uses would reasonably both go in the method-combination-and-aspects PEP. From steven.bethard at gmail.com Sat May 12 21:28:54 2007 From: steven.bethard at gmail.com (Steven Bethard) Date: Sat, 12 May 2007 13:28:54 -0600 Subject: [Python-3000] PEP: Eliminate __del__ In-Reply-To: References: <1178551661.8251.16.camel@antoine-ubuntu> Message-ID: On 5/12/07, Guido van Rossum wrote: > On 5/12/07, Steven Bethard wrote: > > And here's a version that doesn't lose updates to the finalizer attributes: > > > > http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/519635 > > > > It replaces enable_finalizer() with a class attribute __finalattrs__. > > From __finalize__, all class attributes and methods are accessible, as > > are any instance attributes specified by __finalattrs__. Guido's > > BufferedWriter example looks like:: > > > > class BufferedWriter(Finalized): > > __finalattrs__ = 'buffer', 'raw' > > ...
> > def flush(self): > > self.raw.write(self.buffer) > > self.buffer = b"" > > > > def __finalize__(self): > > self.flush() > > But can I subclass it and in the subclass override (extend) flush()? E.g. > > class MyWriter(BufferedWriter): > def flush(self): > super(MyWriter, self).flush() # Or super.flush() once PEP xxx is accepted > print("Feel free to unplug the disk now") Yep. The 'self' passed to __finalize__ is still an instance of the same class (e.g. BufferedWriter or MyWriter). So inheritance works normally: >>> class BufferedWriter(Finalized): ... __finalattrs__ = 'buffer', 'raw' ... def __init__(self): ... self.buffer = '' ... self.raw = 'raw' ... def flush(self): ... print 'writing:', self.buffer, 'to', self.raw ... self.buffer = '' ... def __finalize__(self): ... self.flush() ... >>> class MyWriter(BufferedWriter): ... def flush(self): ... super(MyWriter, self).flush() ... print 'feel free to unplug the disk now' ... >>> w = MyWriter() >>> del w writing: to raw feel free to unplug the disk now STeVe -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy From guido at python.org Sat May 12 22:03:59 2007 From: guido at python.org (Guido van Rossum) Date: Sat, 12 May 2007 13:03:59 -0700 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: <20070512192425.84E3C3A4088@sparrow.telecommunity.com> References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <20070510231845.9C98C3A4061@sparrow.telecommunity.com> <20070511162803.5E70F3A4061@sparrow.telecommunity.com> <20070511184927.7A2043A4061@sparrow.telecommunity.com> <46451BB7.9030703@canterbury.ac.nz> <20070512180213.EEDC93A4088@sparrow.telecommunity.com> <20070512192425.84E3C3A4088@sparrow.telecommunity.com> Message-ID: On 5/12/07, Phillip J. 
Eby wrote: > The emerging consensus appears to be that everything relating to > method combination and Aspects should be a second PEP, [...] Yes, please. I've just finished reading linearly through the version of PEP 3124 that's currently online, and the farther I got into the method combining and Aspects section, the stronger the feeling I had that there's just too much stuff here, and that it's all quite esoteric. Some other feedback on the PEP (I will be awaiting your split-up version before commenting in detail): - Please supply a References section, linking to clear explanations (and sometimes source code) of the various systems you mention (e.g. Haskell typeclasses, CLOS, AspectJ, but also PEAK, RuleDispatch and so on). Even in this age of search engines you owe your reader this service. (And in the past I've had a helluva time finding things like those "656 lines" in peak.rules.core!) Every time you mention a concept that I don't know very well without a reference, I feel a little stupider, and less favorably inclined towards the PEP. I imagine that's not just my response; nobody likes reading something that makes them feel stupid. - Please provide motivating use cases beyond the toy examples for each proposed feature. I am really glad that you have toy examples, because they help tremendously to understand how a feature works. But I am often stuck with the question "why would I need this"? - Expect pushback on your assumption that every function or method should be fair game for overloading. Requiring explicit tagging the base or default implementation makes things a lot more palatable and predictable for those of us who are still struggling to accept GFs. - Some of the examples of method overloading in classes look really hard to follow and easy to get wrong if one deviates from the cookbook examples. (Though this may be limited to the "advanced" PEP.) 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From greg.ewing at canterbury.ac.nz Sun May 13 03:40:11 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 13 May 2007 13:40:11 +1200 Subject: [Python-3000] ABC's, Roles, etc In-Reply-To: References: <88d0d31b0705081452w3082c1d0x7355e5eed7a3e093@mail.gmail.com> <20070509015553.9C6843A4061@sparrow.telecommunity.com> <88d0d31b0705082057l75ae6241gffa42545c25937d@mail.gmail.com> <20070509173942.B5A1B3A4061@sparrow.telecommunity.com> <46424718.20006@benjiyork.com> <20070510012836.16D0D3A4061@sparrow.telecommunity.com> <464318D0.2000109@benjiyork.com> <20070510153507.205EB3A4061@sparrow.telecommunity.com> <88d0d31b0705121058n7a92af7dn178f220dba922a91@mail.gmail.com> Message-ID: <46466C7B.5010006@canterbury.ac.nz> Guido van Rossum wrote: > From this point, issubclass(InstrumentedList, Sequence) will be true > (and likewise for instances of it and isinstance(x, Sequence)). But > InstrumentedList's __mro__ and __bases__ are unchanged. I think I've figured out what bothers me about this kind of overloading of isinstance(). Normally if isinstance(x, C) is true, we expect that a method call on x can at least potentially invoke a method of class C. But if isinstance(x, C) can be true even if C doesn't appear in the mro of x, this is no longer the case. This isn't so much of a worry when x is acting as a proxy for C, since it's probably forwarding method calls to a real instance of C somewhere. But using it in a more general way seems strange. -- Greg From greg.ewing at canterbury.ac.nz Sun May 13 03:52:42 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 13 May 2007 13:52:42 +1200 Subject: [Python-3000] PEP: Eliminate __del__ In-Reply-To: References: <1178551661.8251.16.camel@antoine-ubuntu> Message-ID: <46466F6A.30302@canterbury.ac.nz> Steven Bethard wrote: > Yep. The 'self' passed to __finalize__ is still an instance of the > same class (e.g. BufferedWriter or MyWriter). 
So inheritance works > normally: However, if the overridden method uses any attributes not mentioned in the original __finalattrs__, they will need to be added to it somehow. It might be useful if the metaclass gathered up the contents of __finalattr__ from the class and all its base classes. Then a class could just list its own needed attributes without having to worry about those needed by its base classes. This also suggests that some care will be needed when overriding methods of a class that uses this recipe. You need to know whether the method can be called from the finalizer, so you can be sure to include the appropriate attributes in __finalattrs__. -- Greg From steven.bethard at gmail.com Sun May 13 04:18:09 2007 From: steven.bethard at gmail.com (Steven Bethard) Date: Sat, 12 May 2007 20:18:09 -0600 Subject: [Python-3000] PEP: Eliminate __del__ In-Reply-To: <46466F6A.30302@canterbury.ac.nz> References: <1178551661.8251.16.camel@antoine-ubuntu> <46466F6A.30302@canterbury.ac.nz> Message-ID: On 5/12/07, Greg Ewing wrote: > Steven Bethard wrote: > > > Yep. The 'self' passed to __finalize__ is still an instance of the > > same class (e.g. BufferedWriter or MyWriter). So inheritance works > > normally: > > However, if the overridden method uses any attributes > not mentioned in the original __finalattrs__, they > will need to be added to it somehow. > > It might be useful if the metaclass gathered up the > contents of __finalattr__ from the class and all its > base classes. Then a class could just list its > own needed attributes without having to worry about > those needed by its base classes. You already don't need to list the attributes from the base classes. The __finalattrs__ are converted into class level descriptors, so if class D inherits from class C, it has the __finalattrs__ descriptors for both classes. Did you try it and find that it didn't work? STeVe -- I'm not *in*-sane. 
Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy From gproux+py3000 at gmail.com Sun May 13 05:50:30 2007 From: gproux+py3000 at gmail.com (Guillaume Proux) Date: Sun, 13 May 2007 12:50:30 +0900 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <19dd68ba0705120827s5415c4dcx12e5862f32cc3e06@mail.gmail.com> Message-ID: <19dd68ba0705122050n45bec072s831c72270cf04bf4@mail.gmail.com> Dear Stargaming, On 5/13/07, Stargaming wrote: > Guillaume Proux schrieb: I see that the language you are most comfortable with is German. Compared with French (and even more with Japanese), I have a bias that German people are very gifted in foreign languages and especially in English... > While still separating them from ascii-countries. They would start > writing programs that expose foreign-phrased APIs but we would deny > using them because we couldn't even type a single word! If I rephrase your sentence above to use the local (e.g. Japanese) view: "People in ascii countries are writing programs that expose foreign-phrased APIs, but we are denied using them because we cannot even read a single word." The situation right now is that each community in "non-ascii" countries is rather small because they are denied writing ANY program at all. Acceptance of 3131 would enable the following things: 1) new contributors (including the young, who are not necessarily able to deal with English) will start programming (in python) 2) some of them will want to join the international community 3) more programs, both in local and international communities > AFAIK, allowing non-ascii identifiers would still *not* translate > python. They would still have to struggle with every part of python that > is builtin, i.e. builtins (you could let non-ascii identifiers reference Your answer misses the point about the ability for children to learn programming early.
In France, my experience was that it is very important to let children use their own native vocabulary. Let me tell you about two things we did in France re. computer science teaching to young children around the 1980s which had a great influence on me and other children of my generation: 1) LOGO programming: we would be able to use the turtle using simple words in French. That made a big impression on children. We would spend hours playing with this. Now people just grab a Nintendo DS and never approach computers with a "programming" approach at that age. 2) Robots: we had a robotic arm that came with a custom programming interface (in French). It was very challenging to optimize your programs to achieve a given task (take the ball and drop it in the glass) in the minimum of time and steps. Without the ability to do all that in French at the age of 8 or 9 years old, we would probably not have enjoyed that as much. > As stated above, we could not use them though. Bad deal, if you ask me! They can't use what we give them here. Who is the loser in the deal right now? Once again, we should not deny each language community the chance to create its own package ecosystem. I believe really *good* packages will always end up having an i18n-ized version. > I don't get the improvement offered by this one. We should *allow* > non-ascii identifiers to **require** wrappers? You are always taking the wrong side of the equation. By allowing non-ascii chars in the mix, standard APIs will be able to be offered in each local language, once again, for the local good. > > - Increase the number of python users (from 7 to 77 years old) > Works in English, too. Do you know many Japanese/Chinese young children or elderly people that are only speakers/readers/writers of their own language? Try to get them to speak English just for fun or, worse, make them read python code and ask them to explain to you what it means. > No, we do not restrict them, we simply do not allow them (what is a huge > difference here).
> UTF-8 will be allowed (*and* enforced by default) as a Not allowing something which now becomes naturally possible is *not* a restriction? > file encoding, i.e. strings and comments will be affected. I don't see > the real restriction here. Correct me please, if I'm wrong. Imagine you would be born in a world where your alphabet is hardly ever used in the computing world. I am sure you would have a much harder time learning programming. > OTOH, I cannot glance at japanese code and know what it means. So, > better the japanese developer named it badly but explained it than > requiring me to consult a dictionary. I am talking about your own code, the code you might need to maintain for years. Once again, you are looking at your own small world where it is "easy" for you to *write* programs if only because it uses the character set in which you have been dwelling since you were born. > See above, at least *my* reading speed for japanese text tends to zero > (if not less!). And this is not the issue. Of course, in the future of accepted PEP3131, there are some scripts which you won't be able to read. And that is fine, because it will probably be some internally developed program in a large Japanese company. I am a strong believer that self-regulation will happen for new packages that could be interesting for the international python community (remember that we are talking about new packages that would never see the light of day without PEP3131) > They're free to express their thoughts in comments, today, still > separating them from ascii-developers. You were shrugging off earlier the fact that it is not important for people to understand their code by glancing at it. And now you dismiss their concern with, "Good enough for those guys to just do line by line commenting". How nice. > I do not think allowing people to program in *their* language would > enhance integration. It would just split the python community *even* > more.
> I like communicating with non-native English speakers much more My point is that you cannot split a community... that does not even exist *yet* because the entry barrier to the community is too high for too many people in non-ascii countries. I am taking the long-term view. Get people involved with Python today when they are 7-8 years old, and in 10 years we will have strong community members from non-ascii countries. > To communicate, we just have to find (or agree on) a common point > between devs. Python is English, that's a matter of fact IMO. It is the > common language that makes us a community and *one* language. Yes, and I don't think that PEP3131 will change anything about that fact, but for each local community we should allow people to use their own language (mostly as users). The fact that I have seen NO comment on this issue from non-ascii devs also definitely makes me think that the community is not reaching far enough into those countries that are not using latin characters and that PEP3131 will help provide new blood from these countries. I hope Guido will be able to see the long-term benefits of accepting this PEP.
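[Editorial note: PEP 3131 was ultimately accepted, so Python 3 supports exactly the kind of code argued for here — Unicode letters in identifiers, with keywords and builtins unchanged. A small illustration; the French names are invented for the example:]

```python
# Valid Python 3 under PEP 3131: identifiers may use non-ASCII
# letters, while keywords (=, print, etc.) stay as they are.
diamètre = 4.0
rayon = diamètre / 2
périmètre = 2 * 3.14159 * rayon
print(périmètre)
```

Identifiers are normalized to NFKC by the compiler, so visually equivalent spellings of the same name compare equal.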
Best Regards, Guillaume From guido at python.org Sun May 13 05:59:18 2007 From: guido at python.org (Guido van Rossum) Date: Sat, 12 May 2007 20:59:18 -0700 Subject: [Python-3000] ABC's, Roles, etc In-Reply-To: <46466C7B.5010006@canterbury.ac.nz> References: <88d0d31b0705081452w3082c1d0x7355e5eed7a3e093@mail.gmail.com> <88d0d31b0705082057l75ae6241gffa42545c25937d@mail.gmail.com> <20070509173942.B5A1B3A4061@sparrow.telecommunity.com> <46424718.20006@benjiyork.com> <20070510012836.16D0D3A4061@sparrow.telecommunity.com> <464318D0.2000109@benjiyork.com> <20070510153507.205EB3A4061@sparrow.telecommunity.com> <88d0d31b0705121058n7a92af7dn178f220dba922a91@mail.gmail.com> <46466C7B.5010006@canterbury.ac.nz> Message-ID: On 5/12/07, Greg Ewing wrote: > Guido van Rossum wrote: > > > From this point, issubclass(InstrumentedList, Sequence) will be true > > (and likewise for instances of it and isinstance(x, Sequence)). But > > InstrumentedList's __mro__ and __bases__ are unchanged. > > I think I've figured out what bothers me about this > kind of overloading of isinstance(). Normally if > isinstance(x, C) is true, we expect that a method > call on x can at least potentially invoke a method > of class C. But if isinstance(x, C) can be true > even if C doesn't appear in the mro of x, this is > no longer the case. Well, not if x.__class__ overrides all of C's methods. And with this registration business we're *supposed* to register only classes that provide concrete implementations of all of C's methods, which comes down to the same thing. Also, I'm unclear under what circumstances knowing that would make a difference in your understanding of a program?
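[Editorial note: the registration mechanism under discussion shipped as abc.ABCMeta.register; a sketch of the exact behavior Greg is questioning, using stand-in classes rather than the real collections ABCs:]

```python
from abc import ABCMeta

class Sequence(metaclass=ABCMeta):
    """Stand-in ABC for the Sequence being discussed."""

class InstrumentedList(list):
    """A list subclass that does not inherit from Sequence."""

# Register InstrumentedList as a "virtual subclass": this makes the
# issubclass/isinstance checks pass without touching __mro__ or __bases__.
Sequence.register(InstrumentedList)

x = InstrumentedList([1, 2, 3])
print(issubclass(InstrumentedList, Sequence))  # True
print(isinstance(x, Sequence))                 # True
print(Sequence in InstrumentedList.__mro__)    # False
```

This is precisely Greg's point: the last line shows that no method lookup on x can ever reach code defined on Sequence, even though isinstance says yes.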
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From greg.ewing at canterbury.ac.nz Sun May 13 06:45:57 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 13 May 2007 16:45:57 +1200 Subject: [Python-3000] PEP: Eliminate __del__ In-Reply-To: References: <1178551661.8251.16.camel@antoine-ubuntu> <46466F6A.30302@canterbury.ac.nz> Message-ID: <46469805.3000005@canterbury.ac.nz> Steven Bethard wrote: > You already don't need to list the attributes from the base classes. > The __finalattrs__ are converted into class level descriptors, so if > class D inherits from class C, it has the __finalattrs__ descriptors > for both classes. That's fine, then. -- Greg From talin at acm.org Sun May 13 07:36:10 2007 From: talin at acm.org (Talin) Date: Sat, 12 May 2007 22:36:10 -0700 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <19dd68ba0705120827s5415c4dcx12e5862f32cc3e06@mail.gmail.com> References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <19dd68ba0705120827s5415c4dcx12e5862f32cc3e06@mail.gmail.com> Message-ID: <4646A3CA.40705@acm.org> Guillaume Proux wrote: > Dear all, > > Pleased to meet you. I just subscribed to the list because I wanted to > join the discussion regarding a specific PEP (for all the rest, you > are all much more expert than me) > > Guido: >> 3131 (non-ASCII identifiers) -- I'm leaning towards rejecting. > > I would like to voice my opposition to the rejection at that stage and > request that more time is spent requesting/analysing the opinion of > more people especially the people who have to deal with non-roman > languages as a daily basis and especially people in the education > field (along like other interesting people like the OLPC people) One point that was raised by Alex Martelli is that the full set of Unicode 'letter' characters includes many characters which are visually indistinguishable in every font in the world. 
It means that, from now on, when I look at the variable named 'a', I can no longer be sure that what I am looking at is really the character I think it is. It means that we have introduced, for every Python programmer, a level of uncertainty that wasn't there before. Programming languages are supposed to represent a compromise between the capabilities of humans and the capabilities of computers - in other words, both humans and computers are supposed to meet each other half-way and find a "sweet spot" that represents a restricted set of commands and symbols that both can understand. Each of them is expected to put a certain amount of effort into learning this common language - in the case of the computer, that effort is embodied into the design of the compiler, and in the case of the human, that effort is the learning of a formal dialect of commands and codes. However, there is another use of programming languages, which is for programmers to communicate with each other. Specifically, the programming language provides a concise, unambiguous way to describe a particular algorithm or technique. Again, there is the expectation that practitioners of this discipline are expected to put a certain amount of effort into learning this formal language so that they can communicate with each other precisely. The fact that programming languages resemble a particular human language is a pedagogical convenience, but it need not be so, and wasn't always that way. And the fact that a pidgin form of English words and grammar is used for most programming languages is a frozen accident, just as English is also the language used for international air traffic control. However, I think it is a mistake to think that programs themselves are written in "English", they are written in a formal language, similar to the language of mathematics, which every programmer needs to put in a modest amount of effort to learn, even native English speakers. 
In any case, I would argue that if you teach someone to program in a dialect that cannot be understood by the global community of programmers, then you haven't really taught them 'programming' at all - you've taught them a kind of applied logic that they might be able to use personally, but that is only a small part of the craft of software engineering. The greater part is the ability to understand the vast corpus of literature out there that explains how to do just about everything you can think of with these tools. Only through learning a common language can they participate in the global technical infrastructure, which is more and more what I believe 'programming' is about. There is another issue to be considered as well: Many human languages have a different grammatical structure. Even if you were to allow non-ASCII identifiers, and more so even if you were to allow the keywords themselves to be localized, you still have the problem that 'if' comes at the start of a sentence, which makes no sense in many languages. -- Talin From hanser at club-internet.fr Sun May 13 08:56:33 2007 From: hanser at club-internet.fr (Pierre Hanser) Date: Sun, 13 May 2007 08:56:33 +0200 Subject: [Python-3000] Support for PEP 3131 Message-ID: <4646B6A1.7060007@club-internet.fr> hello i would like to add that even for a french user, it would be good to have the possibility to use non ascii identifiers. take the simple problem of the difference between "action to do" and "action done". In english, most of the time, adding 'ed' to the verb will do the difference: change -> changed great, this is still ascii! in french: change -> changé (ends with 'eacute') bad, not ascii! The possibility to use non ascii characters for identifiers would be a strong bonus in my opinion.
-- Pierre From martin at v.loewis.de Sun May 13 13:55:26 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 13 May 2007 13:55:26 +0200 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <4646A3CA.40705@acm.org> References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <19dd68ba0705120827s5415c4dcx12e5862f32cc3e06@mail.gmail.com> <4646A3CA.40705@acm.org> Message-ID: <4646FCAE.7090804@v.loewis.de> > One point that was raised by Alex Martelli is that the full set of > Unicode 'letter' characters includes many characters which are visually > indistinguishable in every font in the world. It means that, from now > on, when I look at the variable named 'a', I can no longer be sure that > what I am looking at is really the character I think it is. It means > that we have introduced, for every Python programmer, a level of > uncertainty that wasn't there before. That's a red herring. This problem is unlikely to occur in practice. There are other, more serious cases of presentation ambiguity (e.g. tabs vs. spaces), yet nobody suggests to ban tabs from the language for that reason. > However, I think it is a mistake to think that programs themselves are > written in "English", they are written in a formal language, similar to > the language of mathematics, which every programmer needs to put in a > modest amount of effort to learn, even native English speakers. While that is true from the computer's point of view, it is not so for many people writing programs. They want to understand programs "naturally", not "mathematically". If it was only for the mathematical properties, we could restrict ourselves to identifiers _1, _2, _3, and so on. 
> In any case, I would argue that if you teach someone to program in a > dialect that cannot be understood by the global community of > programmers, then you haven't really taught them 'programming' at all - > you've taught them a kind of applied logic that they might be able to > use personally, but that is only a small part of the craft of software > engineering. What is the relationship to this PEP? (and in the paragraphs that I snipped?) Following your long text of reasoning - are you now in favor of or opposed to the language change proposed in PEP 3131? > There is another issue to be considered as well: Many human languages > have a different grammatical structure. Even if you were to allow > non-ASCII identifiers, and more so even if you were to allow the > keywords themselves to be localized, you still have the problem that > 'if' comes at the start of a sentence, which makes no sense in many > languages. The PEP doesn't propose to adjust the grammar of Python to the grammar of any natural language. All it proposes is to extend the language to allow additional characters in identifiers. Regards, Martin From martin at v.loewis.de Sun May 13 15:11:48 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 13 May 2007 15:11:48 +0200 Subject: [Python-3000] the future of the GIL In-Reply-To: <46428EED.3060205@canterbury.ac.nz> References: <463E4645.5000503@acm.org> <20070506222840.25B2.JCARLSON@uci.edu> <4642745C.1040702@canterbury.ac.nz> <3d2ce8cb0705091839w7b4fec56ud6a1ed9cb0ad264d@mail.gmail.com> <46428EED.3060205@canterbury.ac.nz> Message-ID: <46470E94.6040307@v.loewis.de> > If so, it looks like it might be possible to give > Python a fork() that works on Windows, at least for > the time being. It's quite a challenge. That call just creates a process, but no thread. You need to invoke many more API calls to make the process actually run.
For some reason (which I couldn't figure out), Cygwin abstained from using that in their implementation of fork. Regards, Martin From jason.orendorff at gmail.com Sun May 13 15:39:22 2007 From: jason.orendorff at gmail.com (Jason Orendorff) Date: Sun, 13 May 2007 09:39:22 -0400 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <4646A3CA.40705@acm.org> References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <19dd68ba0705120827s5415c4dcx12e5862f32cc3e06@mail.gmail.com> <4646A3CA.40705@acm.org> Message-ID: On 5/13/07, Talin wrote: > The fact that programming languages resemble a particular human language > is a pedagogical convenience, but it need not be so, and wasn't always > that way. "Crucial usability feature", not "pedagogical convenience". Choosing good names for things is an important skill in all modern programming languages. It always involves a natural language, and great skill in that natural language pays off. Look--this whole discussion has lacked perspective. Non-ASCII identifiers are not going to cause confusion or "split the community even more". Java and XML have not suffered. People writing code for open distribution will stick to ASCII in practice. There is no problem. But for the same reason, the benefit isn't all that great either. Python should allow foreign-language identifiers because (1) it's a gesture of good will to people everywhere who don't speak English fluently; (2) some students will benefit; (3) some people writing code that no one else will ever see will benefit. Non-English-language tutorials might also benefit. (?) I think the gesture alone is worth it, even if no one ever used the feature productively. But people will. The cost to python-dev is low, and the cost to English-speaking users is very likely zero. What am I missing? 
-j From gproux+py3000 at gmail.com Sun May 13 16:23:56 2007 From: gproux+py3000 at gmail.com (Guillaume Proux) Date: Sun, 13 May 2007 23:23:56 +0900 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <19dd68ba0705120827s5415c4dcx12e5862f32cc3e06@mail.gmail.com> <4646A3CA.40705@acm.org> Message-ID: <19dd68ba0705130723r793e034ax46319b86e166ac5b@mail.gmail.com> Hi Jason, Very interesting post. I will just make a little comment. On 5/13/07, Jason Orendorff wrote: > Python should allow foreign-language identifiers because (1) it's a > gesture of good will to people everywhere who don't speak English > fluently; (2) some students will benefit; (3) some people writing code > that no one else will ever see will benefit. I would change your (1) slightly to make it clear that it is NOT a fluency in English vs. other language issue. What this is about is really the ability to efficiently read/write Python programs when it is usually a challenge to read (but also write) latin characters. Once again, you would be surprised how challenging it is for e.g. most Japanese people to decipher text written in latin characters. > I think the gesture alone is worth it, even if no one ever used the > feature productively. But people will. The cost to python-dev is low, > and the cost to English-speaking users is very likely zero. ASCII limitations have disappeared everywhere in the usage of most modern OSes because the historical memory limitations have no more meaning. It is great to see that Python could become the first language to *really* enter the 21st century. 
Regards, Guillaume From martin at v.loewis.de Sun May 13 16:30:00 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 13 May 2007 16:30:00 +0200 Subject: [Python-3000] PEP 3123 (Was: PEP Parade) In-Reply-To: References: Message-ID: <464720E8.3040402@v.loewis.de> > S 3123 Making PyObject_HEAD conform to standard C von Löwis > > I like it, but who's going to make the changes? Once those changes have > been made, will it still be reasonable to expect to merge C code from > the (2.6) trunk into the 3.0 branch? I just created bugs.python.org/1718153, which implements this PEP. I had to add a number of additional macros (Py_Refcnt, Py_Size, PyVarObject_HEAD_INIT); using these macros throughout is the bulk of the change. If the macros are backported to 2.x (omitting the "hard" changes to PyObject itself), then the code base can stay the same between 2.x and 3.x (of course, backporting changes from 2.6 to 2.5 might become harder, as the chances for conflicts increase). As for statistics: there are ca. 580 uses of Py_Type in the code, 410 of Py_Size, and 20 of Py_Refcnt. How should I proceed? The next natural step would be to change 2.6, and then wait until those changes get merged into 3k. Regards, Martin From tanzer at swing.co.at Sun May 13 16:31:03 2007 From: tanzer at swing.co.at (Christian Tanzer) Date: Sun, 13 May 2007 16:31:03 +0200 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: Your message of "Sat, 12 May 2007 13:03:59 PDT." Message-ID: "Guido van Rossum" wrote: > - Expect pushback on your assumption that every function or method > should be fair game for overloading. Requiring explicit tagging the > base or default implementation makes things a lot more palatable and > predictable for those of us who are still struggling to accept GFs. Front-up tagging might make it more palatable but IMHO it would be a serious mistake.
I still shudder when I think of C++'s `virtual` (although it's been a looong time since I stopped using C++ [thanks, Guido!] :-) PS: I didn't have time to really study Phillip's PEP but I've followed the GF saga over the months and I'd be very happy if GF's were promoted to be a core feature of Python! -- Christian Tanzer http://www.c-tanzer.at/ From rrr at ronadam.com Sun May 13 17:04:35 2007 From: rrr at ronadam.com (Ron Adam) Date: Sun, 13 May 2007 10:04:35 -0500 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <19dd68ba0705120827s5415c4dcx12e5862f32cc3e06@mail.gmail.com> <4646A3CA.40705@acm.org> Message-ID: <46472903.3070608@ronadam.com> Jason Orendorff wrote: > I think the gesture alone is worth it, even if no one ever used the > feature productively. But people will. The cost to python-dev is low, > and the cost to English-speaking users is very likely zero. > > What am I missing? I don't think you're missing anything. I think you are correct, the perceived impact is greater than the actual. The reason we don't run into python written in other languages more often is because most people have a language preference set when they do internet searches. Not that all programs are written using only English. As more people use python there is less (not more) need to read and understand *everyone* else's programs, as there is most likely already what you need, or something close to it, in your own language. I believe the walls are not solid or one way. Good programs written in other languages most likely get translated to English at some point if they are freely distributed.
Ron From collinw at gmail.com Sun May 13 17:22:03 2007 From: collinw at gmail.com (Collin Winter) Date: Sun, 13 May 2007 08:22:03 -0700 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <19dd68ba0705120827s5415c4dcx12e5862f32cc3e06@mail.gmail.com> References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <19dd68ba0705120827s5415c4dcx12e5862f32cc3e06@mail.gmail.com> Message-ID: <43aa6ff70705130822q32e3971bradf80fd90ac36578@mail.gmail.com> On 5/12/07, Guillaume Proux wrote: [snip] > In this respect, I strongly believe that support non-ASCII identifiers > as proposed by PEP3131 would improve a number of things: > - discussion and uptake of python in "non-ascii" countries > - ability for children to learn programming in their own language (I > started programming at 7 years old and would have been very disturbed > if I could not use my own language to type in programs) > - increase of the number of new "interesting" packages from non-ascii countries > - ability for local programmers and local companies to provide > "bridges" between international (english) APIs and local APIs. > - Increase the number of python users (from 7 to 77 years old) Says you. So far, all I've seen from PEP 3131's supporters is a lot of hollow assertions and idle theorizing: "Python will be easier to use for people using non-ASCII character sets", "Python will be easier to learn for those raised with non-Roman-influenced languages", etc, etc. Until I see some kind of evidence, something to back up these claims, I'm going to assume you're wrong. Have there been studies on this kind of thing? Has there been any research into whether a mixture of English keywords and, say, Japanese and English identifiers makes a given programming language easier to learn and use? If so, why aren't they referenced in the PEP or linked in any emails? 
Given the lack of evidence presented so far, my operating assumption is that the PEP's supporters -- including you -- are making things up to support a conclusion that they might wish to be true. > In my humble opinion, now that UTF8 is accepted as the standard source > code encoding, it is very difficult to understand why we should start > putting restrictions on the kind of identifiers that are used (which > would force people to comment line by line as they do now!). > > When I am programming in Python, I am VERY DISTURBED when the code I > write contains much comment. It needs to be readable just by glancing > at it. > > However, for most of the people who are core python developers, you > should ask what is the typical reading speed for "ascii" characters > for e.g. a standard Japanese pupil. You would be very surprised how > slow that is. In my opinion (after living in Japan for quite a bit), > people are very slow to read ASCII characters and this definitely > restrains their programming productivity and expressiveness. See, that's the thing I have yet to see addressed: there's been a lot of stress on "being able to write variable/class/method names in Arabic/Mandarin/Hindi will make it easier for native speakers to understand", but as far as I know, no-one has yet addressed how these non-English identifiers will mesh with the existing English keywords and English standard library functions. You say that being able to write identifiers in Cyrillic will make Python easier for Russian natives to read, to make Python code as you say, "readable just by glancing at it". Also, method/function names are traditionally expressed in English as verb phrases (e.g., "isElementVisible()") which dovetail nicely with Anglo-centric keywords like "if" and "for ... in ...".
How do identifiers in languages with dramatically different grammars like Japanese -- or worse, different reading orders like Farsi and Hebrew -- interact with "if", "while" and the new "x if y else z" expression, which are deeply rooted in English grammar? My suspicion is, at least for right-to-left languages like Arabic, not well, if at all. Lastly, I take issue with one of the PEP's guidelines under the "Policy Specification" section: "All identifiers in the Python standard library...SHOULD use English words wherever feasible" (emphasis in the original). Are we now going to admit the possibility that part of the standard library will be written in English, some parts will be written in Spanish and this one module over there will be written in Czech? Absolutely ludicrous. Come-on-tell-us-how-you-really-feel-ly, Collin Winter From tomerfiliba at gmail.com Sun May 13 17:33:08 2007 From: tomerfiliba at gmail.com (tomer filiba) Date: Sun, 13 May 2007 17:33:08 +0200 Subject: [Python-3000] Support for PEP 3131 Message-ID: <1d85506f0705130833v1058b022re0597cf9f259320d@mail.gmail.com> [Guillaume Proux] > In this respect, I strongly believe that support non-ASCII identifiers > as proposed by PEP3131 would improve a number of things: > - discussion and uptake of python in "non-ascii" countries > - ability for children to learn programming in their own language (I > started programming at 7 years old and would have been very disturbed > if I could not use my own language to type in programs) > - increase of the number of new "interesting" packages from non-ascii countries > - ability for local programmers and local companies to provide > "bridges" between international (english) APIs and local APIs. > - Increase the number of python users (from 7 to 77 years old) well, i myself am a native hebrew speaker, so i'm quite sensitive to text-direction issues with all sorts of editors. 
to this day, i haven't seen a single editor that handles RTL/LTR transitions correctly, including microsoft word. when you start mixing LTR and RTL texts, it's asking for trouble:

שם_משפחה = "doe"
גיל = 5

i don't know how that would render on your machine, but on mine it says:

shem_mishpacha = "doe"
5 = gil # looks reversed, but it's actually correct (!!)

so that basically rules out using hebrew, arabic and farsi from being used as identifiers, and the list is not complete. now, since not all languages can be used, why bother supporting only some? and if my library exposes a function with a chinese name, how would you be able to invoke it without a chinese keyboard? you'd do better with a translator sitting between the interpreter and the editor. -tomer From foom at fuhm.net Sun May 13 17:50:31 2007 From: foom at fuhm.net (James Y Knight) Date: Sun, 13 May 2007 11:50:31 -0400 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <43aa6ff70705130822q32e3971bradf80fd90ac36578@mail.gmail.com> References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <19dd68ba0705120827s5415c4dcx12e5862f32cc3e06@mail.gmail.com> <43aa6ff70705130822q32e3971bradf80fd90ac36578@mail.gmail.com> Message-ID: <575BCDBE-0368-4747-872F-115B2ED46122@fuhm.net> On May 13, 2007, at 11:22 AM, Collin Winter wrote: > See, that's the thing I have yet to see addressed: there's been a lot of > stress on "being able to write variable/class/method names in > Arabic/Mandarin/Hindi will make it easier for native speakers to > understand", but as far as I know, no-one has yet addressed how these > non-English identifiers will mesh with the existing English keywords > and English standard library functions. You say that being able to > write identifiers in Cyrillic will make Python easier for Russian > natives to read, to make Python code as you say, "readable just by > glancing at it".
But the fact is any native-language identifiers will > be surrounded in a sea of English: keywords, the standard library, > almost all open-source packages, etc. How does that impact your > readability guesses? In order to teach programming to non-English-speaking users, I would imagine people would translate some libraries (like, say, turtle), and simply teach the meaning of the few keywords. Clearly in order to use the wider range of libraries out there and communicate with the wider Python community, the person will have to know English. But it should be possible to start learning the fundamentals of programming without that. James From aahz at pythoncraft.com Sun May 13 18:02:07 2007 From: aahz at pythoncraft.com (Aahz) Date: Sun, 13 May 2007 09:02:07 -0700 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <4646FCAE.7090804@v.loewis.de> References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <19dd68ba0705120827s5415c4dcx12e5862f32cc3e06@mail.gmail.com> <4646A3CA.40705@acm.org> <4646FCAE.7090804@v.loewis.de> Message-ID: <20070513160206.GA3161@panix.com> On Sun, May 13, 2007, "Martin v. Löwis" wrote: > > There are other, more serious cases of presentation ambiguity > (e.g. tabs vs. spaces), yet nobody suggests to ban tabs from the > language for that reason. Well, I do. ;-) -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "Look, it's your affair if you want to play with five people, but don't go calling it doubles." --John Cleese anticipates Usenet From pje at telecommunity.com Sun May 13 18:19:36 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Sun, 13 May 2007 12:19:36 -0400 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc.
In-Reply-To: References: Message-ID: <20070513161749.C56DA3A409B@sparrow.telecommunity.com> At 04:31 PM 5/13/2007 +0200, Christian Tanzer wrote: >"Guido van Rossum" wrote: > > > - Expect pushback on your assumption that every function or method > > should be fair game for overloading. Requiring explicit tagging the > > base or default implementation makes things a lot more palatable and > > predictable for those of us who are still struggling to accept GFs. > >Front-up tagging might make it more palatable but IMHO it would be a >serious mistake. > >I still shudder when I think of C++'s `virtual` (although it's been a >looong time since I stopped using C++ [thanks, Guido!] :-) It's not *that* serious. Even if we end up with the stdlib-supplied version having to pre-declare functions, it'll be trivial to implement a third party library that retroactively makes it unnecessary. Specifically, the way it would work is that the overloading.rules_for() function will just need a "before" overload for FunctionType that modifies the function in-place to be suitable. So, people who want to be able to do true AOP will just need to either write a short piece of code themselves, or import it from somewhere. It'd probably look something like this:

from overloading import before, rules_for, isgeneric, overloadable

@before
def rules_for(ob: type(lambda:None)):
    if not isgeneric(ob):
        gf = overloadable(ob)  # apply the decorator
        ob.__code__ = gf.__code__
        ob.__closure__ = gf.__closure__
        ob.__globals__ = gf.__globals__
        ob.__dict__ = gf.__dict__

The idea here is that if "@overloadable" is the decorator for turning a regular function into a generic function, you can simply apply it to the function and copy the new function's attributes to the old one, thereby converting it in-place. It might be slightly trickier than shown, but probably not much.
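[The `overloading` module in Phillip's sketch is hypothetical; PEP 3124 was never shipped in the stdlib. For readers trying to follow the mechanics, here is a self-contained toy sketch of what an `@overloadable` generic function could look like, dispatching on the exact type of its first argument. All names are illustrative assumptions, not PEP 3124's actual API.]

```python
def overloadable(default):
    """Toy generic function: dispatch on the exact type of the first
    argument, falling back to the original (default) implementation."""
    registry = {}

    def dispatch(arg, *args, **kwargs):
        impl = registry.get(type(arg), default)
        return impl(arg, *args, **kwargs)

    def overload(typ):
        def register(fn):
            registry[typ] = fn  # record the specialized implementation
            return fn
        return register

    dispatch.overload = overload
    return dispatch


@overloadable
def describe(obj):
    return "some object"


@describe.overload(int)
def _(obj):
    return "an int"


print(describe(3), describe("spam"))  # an int some object
```

[A real implementation along PEP 3124's lines would also walk the MRO and support before/after qualifiers; this sketch deliberately ignores subclassing to keep the idea visible.]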
From gproux+py3000 at gmail.com Sun May 13 18:18:09 2007 From: gproux+py3000 at gmail.com (Guillaume Proux) Date: Mon, 14 May 2007 01:18:09 +0900 Subject: [Python-3000] Fwd: Support for PEP 3131 In-Reply-To: <19dd68ba0705130917y3988052cnbbebf1536cc25bdd@mail.gmail.com> References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <19dd68ba0705120827s5415c4dcx12e5862f32cc3e06@mail.gmail.com> <43aa6ff70705130822q32e3971bradf80fd90ac36578@mail.gmail.com> <19dd68ba0705130917y3988052cnbbebf1536cc25bdd@mail.gmail.com> Message-ID: <19dd68ba0705130918p29c8a79ds9ac59b890a3d7feb@mail.gmail.com> Hello, On 5/14/07, Collin Winter wrote: > Says you. So far, all I've seen from PEP 3131's supporters is a lot of > hollow assertions and idle theorizing: "Python will be easier to use > for people using non-ASCII character sets", "Python will be easier to > learn for those raised with non-Roman-influenced languages", etc, etc. > Until I see some kind of evidence, something to back up these claims, > I'm going to assume you're wrong. How could you gather any evidence without implementing the support first? I understand that your argument, being really weak and not supported by actual fact or evidence, should also be made void. > Have there been studies on this kind of thing? Has there been any Part of my first post was to try to call some attention to this, precisely to gather more opinions and if possible evidence. > natives to read, to make Python code as you say, "readable just by > glancing at it". But the fact is any native-language identifiers will > be surrounded in a sea of English: keywords, the standard library, > almost all open-source packages, etc. How does that impact your > readability guesses? Let me understand how well you understand the problem yourself, since I am trying to see why you are putting up such resistance to a somewhat innocuous change for "you" as a person for whom reading/writing Latin characters is a core competency.
After 10 years in Japan (and near fluency), I can personally witness to the inappropriateness of using Latin characters to express yourself when thinking in Japanese. > [...] > which are deeply rooted in English grammar? My suspicion is, at least > for right-to-left languages like Arabic, not well, if at all. While I still do not understand the reason for such opposition from people for whom this PEP would likely have no impact, I expected people using RTL languages to come up with a "this won't work with RTL languages" argument. However, PEP3131 is not about grammar at all, and all about the ability to freely choose your identifiers (vars and method names) from the charset that pleases you for your own programs. Implementation of PEP3131 will not hinder people with RTL languages. The fact that Python will continue having a standard Latin-inspired grammar is not going to make RTL-language people less productive than before. Once again, this has zero impact on them and this will still greatly benefit people who understand the orthogonality between grammar and variable naming. Regards, Guillaume From gproux+py3000 at gmail.com Sun May 13 18:25:37 2007 From: gproux+py3000 at gmail.com (Guillaume Proux) Date: Mon, 14 May 2007 01:25:37 +0900 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <1d85506f0705130833v1058b022re0597cf9f259320d@mail.gmail.com> References: <1d85506f0705130833v1058b022re0597cf9f259320d@mail.gmail.com> Message-ID: <19dd68ba0705130925j1dd55f1boba9e1b6c036d0422@mail.gmail.com> Dear Tomer, > well, i myself am a native hebrew speaker, so i'm quite sensitive > to text-direction issues with all sorts of editors. to this day, i haven't > seen a single editor that handles RTL/LTR transitions correctly, > including microsoft word. Are you talking about editor bugs? You should find a way to report the bugs to the people in charge of development of those editors, but I believe nothing here is Python's fault.
> when you start mixing LTR and RTL texts, it's asking for trouble:
> שם_משפחה = "doe"
> גיל = 5

Looks cool :)

> shem_mishpacha = "doe"
> 5 = gil # looks reversed, but it's actually correct (!!)
> so that basically rules out using hebrew, arabic and farsi from being
> used as identifiers, and the list is not complete.

You are ruling out PEP3131 because there is no good editor able to support your language? True, for the same reason we should never have made a Unicode standard. If the editor is the problem, fix the editor. > now, since not all languages can be used, why bother supporting only some? > and if my library exposes a function with a chinese name, how would you > be able to invoke it without a chinese keyboard? If you don't speak Chinese, I reckon the probability you will ever (find and) use a library that would expose Chinese names is 0. Regards, Guillaume From collinw at gmail.com Sun May 13 19:09:13 2007 From: collinw at gmail.com (Collin Winter) Date: Sun, 13 May 2007 10:09:13 -0700 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <19dd68ba0705130925j1dd55f1boba9e1b6c036d0422@mail.gmail.com> References: <1d85506f0705130833v1058b022re0597cf9f259320d@mail.gmail.com> <19dd68ba0705130925j1dd55f1boba9e1b6c036d0422@mail.gmail.com> Message-ID: <43aa6ff70705131009s7d5b177dmea7c790d670ac3c0@mail.gmail.com> On 5/13/07, Guillaume Proux wrote: > [Tomer Filiba]
> > when you start mixing LTR and RTL texts, it's asking for trouble:
> > שם_משפחה = "doe"
> > גיל = 5
> >
> > shem_mishpacha = "doe"
> > 5 = gil # looks reversed, but it's actually correct (!!)
> >
> > so that basically rules out using hebrew, arabic and farsi from being
> > used as identifiers, and the list is not complete.

> You are ruling out PEP3131 because there is no good editor able to > support your language? True, for the same reason we should never have > made a Unicode standard. > If the editor is the problem, fix the editor. No no no no no.
This isn't a problem with the editor: it's a problem with allowing Hebrew identifiers. Tomer can correct me on this, but I strongly doubt that it improves readability by forcing the programmer to constantly change which direction they're reading from, e.g., "if שם_משפחה.strip():" Collin Winter From jcarlson at uci.edu Sun May 13 19:17:06 2007 From: jcarlson at uci.edu (Josiah Carlson) Date: Sun, 13 May 2007 10:17:06 -0700 Subject: [Python-3000] mixin class decorator In-Reply-To: <1d85506f0705100436j4ed5c2f7xe6bef98c3b86f5bf@mail.gmail.com> References: <1d85506f0705100436j4ed5c2f7xe6bef98c3b86f5bf@mail.gmail.com> Message-ID: <20070513100929.854C.JCARLSON@uci.edu> "tomer filiba" wrote: > > with the new class decorators of py3k, new use cases emerge. > for example, now it is easy to have real mixin classes or even > mixin modules, a la ruby. [snip] > does it seem useful? should it be included in some stdlib? > or at least mentioned as a use case for class decorators in PEP 3129? > (not intended for 3.0a1) There are many use-cases for class decorators. Since the PEP has already been accepted, including this is unnecessary (for acceptance or rejection purposes). About the only thing that I think would be nice is if we could get class decorators in 2.6 as well (a future import would work for me). (which would also open up the gates for 3rd party or stdlib-based type annotation libraries like @implements(interface.foo)). - Josiah From fdrake at acm.org Sun May 13 19:21:59 2007 From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Sun, 13 May 2007 13:21:59 -0400 Subject: [Python-3000] mixin class decorator In-Reply-To: <20070513100929.854C.JCARLSON@uci.edu> References: <1d85506f0705100436j4ed5c2f7xe6bef98c3b86f5bf@mail.gmail.com> <20070513100929.854C.JCARLSON@uci.edu> Message-ID: <200705131322.00152.fdrake@acm.org> On Sunday 13 May 2007, Josiah Carlson wrote: > About the only thing that I think would be nice is if we could get class > decorators in 2.6 as well (a future import would work for me). Since class decorators don't introduce a new keyword, there'd be no need for a future import. Something that's not syntactically legal now would become legal, and that's an allowed change. -Fred -- Fred L. Drake, Jr. From tomerfiliba at gmail.com Sun May 13 19:42:45 2007 From: tomerfiliba at gmail.com (tomer filiba) Date: Sun, 13 May 2007 19:42:45 +0200 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <43aa6ff70705131009s7d5b177dmea7c790d670ac3c0@mail.gmail.com> References: <1d85506f0705130833v1058b022re0597cf9f259320d@mail.gmail.com> <19dd68ba0705130925j1dd55f1boba9e1b6c036d0422@mail.gmail.com> <43aa6ff70705131009s7d5b177dmea7c790d670ac3c0@mail.gmail.com> Message-ID: <1d85506f0705131042q23270a91qa31ff2f3940019ed@mail.gmail.com> On 5/13/07, Collin Winter wrote: > No no no no no. This isn't a problem with the editor: it's a problem > with allowing Hebrew identifiers. Tomer can correct me on this, but I > strongly doubt that it improves readability by forcing the programmer > to constantly change which direction they're reading from, e.g., "if > שם_משפחה.strip():" well, you're right, but i'd've chosen another counter-example:

if בייקון.ביצים:
    pass

which comes first? does it say bacon.eggs or eggs.bacon? and what happens if the editor uses a dot prefixed by an LTR marker? the meaning is reversed, but it still looks the same! [Guillaume Proux] > You are ruling out PEP3131 because there is no good editor able to > support your language?
first, technical limitations do control the way we use computers. it's a fact. second, RTL/LTR issues are nondeterministic, and i'd rather leave heuristics out of my code. again, if you want a hebrew-version of python, you'd also want hebrew semantics and hebrew syntax. see also http://cheeseshop.python.org/pypi/hpy --- you can always translate or transliterate a word to english, like so:

if beykon.beytzim:

in the worst case, it wouldn't be meaningful to the english reader, but at least other ppl could use your code. > If you don't speak chinese, i reckon the probability you will ever > (find and) use a library that would expose chinese name is 0. you'd be surprised how many times i scanned through japanese/chinese forums with an online translator, looking for documentation or cheaper products :) -tomer From gproux+py3000 at gmail.com Sun May 13 20:04:21 2007 From: gproux+py3000 at gmail.com (Guillaume Proux) Date: Mon, 14 May 2007 03:04:21 +0900 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <1d85506f0705131042q23270a91qa31ff2f3940019ed@mail.gmail.com> References: <1d85506f0705130833v1058b022re0597cf9f259320d@mail.gmail.com> <19dd68ba0705130925j1dd55f1boba9e1b6c036d0422@mail.gmail.com> <43aa6ff70705131009s7d5b177dmea7c790d670ac3c0@mail.gmail.com> <1d85506f0705131042q23270a91qa31ff2f3940019ed@mail.gmail.com> Message-ID: <19dd68ba0705131104r85531f3o12b7e1769d7b7140@mail.gmail.com> Hi Tomer,

> if בייקון.ביצים:
>     pass
>
> which comes first? does it say bacon.eggs or eggs.bacon?
> and what happens if the editor uses a dot prefixed by an LTR
> marker? the meaning is reversed, but it still looks the same!

All that is really a *presentation* issue. And as such, an editor specialized in editing hebrew or arabic python should help you write the code you want to write. However, why would you put an LTR marker here? why try to add issues?
Also, as soon as UTF-8 is accepted as the *standard* encoding, isn't this issue the same with latin characters (not sure here, just asking). Additionally, would a professional programmer choose to add LTR markers to make the source code ambiguous? > again, if you want a hebrew-version of python, you'd also want > hebrew semantics and hebrew syntax. > see also http://cheeseshop.python.org/pypi/hpy Yes, but let PEP3131 go forward first, as this will make it easier for you to implement the full hebrew semantics. > you can always translate or transliterate a word to english, like so: > if beykon.beytzim: Is this a bijective translation? How good is the Latin-character reading ability of most Hebrew speakers? From the beginning, I can tell from experience that Japanese people have great difficulties in reading english or even transliterated japanese (which is never good anyway because of homonyms) > you'd be surprised how many times i scanned through > japanese/chinese forums with an online translator, looking > for documentation or cheaper products :) I am happy to see your open-minded spirit. Keep this open-minded spirit when evaluating PEP3131 :) Regards, Guillaume From martin at v.loewis.de Sun May 13 20:27:34 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 13 May 2007 20:27:34 +0200 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <43aa6ff70705130822q32e3971bradf80fd90ac36578@mail.gmail.com> References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <19dd68ba0705120827s5415c4dcx12e5862f32cc3e06@mail.gmail.com> <43aa6ff70705130822q32e3971bradf80fd90ac36578@mail.gmail.com> Message-ID: <46475896.80402@v.loewis.de> > Have there been studies on this kind of thing? Has there been any > research into whether a mixture of English keywords and, say, Japanese > and English identifiers makes a given programming language easier to > learn and use?
If so, why aren't they referenced in the PEP or linked > in any emails? There is anecdotal evidence that people intuitively use characters from their native language, and then are surprised by the syntax errors. Unfortunately, they are not required to report their usage to the Ministry for Use Of Funny Characters In Programming Languages. > Given the lack of evidence presented so far, my > operating assumption is that the PEP's supporters -- including you -- > are making things up to support a conclusion that they might wish to > be true. Are you also assuming that I make up my mentioning of anecdotal evidence? > Also, method/function names are traditionally expressed in English as > verb phrases (e.g., "isElementVisible()") which dovetail nicely with > Anglo-centric keywords like "if" and "for ... in ...". How do > identifiers in languages with dramatically different grammars like > Japanese -- or worse, different reading orders like Farsi and Hebrew > -- interact with "if", "while" and the new "x if y else z" expression, > which are deeply rooted in English grammar? My suspicion is, at least > for right-to-left languages like Arabic, not well, if at all. I don't speak Farsi, Arabic, or Hebrew, so I can't comment on that. I know that in German, if/while is not an issue at all. People regularly read "if" aloud as "wenn" or "falls". > Lastly, I take issue with one of the PEP's guidelines under the > "Policy Specification" section: "All identifiers in the Python > standard library...SHOULD use English words wherever feasible" > (emphasis in the original). Are we now going to admit the possibility > that part of the standard library will be written in English, some > parts will be written in Spanish and this one module over there will > be written in Czech? Absolutely ludicrous. The emphasis follows the convention of RFC 2119; it's not an emphasis, but an indication of specification. 3. 
SHOULD This word, or the adjective "RECOMMENDED", mean that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course. There are already deviations from that rule in the standard library:

(aifc.AIFC_read.)initfp
(aifc.AIFC_read.)getnchannels
(aifc.AIFC_read.)getcomptype
(asyncore.dispatcher.)del_channel
(binhex.)_Hqxcoderengine
opcode.def_op
opcode.name_op
opcode.jrel_op
(etc.)

all are not proper English words. Regards, Martin From hanser at club-internet.fr Sun May 13 21:01:35 2007 From: hanser at club-internet.fr (Pierre Hanser) Date: Sun, 13 May 2007 21:01:35 +0200 Subject: [Python-3000] Support for PEP 3131 Message-ID: <4647608F.7000202@club-internet.fr> some personal thoughts on the subject, pro, of course: 1) it's a matter of justice: everybody should have the right to name his variables with the exact name he prefers. 2) it's a matter of equity: why should only English speakers be able to write programs? (and even English uses accents in numerous words: café, ...) 3) it's a matter of freedom, finally dropping old imperialism. I would say this PEP has something to do with freeing the world from imperialism! you will have to be very convincing to refuse the pep without spreading an ungracious feeling of wanting to spread English more than needed (i know that many people on this list are neither English nor US citizens, but that's not enough to avoid this feeling...) 4) it does not suppress anything to anybody: the risk is to have more programs available, even if you can't fully read them. No need to come with a proof of utility: you dislike => don't use it, but don't prevent others to use it. 5) rules for official library may be stricter. Everybody is mature enough to know the audience of his programs: personal => may use native language, public => should use English, it seems.
in fact I'm not even against the inclusion of non-English libraries in the standard lib, but if this is the way to have the pep, i would accept restrictions. maybe a lexicon at the beginning of each library written in a foreign language would be enough 6) it gives access to literary programming. I try to avoid the expression 'literate programming' because I know the traditional connotation, but my English is not good enough to find a good title. So, let me describe: one of the biggest advantages I see is the ability to write well-written programming expressions. French without accents is not really French and always looks poor to demanding people. joy of programming comes with the beauty of well-written algorithms, written in good language. And your language is always a bit better than any other, if you can write it in its full glory. 7) most other considerations do not matter. font: probably you can find fonts for most languages (?) difficulty of writing: this gets better from day to day ... 8) the programmer, "he" in the previous lines, may of course be male or female... my 8 cents for this evening -- Pierre From murman at gmail.com Sun May 13 21:04:48 2007 From: murman at gmail.com (Michael Urman) Date: Sun, 13 May 2007 14:04:48 -0500 Subject: [Python-3000] Unicode strings, identifiers, and import Message-ID: This occurred to me while reading the PEP 3131 discussion, and while it's not limited to PEP 3131 concerns, I don't believe I've seen it discussed yet elsewhere. What is the interaction between import or __import__ and Unicode module names (or at least Unicode strings describing them)?
Currently in Python 2.5, __import__ appears to coerce to str, leading to the following error case:

>>> __import__(unicodedata.lookup('GREEK SMALL LETTER EPSILON'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u03b5' in position 0: ordinal not in range(128)

With str being the Unicode type in py3k, this branch of the potential problem needs to be addressed clearly, whether by defining __import__ as converting through ASCII, or by defining a useful semantic. If PEP 3131 is to be accepted, then it should probably address whether import will work on non-ASCII identifiers, and if so what the semantics are (if __import__ would otherwise limit to ASCII). I'm a little worried on the implementation side, because while on Windows it should be easy to use unicode file APIs, on Linux the filenames may or may not be UTF-8 friendly. Michael -- Michael Urman From guido at python.org Sun May 13 21:15:47 2007 From: guido at python.org (Guido van Rossum) Date: Sun, 13 May 2007 12:15:47 -0700 Subject: [Python-3000] PEP 3123 (Was: PEP Parade) In-Reply-To: <464720E8.3040402@v.loewis.de> References: <464720E8.3040402@v.loewis.de> Message-ID: I'm okay with applying to 2.6 and then merging into 3.0. ISTM though that backporting this to 2.5 would cause the release manager to throw a fit, so I think that's not worth it. What would be the benefit anyway? --Guido On 5/13/07, "Martin v. Löwis" wrote: > > S 3123 Making PyObject_HEAD conform to standard C von Löwis > > > > I like it, but who's going to make the changes? Once those changes have > > been made, will it still be reasonable to expect to merge C code from > > the (2.6) trunk into the 3.0 branch? > > I just created bugs.python.org/1718153, which implements this PEP. > > I had to add a number of additional macros (Py_Refcnt, Py_Size, > PyVarObject_HEAD_INIT); using these macros throughout is the bulk > of the change.
> > If the macros are backported to 2.x (omitting the "hard" changes > to PyObject itself), then the code base can stay the same between > 2.x and 3.x (of course, backporting changes from 2.6 to 2.5 might > become harder, as the chances for conflicts increase). > > As for statistics: there are ca. 580 uses of Py_Type in the code, > 410 of Py_Size, and 20 of Py_Refcnt. > > How should I proceed? The next natural step would be to change > 2.6, and then wait until those changes get merged into 3k. > > Regards, > Martin > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Sun May 13 21:29:23 2007 From: guido at python.org (Guido van Rossum) Date: Sun, 13 May 2007 12:29:23 -0700 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <43aa6ff70705130822q32e3971bradf80fd90ac36578@mail.gmail.com> References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <19dd68ba0705120827s5415c4dcx12e5862f32cc3e06@mail.gmail.com> <43aa6ff70705130822q32e3971bradf80fd90ac36578@mail.gmail.com> Message-ID: On 5/13/07, Collin Winter wrote: > Have there been studies on this kind of thing? Has there been any > research into whether a mixture of English keywords and, say, Japanese > and English identifiers makes a given programming language easier to > learn and use? If so, why aren't they referenced in the PEP or linked > in any emails? Given the lack of evidence presented so far, my > operating assumption is that the PEP's supporters -- including you -- > are making things up to support a conclusion that they might wish to > be true. In particular, AFAIK Java has allowed all Unicode letters in identifiers right from the start. I'd like to hear about descriptions of actual user experiences with this feature, in Java or in any other language that supports it. (*Are* there any others?) That would be far more valuable to me than any continued argumentation for or against the proposal. 
I also note that there's no particular reason why this needs to be done exactly in 3.0. It's not backwards incompatible -- it could be done in 2.6 if people really really want it, or it could be introduced in 3.1, 3.2 or whenever the world appears to be ready. I certainly don't consider it an early design mistake to only require ASCII -- at the time it was the only sane thing to do and I'm far from convinced that it needs to change now. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Sun May 13 21:31:00 2007 From: guido at python.org (Guido van Rossum) Date: Sun, 13 May 2007 12:31:00 -0700 Subject: [Python-3000] Unicode strings, identifiers, and import In-Reply-To: References: Message-ID: The answer to all of this is the filesystem encoding, which is already supported. Doesn't appear particularly difficult to me. On 5/13/07, Michael Urman wrote: > This occurred to me while reading the PEP 3131 discussion, and while > it's not limited to PEP 3131 concerns, I don't believe I've seen > discussed yet elsewhere. What is the interaction between import or > __import__ and Unicode module names (or at least Unicode strings > describing them). Currently in python 2.5, __import__ appears coerce > to str, leading to the following error case:
>
> >>> __import__(unicodedata.lookup('GREEK SMALL LETTER EPSILON'))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> UnicodeEncodeError: 'ascii' codec can't encode character u'\u03b5' in
> position 0: ordinal not in range(128)
>
> With str being the Unicode type in py3k, this branch of the potential > problem needs to be addressed clearly, whether by defining __import__ > as converting through ASCII, or by defining a useful semantic. If PEP > 3131 is to be accepted, then it should probably address whether import > will work on non-ASCII identifiers, and if so what the semantics are > (if __import__ would otherwise limit to ASCII).
> > I'm a little worried on the implementation side, because while on > Windows it should be easy to use unicode file APIs, on Linux the > filenames may or may not be UTF-8 friendly. > > Michael > -- > Michael Urman > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From 2007 at jmunch.dk Sun May 13 21:10:09 2007 From: 2007 at jmunch.dk (Anders J. Munch) Date: Sun, 13 May 2007 21:10:09 +0200 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <43aa6ff70705130822q32e3971bradf80fd90ac36578@mail.gmail.com> References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <19dd68ba0705120827s5415c4dcx12e5862f32cc3e06@mail.gmail.com> <43aa6ff70705130822q32e3971bradf80fd90ac36578@mail.gmail.com> Message-ID: <46476291.6040502@jmunch.dk> Collin Winter wrote: >So far, all I've seen from PEP 3131's supporters is a lot of > hollow assertions and idle theorizing: "Python will be easier to use > for people using non-ASCII character sets", "Python will be easier to > learn for those raised with non-Roman-influenced languages", etc, etc. > Until I see some kind of evidence, something to back up these claims, > I'm going to assume you're wrong. > You haven't brought any hard evidence to the table yourself, so in the absence of that, my anecdotal evidence trumps your pure speculation ;-) I've coded non-trivial stuff in three languages: Danglish, English and Danish. Well, strictly speaking only the latter two are real languages; Danglish is just a name for the way Danish programmers typically write: A hodge-podge of Danish and English mixed with no apparent system, ever preferring whichever word springs to mind first, switching to (bad) English whenever the Danish alternative would need transliteration.
Or worse, switching to a different but less appropriate Danish word that has the sole advantage of not needing transliteration. I've found that using my native Danish is the better option of the three because, unsurprisingly, I am more productive using my native language than a foreign language. Do I really need to submit proof for that? Isn't that just obvious? > See, that's the thing I have yet to see addressed: there's been a lot of > stress on "being able to write variable/class/method names in > Arabic/Mandarin/Hindi will make it easier for native speakers to > understand", but as far as I know, no-one has yet addressed how these > non-English identifiers will mesh with the existing English keywords > and English standard library functions. They mesh *brilliantly*. The different languages used mean that the provenance of identifiers is intuitively available: English identifiers means std. lib. or 3rd party, native language means in-house. Very helpful - my heart goes out to the poor suffering monolinguists who must do without this valuable code reading aid. +1 on PEP 3131. greetings-from-rainy-Denmark-ly y'rs, Anders From santagada at gmail.com Sun May 13 22:16:35 2007 From: santagada at gmail.com (Leonardo Santagada) Date: Sun, 13 May 2007 17:16:35 -0300 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <19dd68ba0705120827s5415c4dcx12e5862f32cc3e06@mail.gmail.com> <43aa6ff70705130822q32e3971bradf80fd90ac36578@mail.gmail.com> Message-ID: <46477223.2040105@gmail.com> Guido van Rossum wrote: > In particular, AFAIK Java has allowed all Unicode letters in > identifiers right from the start. I'd like to hear about descriptions > of actual user experiences with this feature, in Java or in any other > language that supports it. (*Are* there any others?) That would be far > more valuable to me than any continued argumentation for or against > the proposal.
> Javascript also supports identifiers using any unicode letter, but you cannot use escape sequences as in java (I don't really know if java supports that, but the ecmascript spec gave me this impression). The thing is, living in Brazil (latin-1 characters) I have never seen any javascript or java code using unicode identifiers. I've seen them using unicode string literals, and that is supported by python. I still think that what is needed for children and people starting with programming is something more than identifiers: it is a complete system in their language (someone said to me in passing that the C++ standard committee had a proposal on this kind of stuff). To better the support for Brazilian Portuguese on python we need the whole standard library and all strings to be unicode... and that will be solved by python 3k. There were comments in the Brazilian community that unicode errors were the most common kind of error by far in any software that needs string processing. From tjreedy at udel.edu Sun May 13 22:16:33 2007 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 13 May 2007 16:16:33 -0400 Subject: [Python-3000] Support for PEP 3131 References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <19dd68ba0705120827s5415c4dcx12e5862f32cc3e06@mail.gmail.com> <4646A3CA.40705@acm.org> <4646FCAE.7090804@v.loewis.de> Message-ID: ""Martin v. Löwis"" wrote in message news:4646FCAE.7090804 at v.loewis.de... | That's a red herring. That is how I felt when you dismissed my effort to make your proposal more useful and more acceptable to some (by addressing transliteration) with the little molehill problem that Norwegians and Germans disagree about o: (rotated 90 degrees). So I shut up. However, to me, one impetus to expanding the Python char set is the OLPC project. For children to share programs across national boundaries, programs written in local chars will usually have to be transliterated to the one set of chars common to all versions of the laptop.
Terry Jan Reedy From baptiste13 at altern.org Sun May 13 23:12:31 2007 From: baptiste13 at altern.org (Baptiste Carvello) Date: Sun, 13 May 2007 23:12:31 +0200 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <19dd68ba0705120827s5415c4dcx12e5862f32cc3e06@mail.gmail.com> <4646A3CA.40705@acm.org> Message-ID: Jason Orendorff a écrit : > > Python should allow foreign-language identifiers because (1) it's a > gesture of good will to people everywhere who don't speak English > fluently; (2) some students will benefit; (3) some people writing code > that no one else will ever see will benefit. > As I said in a previous post, these use cases would be well served by a command line switch. People who do not care about distributing their code can just do alias python = python -I On the other hand, people who want wider distribution would test without the switch and easily check that all their identifiers are ASCII. The default should be the best choice for the Python open source community, that is ASCII identifiers only.
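[Editor's note: the ASCII-only check Baptiste describes is straightforward to sketch with the standard library. The `-I` switch is his proposal, not an existing option, and `non_ascii_names` below is a name invented here for illustration; it scans a source file's tokens and reports any identifier that is not pure ASCII.]

```python
import tokenize

def non_ascii_names(path):
    """Yield (line, name) for every identifier that is not pure ASCII."""
    with open(path, "rb") as f:
        # tokenize.tokenize handles the source encoding declaration itself.
        for tok in tokenize.tokenize(f.readline):
            if tok.type == tokenize.NAME and not tok.string.isascii():
                yield tok.start[0], tok.string
```

A project wanting the "wider distribution" guarantee could run this over its modules in the test suite and fail if the generator yields anything.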
Cheers, Baptiste From brett at python.org Mon May 14 00:58:52 2007 From: brett at python.org (Brett Cannon) Date: Sun, 13 May 2007 15:58:52 -0700 Subject: [Python-3000] getting compiler package failures Message-ID: I just did a ``make distclean`` on a clean checkout (r55300) and test_compiler/test_transformer are failing:

  File "/Users/drifty/Dev/python/3.x/pristine/Lib/compiler/transformer.py", line 715, in atom
    return self._atom_dispatch[nodelist[0][0]](nodelist)
KeyError: 322

or

  File "/Users/drifty/Dev/python/3.x/pristine/Lib/compiler/transformer.py", line 776, in lookup_node
    return self._dispatch[node[0]]
KeyError: 331

or

  File "/Users/drifty/Dev/python/3.x/pristine/Lib/compiler/transformer.py", line 783, in com_node
    return self._dispatch[node[0]](node[1:])
KeyError: 339

I don't know the compiler package at all (which is why I am currently stuck on Tony Lownds' PEP 3113 patch since I am getting a compiler.transformer.WalkerError) so I have no clue how to go about fixing this. Anyone happen to know what may have caused the breakage? -Brett -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-3000/attachments/20070513/b1e54f47/attachment.htm From gproux+py3000 at gmail.com Mon May 14 01:17:25 2007 From: gproux+py3000 at gmail.com (Guillaume Proux) Date: Mon, 14 May 2007 08:17:25 +0900 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <19dd68ba0705120827s5415c4dcx12e5862f32cc3e06@mail.gmail.com> <43aa6ff70705130822q32e3971bradf80fd90ac36578@mail.gmail.com> Message-ID: <19dd68ba0705131617w4bbc45f3p833159a20552e909@mail.gmail.com> Dear all, On 5/14/07, Guido van Rossum wrote: > of actual user experiences with this feature, in Java or in any other > language that supports it. (*Are* there any others?) That would be far > more valuable to me than any continued argumentation for or against > the proposal.
Interestingly, this is *not* a well known fact. I have asked two seasoned Java programmer friends of mine and they were *amazed* that this is supported. However, if you consider XML as a language, then there was plenty of discussion in the past talking about the benefits of allowing unicode characters in tags. see e.g. http://lists.xml.org/archives/xml-dev/200107/msg00254.html > I also note that there's no particular reason why this needs to be > done exactly in 3.0. It's not backwards incompatible -- it could be Once one realizes that this needs to be done, I would love to see it introduced in 2.5 :) > don't consider it an early design mistake to only require ASCII -- at > the time it was the only sane thing to do and I'm far from convinced > that it needs to change now. I wish you would be able to let us know precisely what hinders you from accepting this change. After reading the following document, http://www.python.org/doc/essays/foreword/ , I expected you would be very open-minded to that change especially when you are talking about "the emphasis on readability" and how you came up with Python as an evolution of ABC "a wonderful teaching language" because I can't fail to see how such a change would be incredibly important for Python as a "wonderful teaching language" especially towards children in countries where Latin characters are really foreign. Regards, Guillaume From guido at python.org Mon May 14 02:15:24 2007 From: guido at python.org (Guido van Rossum) Date: Sun, 13 May 2007 17:15:24 -0700 Subject: [Python-3000] getting compiler package failures In-Reply-To: References: Message-ID: test_compiler and test_transformer have been broken for a couple of months now I believe. Unless someone comes to the rescue of the compiler package soon, I'm tempted to remove it from the p3yk branch -- it doesn't seem to serve any particularly good purpose, especially now that the AST used by the compiler written in C is exportable.
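[Editor's note: the exportable C-compiler AST Guido mentions is what eventually replaced the pure-Python `compiler` package: today's `ast` module exposes the same parse tree, without the fragile integer-keyed dispatch tables that are raising the `KeyError`s above. A minimal sketch of the modern equivalent:]

```python
import ast

# Parse source into the tree built by the C compiler and exported to Python.
tree = ast.parse("def greet(name): return 'hello ' + name")

func = tree.body[0]
print(type(func).__name__)              # FunctionDef
print(func.name)                        # greet
print([a.arg for a in func.args.args])  # ['name']

# The structure is walkable with real node classes, no dispatch-by-token-id:
names = [n.id for n in ast.walk(func) if isinstance(n, ast.Name)]
print(names)                            # ['name']
```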
--Guido On 5/13/07, Brett Cannon wrote: > I just did a ``make distclean`` on a clean checkout (r55300) and > test_compiler/test_transformer are failing: > > File > "/Users/drifty/Dev/python/3.x/pristine/Lib/compiler/transformer.py", > line 715, in atom > return self._atom_dispatch[nodelist[0][0]](nodelist) > KeyError: 322 > > or > > File > "/Users/drifty/Dev/python/3.x/pristine/Lib/compiler/transformer.py", > line 776, in lookup_node > return self._dispatch[node[0]] > KeyError: 331 > > or > > File > "/Users/drifty/Dev/python/3.x/pristine/Lib/compiler/transformer.py", > line 783, in com_node > return self._dispatch[node[0]](node[1:]) > KeyError: 339 > > > I don't know the compiler package at all (which is why I am currently stuck > on Tony Lownds' PEP 3113 patch since I am getting a > compiler.transformer.WalkerError) so I have no clue how to > go about fixing this. Anyone happen to know what may have caused the > breakage? > > -Brett > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: > http://mail.python.org/mailman/options/python-3000/guido%40python.org > > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From brett at python.org Mon May 14 02:17:24 2007 From: brett at python.org (Brett Cannon) Date: Sun, 13 May 2007 17:17:24 -0700 Subject: [Python-3000] getting compiler package failures In-Reply-To: References: Message-ID: On 5/13/07, Guido van Rossum wrote: > > test_compiler and test_transformer have been broken for a couple of > months now I believe. > > Unless someone comes to the rescue of the compiler package soon, I'm > tempted to remove it from the p3yk branch -- it doesn't seem to serve > any particularly good purpose, especially now that the AST used by the > compiler written in C is exportable. +1000 from me. I was thinking of suggesting this, but I forgot to put it in the email. 
-Brett -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-3000/attachments/20070513/5039b525/attachment.htm From greg.ewing at canterbury.ac.nz Mon May 14 02:32:04 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 14 May 2007 12:32:04 +1200 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <4646B6A1.7060007@club-internet.fr> References: <4646B6A1.7060007@club-internet.fr> Message-ID: <4647AE04.1040207@canterbury.ac.nz> Pierre Hanser wrote: > In english, most of the time, adding 'ed' to the verb will do > the difference: change -> changed > > in french: change -> changé (ends with 'eacute') Fine if the reader understands French, but if you later want to translate this program so that a non-French speaker can read it, what would you do? -- Greg From guido at python.org Mon May 14 02:39:54 2007 From: guido at python.org (Guido van Rossum) Date: Sun, 13 May 2007 17:39:54 -0700 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <19dd68ba0705131617w4bbc45f3p833159a20552e909@mail.gmail.com> References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <19dd68ba0705120827s5415c4dcx12e5862f32cc3e06@mail.gmail.com> <43aa6ff70705130822q32e3971bradf80fd90ac36578@mail.gmail.com> <19dd68ba0705131617w4bbc45f3p833159a20552e909@mail.gmail.com> Message-ID: On 5/13/07, Guillaume Proux wrote: > On 5/14/07, Guido van Rossum wrote: > > of actual user experiences with this feature, in Java or in any other > > language that supports it. (*Are* there any others?) That would be far > > more valuable to me than any continued argumentation for or against > > the proposal. > > Interestingly, this is *not* a well known fact. I have asked 2 > friend-of-mine seasoned Java programmers and they were *amazed* that > this is supported. Well, maybe we should add it to Python as a secret feature.
:-) :-) :-) > However, if you consider XML as a language, then there was plenty of > discussion in the past talking about the benefits of allowing unicode > characters in tags. > see e.g. http://lists.xml.org/archives/xml-dev/200107/msg00254.html I imagine the situation there is sufficiently different though; XML is data, not code. > > I also note that there's no particular reason why this needs to be > > done exactly in 3.0. It's not backwards incompatible -- it could be > > As one realizes that this needs to be done, then I would love to see > that introduced in 2.5 :) I realize you've added a smiley, but please, don't propose new features for a release that's already been released. The release managers will put you in jail and not let you out until 4.0 has been released. :-) > > don't consider it an early design mistake to only require ASCII -- at > > the time it was the only sane thing to do and I'm far from convinced > > that it needs to change now. > > I wish you would be able to let us know precisely what hinders you in > accepting this change. Because most people still use systems that have very inadequate tools for handling non-ASCII text, especially non-Latin-1 text. For example, at work I use Ubuntu, a modern Linux distribution actively supported by a company headquartered in South-Africa. Their main market lies outside Europe and North America. And yet, there is no standard way to enter non-ASCII characters as basic as c-cedilla or u-umlaut; the main tools I use (Emacs, Firefox and bash running in a terminal emulator) all have different input methods, different ideas of the default character encoding, and so on. It's a crapshoot whether copy-and-pasting even the simplest non-ASCII text (like the name of PEP 3131's author :-) between any two of these will work. I see program code as a tool for communication between people. Note how you & I are using English in this thread even though it is not the mother tongue for either of us. 
So we use English, since we can both read and write it reasonably well. This is the *only* way that programmers raised in different countries can exchange code at all. (It may change if at some point in the future computer translation gets 1000x better, but we're not there yet -- try translate.google.com if you don't believe me.) Now, you may disagree with me on the conclusion even if you agree on the premises. But you asked for my motivation, and this is it. > After reading the following document, > http://www.python.org/doc/essays/foreword/ , I expected you would be > very open-minded to that change especially when you are talking about > "the emphasis on readability" and how you came up with Python as an > evolution of ABC "a wonderful teaching language" because I can't fail > to see how such change would be incredibly important for Python as a > "wonderful teaching language" especially towards children in countries > where latin characters are really foreign. You're stretching my words there. The issue of translation hadn't crossed my mind when I wrote that (over 10 years ago) and the tools *really* weren't ready then. And regarding readability, if all the programmers in the world agreed to use broken English, the readability of their code to each other would be much better dan als we allemaal in onze eigen taal schreven. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From arvind1.singh at gmail.com Mon May 14 02:41:49 2007 From: arvind1.singh at gmail.com (Arvind Singh) Date: Mon, 14 May 2007 06:11:49 +0530 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <46476291.6040502@jmunch.dk> References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <19dd68ba0705120827s5415c4dcx12e5862f32cc3e06@mail.gmail.com> <43aa6ff70705130822q32e3971bradf80fd90ac36578@mail.gmail.com> <46476291.6040502@jmunch.dk> Message-ID: On 5/14/07, Anders J.
Munch <2007 at jmunch.dk> wrote: > You haven't brought any hard evidence to the table yourself, so in > the absence of that, my anecdotal evidence trumps your pure > speculation ;-) Fact: Younger brains learn new concepts (and languages) faster than older ones. Argument: To be part of the "international" programming community (or a *real* programmer), one has to learn English anyway; why help anyone develop a habit which he/she will have to discard later? [Indians usually deal with 3 languages in their childhood (English, Hindi, Sanskrit/local language).] I've coded non-trivial stuff in three languages: Danglish, English and > Danish. Well, strictly speaking only the latter two are real > languages; Danglish is just a name for the way Danish programmers > typically write: A hodge-podge of Danish and English mixed with no > apparent system, ever preferring whichever word springs to mind first, > switching to (bad) English whenever the Danish alternative would need > transliteration. Or worse, switching to a different but less > appropriate Danish word that has the sole advantage of not needing > transliteration. This PEP talks about support for *identifiers*. If you need an *extensive* vocabulary for your *identifiers*, I'd assume that you're coding something non-trivial (with ignorable exceptions). Such non-trivial code should be sharable under a _common_ language that *others* can understand as well, IMHO. Further, if you are doing something non-trivial, I can also assume that you'd be using third-party libraries. How would the code look if identifiers were written in various encodings? I've found that using my native Danish is the better option of the > three because, unsurprisingly, I am more productive using my > native language than a foreign language. Do I really need to submit > proof for that? Isn't that just obvious? Not so obvious to me, actually.
Ask any good user-interface designer, humans aren't (generally; since I see you as a "gifted" exception :-) ) good with "modal" interfaces. The more "modes" one has to shift among, the lesser the productivity, in general. Maybe you feel more productive because of lengthy "modes" or long pieces of code (i.e., looong functions): not a good programming practice, as I've been taught. They mesh *brilliantly*. The different languages used means that the > provenance of identifiers is intuitively available: English > identifiers means std. lib. or 3rd party, native language means > in-house. Very helpful - my heart goes out to the poor suffering > monolinguists who must do without this valuable code reading aid. Since Hindi was mentioned, I'd like to say: Don't even think about it! +1 on PEP 3131. Without knowing whether I have a say or not: -1 on this PEP Regards, Arvind -- There should be one-- and preferably only one --obvious way to choose your *identifiers*. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-3000/attachments/20070514/abb971d4/attachment.html From greg.ewing at canterbury.ac.nz Mon May 14 02:46:23 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 14 May 2007 12:46:23 +1200 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <4646FCAE.7090804@v.loewis.de> References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <19dd68ba0705120827s5415c4dcx12e5862f32cc3e06@mail.gmail.com> <4646A3CA.40705@acm.org> <4646FCAE.7090804@v.loewis.de> Message-ID: <4647B15F.7040700@canterbury.ac.nz> Martin v. Löwis wrote: > There are other, more serious cases of presentation ambiguity > (e.g. tabs vs. spaces), yet nobody suggests to ban tabs from the > language for that reason. But we *have* suggested banning mixed tabs and spaces (rather than just recommending against it), which is something that can be automatically verified.
I don't think this scenario is all that unlikely. A program is initially written by a Russian programmer who uses his own version of "a" as a variable name. Later an English-speaking programmer makes some changes, and uses an ascii "a". Now there are two subtly different variables called "a" in different parts of the program. -- Greg From gproux+py3000 at gmail.com Mon May 14 03:09:29 2007 From: gproux+py3000 at gmail.com (Guillaume Proux) Date: Mon, 14 May 2007 10:09:29 +0900 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <19dd68ba0705120827s5415c4dcx12e5862f32cc3e06@mail.gmail.com> <43aa6ff70705130822q32e3971bradf80fd90ac36578@mail.gmail.com> <19dd68ba0705131617w4bbc45f3p833159a20552e909@mail.gmail.com> Message-ID: <19dd68ba0705131809i11cd0b82pfc226bd81dd16e99@mail.gmail.com> Hello, > > Interestingly, this is *not* a well known fact. I have asked 2 > > friend-of-mine seasoned Java programmers and they were *amazed* that > > this is supported. > Well, maybe we should add it to Python as a secret feature. :-) :-) :-) But they also said that: 1) they wish they would have known earlier... 2) would start using this immediately for their own small projects > > see e.g. http://lists.xml.org/archives/xml-dev/200107/msg00254.html > I imagine the situation there is sufficiently different though; XML is > data, not code. I wish you had enough time to read some of the posts linked from the above URL. In particular, you can see the viewpoint of some Japanese people on the ability for them to describe data structures (which is really a programming concept) in their own words. > I realize you've added a smiley, but please, don't propose new > features for a release that's already been released. The release > managers will put you in jail and not let you out until 4.0 has been > released.
:-) eheheheh :) > Because most people still use systems that have very inadequate tools > for handling non-ASCII text, especially non-Latin-1 text. For example, > at work I use Ubuntu, a modern Linux distribution actively supported > by a company headquartered in South-Africa. Their main market lies > outside Europe and North America. And yet, there is no standard way to > enter non-ASCII characters as basic as c-cedilla or u-umlaut; the main I also use Ubuntu at home. Regarding your issue: hum? you can change keyboard layout (I even think it does affect the current input system immediatly). Also there is a number of tools like gucharmap (http://gucharmap.sourceforge.net/shots/shot-003.png) that enables you to copy paste rare characters. > tools I use (Emacs, Firefox and bash running in a terminal emulator) > all have different input methods, different ideas of the default > character encoding, and so on. It's a crapshoot whether > copy-and-pasting even the simplest non-ASCII text (like the name of > PEP 3131's author :-) between any two of these will work. Ubuntu Feisty (and I think Edgy too) default on UTF8 everywhere and I have never had any issue using French, Japanese and English anywhere. Windows came to this maturity point about 5-6 years ago. > I see program code as a tool for communication between people. Note > how you & I are using English in this thread even though it is not the > mother tongue for either of us. So we use English, since we can both > read and write it reasonably well. This is the *only* way that > programmers raised in different countries can exchange code at all. I *totally* agree with you, you sometimes need to go down to the lowest common denominator (with tongue in cheek)... 
But I still do not understand that you are not happy to see people become more productive with Python when there is no need of international exchange: the small (or large) internal application, the throw-away script, the ability to extend C programs with a scripting language that is respectful of the native language of the (mostly-non programmer) user etc... > gets 1000x better, but we're not there yet -- try translate.google.com > if you don't believe me.) I hope you get bonus points at work for mentioning this one. Believe it or not, translate.google.com is my friend! > You're stretching my words there. The issue if translation hadn't Clearly you could not think of this issue, but I am not stretching your word. I was just reusing some of the *strong* points you made why you thought Python was such a great invention of yours (and don't get me wrong, we all love it!). I was just applying those great points to this new issue which I believe fully deserve more attention. > crossed my mind when I wrote that (over 10 years ago) and the tools > *really* weren't ready then. And regarding readability, if all the The tools are ready now. We live in a mostly fully unicode world now, and we just agreed in another PEP that the default source encoding of files will be UTF8... > programmers in the world agreed to use broken English, the readability > of their code to each other would be much better dan als we allemaal > in onze eigen taal schreven. The funny thing is that I can read this sentence very well: my life was spent surrounded by latin characters. I can even probably understand it as I can speak some German too. allesmaal -> Jedesmal -> always onze -> eine -> its eigen -> eigen -> own taal -> sprache -> language schreven -> schreiben -> write My cultural background can help me decipher VERY QUICKLY what you wrote. But think of the 7 years old Japanese child. They are not taught latin characters really before they will seriously learn English... 
but this is the year I started programming (by copying french listing of programs for Thomson TO7-70 computers... oh my god!). Regards, Guillaume From guido at python.org Mon May 14 03:51:40 2007 From: guido at python.org (Guido van Rossum) Date: Sun, 13 May 2007 18:51:40 -0700 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <19dd68ba0705131809i11cd0b82pfc226bd81dd16e99@mail.gmail.com> References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <19dd68ba0705120827s5415c4dcx12e5862f32cc3e06@mail.gmail.com> <43aa6ff70705130822q32e3971bradf80fd90ac36578@mail.gmail.com> <19dd68ba0705131617w4bbc45f3p833159a20552e909@mail.gmail.com> <19dd68ba0705131809i11cd0b82pfc226bd81dd16e99@mail.gmail.com> Message-ID: I respectfully disagree with the conclusion you draw from the same data. I don't think either of us can say anything that will satisfy the other. --Guido On 5/13/07, Guillaume Proux wrote: > Hello, > > > > Interestingly, this is *not* a well known fact. I have asked 2 > > > friend-of-mine seasoned Java programmers and they were *amazed* that > > > this is supported. > > Well, maybe we should add it to Python as a secret feature. :-) :-) :-) > > But they also said that: > 1) they wish they would have known earlier... > 2) would start using this immediatly for their own small projects > > > > see e.g. http://lists.xml.org/archives/xml-dev/200107/msg00254.html > > I imagine the situation there is sufficiently different though; XML is > > data, not code. > > I wish you had enough time to read some of the posts linked from the > above URL. In particular, you can see the viewpoint of some Japanese > people on the ability for them to describe data structures (which is > really a programming concept) in their own words. > > > > I realize you've added a smiley, but please, don't propose new > > features for a release that's already been released. The release > > managers will put you in jail and not let you out until 4.0 has been > > released. 
:-) > > eheheheh :) > > > Because most people still use systems that have very inadequate tools > > for handling non-ASCII text, especially non-Latin-1 text. For example, > > at work I use Ubuntu, a modern Linux distribution actively supported > > by a company headquartered in South-Africa. Their main market lies > > outside Europe and North America. And yet, there is no standard way to > > enter non-ASCII characters as basic as c-cedilla or u-umlaut; the main > > I also use Ubuntu at home. > Regarding your issue: hum? you can change keyboard layout (I even > think it does affect the current input system immediatly). Also there > is a number of tools like gucharmap > (http://gucharmap.sourceforge.net/shots/shot-003.png) that enables you > to copy paste rare characters. > > > tools I use (Emacs, Firefox and bash running in a terminal emulator) > > all have different input methods, different ideas of the default > > character encoding, and so on. It's a crapshoot whether > > copy-and-pasting even the simplest non-ASCII text (like the name of > > PEP 3131's author :-) between any two of these will work. > > Ubuntu Feisty (and I think Edgy too) default on UTF8 everywhere and I > have never had any issue using French, Japanese and English anywhere. > Windows came to this maturity point about 5-6 years ago. > > > I see program code as a tool for communication between people. Note > > how you & I are using English in this thread even though it is not the > > mother tongue for either of us. So we use English, since we can both > > read and write it reasonably well. This is the *only* way that > > programmers raised in different countries can exchange code at all. > > I *totally* agree with you, you sometimes need to go down to the > lowest common denominator (with tongue in cheek)... 
But I still do not > understand that you are not happy to see people become more productive > with Python when there is no need of international exchange: the small > (or large) internal application, the throw-away script, the ability > to extend C programs with a scripting language that is respectful of > the native language of the (mostly-non programmer) user etc... > > > gets 1000x better, but we're not there yet -- try translate.google.com > > if you don't believe me.) > > I hope you get bonus points at work for mentioning this one. Believe > it or not, translate.google.com is my friend! > > > > You're stretching my words there. The issue if translation hadn't > > Clearly you could not think of this issue, but I am not stretching > your word. I was just reusing some of the *strong* points you made why > you thought Python was such a great invention of yours (and don't get > me wrong, we all love it!). I was just applying those great points to > this new issue which I believe fully deserve more attention. > > > crossed my mind when I wrote that (over 10 years ago) and the tools > > *really* weren't ready then. And regarding readability, if all the > > The tools are ready now. We live in a mostly fully unicode world now, > and we just agreed in another PEP that the default source encoding of > files will be UTF8... > > > programmers in the world agreed to use broken English, the readability > > of their code to each other would be much better dan als we allemaal > > in onze eigen taal schreven. > > The funny thing is that I can read this sentence very well: my life > was spent surrounded by latin characters. I can even probably > understand it as I can speak some German too. > allesmaal -> Jedesmal -> always > onze -> eine -> its > eigen -> eigen -> own > taal -> sprache -> language > schreven -> schreiben -> write > > My cultural background can help me decipher VERY QUICKLY what you > wrote. But think of the 7 years old Japanese child. 
They are not > taught latin characters really before they will seriously learn > English... but this is the year I started programming (by copying > french listing of programs for Thomson TO7-70 computers... oh my > god!). > > Regards, > > Guillaume > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From murman at gmail.com Mon May 14 05:03:26 2007 From: murman at gmail.com (Michael Urman) Date: Sun, 13 May 2007 22:03:26 -0500 Subject: [Python-3000] Unicode strings, identifiers, and import In-Reply-To: References: Message-ID: On 5/13/07, Guido van Rossum wrote: > The answer to all of this is the filesystem encoding, which is already > supported. Doesn't appear particularly difficult to me. Okay, that's fair. It seems reasonable to accept the limitations of following the filesystem encoding for module names. I should probably test py3k to make sure it already has updated __import__ to use the filesystem encoding instead of the default encoding, but instead I'll just feebly imply the question here. Further thoughts related to this lead me to ask if there is to be only the version of open() which takes a unicode string, or if there will also be the opportunity to pass a byte string which doesn't pass through the encoding. It's far too common for Linux users to have files named with different encodings than their environment suggests. If it's only possible to open files whose names can be decoded via the filesystem encoding, I foresee several unhappy end-user experiences. -- Michael Urman From martin at v.loewis.de Mon May 14 07:12:57 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 14 May 2007 07:12:57 +0200 Subject: [Python-3000] PEP 3123 (Was: PEP Parade) In-Reply-To: References: <464720E8.3040402@v.loewis.de> Message-ID: <4647EFD9.9010001@v.loewis.de> Guido van Rossum schrieb: > I'm okay with applying to 2.6 and then merging into 3.0.
ISTM though > that backporting this to 2.5 would cause the release manager to throw > a fit, so I think that's not worth it. What would be the benefit > anyway? I think you misunderstood. If this is applied to 2.6, then (independent) bug fixes that get applied to 2.6 will be more difficult to backport, if the change falls into a region where ->ob_type was used, as the patch might fail to apply. Regards, Martin From martin at v.loewis.de Mon May 14 07:24:59 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 14 May 2007 07:24:59 +0200 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <4647B15F.7040700@canterbury.ac.nz> References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <19dd68ba0705120827s5415c4dcx12e5862f32cc3e06@mail.gmail.com> <4646A3CA.40705@acm.org> <4646FCAE.7090804@v.loewis.de> <4647B15F.7040700@canterbury.ac.nz> Message-ID: <4647F2AB.3060406@v.loewis.de> > I don't think this scenario is all that unlikely. A > program is initially written by a Russian programmer > who uses his own version of "a" as a variable name. > Later an English-speaking programmer makes some > changes, and uses an ascii "a". Now there are two > subtly different variables called "a" in different > parts of the program. If they work in the same project, they will have a coding style that says "ASCII-only identifiers". Also, if the change is in different parts of the program, there won't be a common variable called "a". When was the last time you called a variable 'a'? I hope it was a local variable; if you use 'a' for class or method names, or global variables, you have bigger problems than typographical ones. 
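[Editor's note: Greg's two-variables-named-"a" scenario is easy to make concrete. Cyrillic U+0430 renders like Latin "a" but is a distinct character, and the NFKC normalization that PEP 3131 (as eventually accepted) applies to identifiers does not fold homoglyphs across scripts, so both really do coexist as separate names:]

```python
import unicodedata

latin = "a"
cyrillic = "\u0430"  # CYRILLIC SMALL LETTER A, visually identical to "a"

print(latin == cyrillic)  # False: same glyph shape, distinct code points
print(unicodedata.normalize("NFKC", cyrillic) == cyrillic)  # True: NFKC leaves it Cyrillic

# Both are legal Python 3 identifiers, so the Russian/English mix Greg
# describes is valid code with two different variables "named" a:
ns = {}
exec("a = 1\n\u0430 = 2", ns)
print(ns["a"], ns["\u0430"])  # 1 2
```

This is exactly why project-level conventions (or an ASCII-only check) matter more than anything the language itself can enforce here.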
Regards, Martin From hanser at club-internet.fr Mon May 14 07:33:12 2007 From: hanser at club-internet.fr (Pierre Hanser) Date: Mon, 14 May 2007 07:33:12 +0200 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <4647AE04.1040207@canterbury.ac.nz> References: <4646B6A1.7060007@club-internet.fr> <4647AE04.1040207@canterbury.ac.nz> Message-ID: <4647F498.90702@club-internet.fr> Greg Ewing wrote: > Pierre Hanser wrote: > >> In english, most of the time, adding 'ed' to the verb will do >> the difference: change -> changed >> >> in french: change -> changé (ends with 'eacute') > > Fine if the reader understands French, but if you > later want to translate this program so that a > non-French speaker can read it, what would you > do? I will translate it, but I don't want to have to speak English for my personal homework. That's all. And that should be enough. Currently, at home, my choice is poor, degenerated French, or English. What a choice! - Pierre From collinw at gmail.com Mon May 14 07:36:55 2007 From: collinw at gmail.com (Collin Winter) Date: Sun, 13 May 2007 22:36:55 -0700 Subject: [Python-3000] PEP 3133: Introducing Roles Message-ID: <43aa6ff70705132236w6d2dbfc4r8d1fadede753a6ca@mail.gmail.com> PEP: 3133 Title: Introducing Roles Version: $Revision$ Last-Modified: $Date$ Author: Collin Winter Status: Draft Type: Standards Track Requires: 3115, 3129 Content-Type: text/x-rst Created: 1-May-2007 Python-Version: 3.0 Post-History: 13-May-2007 Abstract ======== Python's existing object model organizes objects according to their implementation. It is often desirable -- especially in a duck-typing-based language like Python -- to organize objects by the part they play in a larger system (their intent), rather than by how they fulfill that part (their implementation). This PEP introduces the concept of roles, a mechanism for organizing objects according to their intent rather than their implementation. Rationale ========= In the beginning were objects.
They allowed programmers to marry function and state, and to increase code reusability through concepts like polymorphism and inheritance, and lo, it was good. There came a time, however, when inheritance and polymorphism weren't enough. With the invention of both dogs and trees, we were no longer able to be content with knowing merely, "Does it understand 'bark'?" We now needed to know what a given object thought that "bark" meant. One solution, the one detailed here, is that of roles, a mechanism orthogonal and complementary to the traditional class/instance system. Whereas classes concern themselves with state and implementation, the roles mechanism deals exclusively with the behaviours embodied in a given class. This system was originally called "traits" and implemented for Squeak Smalltalk [#traits-paper]_. It has since been adapted for use in Perl 6 [#perl6-s12]_ where it is called "roles", and it is primarily from there that the concept is now being interpreted for Python 3. Python 3 will preserve the name "roles". In a nutshell: roles tell you *what* an object does, classes tell you *how* an object does it. In this PEP, I will outline a system for Python 3 that will make it possible to easily determine whether a given object's understanding of "bark" is tree-like or dog-like. (There might also be more serious examples.) A Note on Syntax ---------------- All syntax proposals in this PEP are tentative and should be considered strawmen. The necessary bits that this PEP depends on -- namely PEP 3115's class definition syntax and PEP 3129's class decorators -- are still being formalized and may change. Function names will, of course, be subject to lengthy bikeshedding debates.
Performing Your Role ==================== Static Role Assignment ---------------------- Let's start out by defining ``Tree`` and ``Dog`` classes :: class Tree(Vegetable): def bark(self): return self.is_rough() class Dog(Animal): def bark(self): return self.goes_ruff() While both implement a ``bark()`` method with the same signature, they do wildly different things. We need some way of differentiating what we're expecting. Relying on inheritance and a simple ``isinstance()`` test will limit code reuse and/or force any dog-like classes to inherit from ``Dog``, whether or not that makes sense. Let's see if roles can help. :: @perform_role(Doglike) class Dog(Animal): ... @perform_role(Treelike) class Tree(Vegetable): ... @perform_role(SitThere) class Rock(Mineral): ... We use class decorators from PEP 3129 to associate a particular role or roles with a class. Client code can now verify that an incoming object performs the ``Doglike`` role, allowing it to handle ``Wolf``, ``LaughingHyena`` and ``Aibo`` [#aibo]_ instances, too. Roles can be composed via normal inheritance: :: @perform_role(Guard, MummysLittleDarling) class GermanShepherd(Dog): def guard(self, the_precious): while True: if intruder_near(the_precious): self.growl() def get_petted(self): self.swallow_pride() Here, ``GermanShepherd`` instances perform three roles: ``Guard`` and ``MummysLittleDarling`` are applied directly, whereas ``Doglike`` is inherited from ``Dog``. Assigning Roles at Runtime -------------------------- Roles can be assigned at runtime, too, by unpacking the syntactic sugar provided by decorators. Say we import a ``Robot`` class from another module; since we know that ``Robot`` already implements our ``Guard`` interface, we'd like it to play nicely with guard-related code, too. :: >>> perform_role(Guard)(Robot) This takes effect immediately and impacts all instances of ``Robot``.
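The PEP's reference implementation was never published, so the following is a hypothetical sketch of the decorator API described above; all names and storage details are illustrative assumptions, not the PEP's actual mechanism. Roles are recorded on the class, so subclasses inherit them for free, and asking a class (rather than an instance) about a role returns False, matching the PEP's semantics.

```python
# Hypothetical sketch -- not the PEP's actual implementation.
# Roles here are plain classes; a class's roles are stored in a
# frozenset on the class itself and inherited through the MRO.

def perform_role(*roles):
    """Class decorator: record that instances of the class perform *roles*."""
    def decorate(cls):
        inherited = getattr(cls, '_performed_roles', frozenset())
        cls._performed_roles = inherited | frozenset(roles)
        return cls
    return decorate

def performs(obj, role):
    """True if *obj* (an instance, not a class) fulfills *role*."""
    performed = getattr(type(obj), '_performed_roles', frozenset())
    # A role is fulfilled directly, or via a role that inherits from it.
    return any(issubclass(r, role) for r in performed)
```

Because ``performs()`` consults ``type(obj)``, asking ``performs(Robot, Guard)`` looks at ``type`` itself, which carries no roles -- reproducing the "a class is not interchangeable with an instance" behaviour described below.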
Asking Questions About Roles ---------------------------- Having told our robot army that they're guards, we'd like to check in on them occasionally and make sure they're still at their task. :: >>> performs(our_robot, Guard) True What about that one robot over there? :: >>> performs(that_robot_over_there, Guard) True The ``performs()`` function is used to ask if a given object fulfills a given role. It cannot be used, however, to ask a class if its instances fulfill a role: :: >>> performs(Robot, Guard) False This is because the ``Robot`` class is not interchangeable with a ``Robot`` instance. Defining New Roles ================== Empty Roles ----------- Roles are defined like a normal class, but use the ``Role`` metaclass. :: class Doglike(metaclass=Role): ... Metaclasses are used to indicate that ``Doglike`` is a ``Role`` in the same way 5 is an ``int`` and ``tuple`` is a ``type``. Composing Roles via Inheritance ------------------------------- Roles may inherit from other roles; this has the effect of composing them. Here, instances of ``Dog`` will perform both the ``Doglike`` and ``FourLegs`` roles. :: class FourLegs(metaclass=Role): pass class Doglike(FourLegs, Carnivore): pass @perform_role(Doglike) class Dog(Mammal): pass Requiring Concrete Methods -------------------------- So far we've only defined empty roles -- not very useful things. Let's now require that all classes that claim to fulfill the ``Doglike`` role define a ``bark()`` method: :: class Doglike(FourLegs): def bark(self): pass No decorators are required to flag the method as "abstract", and the method will never be called, meaning whatever code it contains (if any) is irrelevant. Roles provide *only* abstract methods; concrete default implementations are left to other, better-suited mechanisms like mixins. Once you have defined a role, and a class has claimed to perform that role, it is essential that that claim be verified.
Here, the programmer has misspelled one of the methods required by the role. :: @perform_role(FourLegs) class Horse(Mammal): def run_like_teh_wind(self) ... This will cause the role system to raise an exception, complaining that you're missing a ``run_like_the_wind()`` method. The role system carries out these checks as soon as a class is flagged as performing a given role. Concrete methods are required to match exactly the signature demanded by the role. Here, we've attempted to fulfill our role by defining a concrete version of ``bark()``, but we've missed the mark a bit. :: @perform_role(Doglike) class Coyote(Mammal): def bark(self, target=moon): pass This method's signature doesn't match exactly with what the ``Doglike`` role was expecting, so the role system will throw a bit of a tantrum. Mechanism ========= The following are strawman proposals for how roles might be expressed in Python. The examples here are phrased in a way that the roles mechanism may be implemented without changing the Python interpreter. (Examples adapted from an article on Perl 6 roles by Curtis Poe [#roles-examples]_.) 1. Static class role assignment :: @perform_role(Thieving) class Elf(Character): ... ``perform_role()`` accepts multiple arguments, such that this is also legal: :: @perform_role(Thieving, Spying, Archer) class Elf(Character): ... The ``Elf`` class now performs the ``Thieving``, ``Spying``, and ``Archer`` roles. 2. Querying instances :: if performs(my_elf, Thieving): ... The second argument to ``performs()`` may also be anything with a ``__contains__()`` method, meaning the following is legal: :: if performs(my_elf, set([Thieving, Spying, BoyScout])): ... Like ``isinstance()``, the object needs only to perform a single role out of the set in order for the expression to be true.
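The definition-time verification described above -- checking for missing or mis-signed methods as soon as a class is flagged -- can be sketched in a few lines. This is a hypothetical stand-in, not the PEP's implementation, and it leans on ``inspect.signature``, which only appeared in Python 3.3, long after this thread; dunder methods are skipped for brevity.

```python
import inspect

def perform_role(role):
    """Hypothetical decorator sketch: verify, at class-definition time,
    that the class provides every method the role requires, with an
    exactly matching signature.  Underscore-prefixed names are skipped."""
    def decorate(cls):
        for name, required in vars(role).items():
            if name.startswith('_') or not callable(required):
                continue
            impl = getattr(cls, name, None)
            if impl is None:
                # The misspelled-method case: Horse.run_like_teh_wind
                raise TypeError('%s is missing required method %s()'
                                % (cls.__name__, name))
            if inspect.signature(impl) != inspect.signature(required):
                # The Coyote.bark(self, target=moon) case
                raise TypeError('%s.%s() does not match the signature '
                                'required by role %s'
                                % (cls.__name__, name, role.__name__))
        return cls
    return decorate
```

Requiring an *exact* signature match, as the PEP specifies, is strict: even adding an optional parameter (as ``Coyote.bark()`` does) is rejected.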
Relationship to Abstract Base Classes ===================================== Early drafts of this PEP [#proposal]_ envisioned roles as competing with the abstract base classes proposed in PEP 3119. After further discussion and deliberation, a compromise and a delegation of responsibilities and use-cases have been worked out as follows: * Roles provide a way of indicating an object's semantics and abstract capabilities. A role may define abstract methods, but only as a way of delineating an interface through which a particular set of semantics are accessed. An ``Ordering`` role might require that some set of ordering operators be defined. :: class Ordering(metaclass=Role): def __ge__(self, other): pass def __le__(self, other): pass def __ne__(self, other): pass # ...and so on In this way, we're able to indicate an object's role or function within a larger system without constraining or concerning ourselves with a particular implementation. * Abstract base classes, by contrast, are a way of reusing common, discrete units of implementation. For example, one might define an ``OrderingMixin`` that implements several ordering operators in terms of other operators. :: class OrderingMixin: def __ge__(self, other): return self > other or self == other def __le__(self, other): return self < other or self == other def __ne__(self, other): return not self == other # ...and so on Using this abstract base class - more properly, a concrete mixin - allows a programmer to define a limited set of operators and let the mixin in effect "derive" the others. By combining these two orthogonal systems, we're able to both a) provide functionality, and b) alert consumer systems to the presence and availability of this functionality.
For example, since the ``OrderingMixin`` class above satisfies the interface and semantics expressed in the ``Ordering`` role, we say the mixin performs the role: :: @perform_role(Ordering) class OrderingMixin: def __ge__(self, other): return self > other or self == other def __le__(self, other): return self < other or self == other def __ne__(self, other): return not self == other # ...and so on Now, any class that uses the mixin will automatically -- that is, without further programmer effort -- be tagged as performing the ``Ordering`` role. The separation of concerns into two distinct, orthogonal systems is desirable because it allows us to use each one separately. Take, for example, a third-party package providing a ``RecursiveHash`` role that indicates a container takes its contents into account when determining its hash value. Since Python's built-in ``tuple`` and ``frozenset`` classes follow this semantic, the ``RecursiveHash`` role can be applied to them. :: >>> perform_role(RecursiveHash)(tuple) >>> perform_role(RecursiveHash)(frozenset) Any code that consumes ``RecursiveHash`` objects will now be able to consume tuples and frozensets. Open Issues =========== Allowing Instances to Perform Different Roles Than Their Class -------------------------------------------------------------- Perl 6 allows instances to perform different roles than their class. These changes are local to the single instance and do not affect other instances of the class. For example: :: my_elf = Elf() my_elf.goes_on_quest() my_elf.becomes_evil() now_performs(my_elf, Thieving) # Only this one elf is a thief my_elf.steals(["purses", "candy", "kisses"]) In Perl 6, this is done by creating an anonymous class that inherits from the instance's original parent and performs the additional role(s). This is possible in Python 3, though whether it is desirable is another matter.
Inclusion of this feature would, of course, make it much easier to express the works of Charles Dickens in Python: :: >>> from literature import role, BildungsRoman >>> from dickens import Urchin, Gentleman >>> >>> with BildungsRoman() as OliverTwist: ... mr_brownlow = Gentleman() ... oliver, artful_dodger = Urchin(), Urchin() ... now_performs(artful_dodger, [role.Thief, role.Scoundrel]) ... ... oliver.has_adventures_with(ArtfulDodger) ... mr_brownlow.adopt_orphan(oliver) ... now_performs(oliver, role.RichWard) Requiring Attributes -------------------- Neal Norwitz has requested the ability to make assertions about the presence of attributes using the same mechanism used to require methods. Since roles take effect at class definition-time, and since the vast majority of attributes are defined at runtime by a class's ``__init__()`` method, there doesn't seem to be a good way to check for attributes at the same time as methods. It may still be desirable to include non-enforced attributes in the role definition, if only for documentation purposes. Roles of Roles -------------- Under the proposed semantics, it is possible for roles to have roles of their own. :: @perform_role(Y) class X(metaclass=Role): ... While this is possible, it is meaningless, since roles are generally not instantiated. There has been some off-line discussion about giving meaning to this expression, but so far no good ideas have emerged. class_performs() ---------------- It is currently not possible to ask a class if its instances perform a given role. It may be desirable to provide an analogue to ``performs()`` such that :: >>> isinstance(my_dwarf, Dwarf) True >>> performs(my_dwarf, Surly) True >>> performs(Dwarf, Surly) False >>> class_performs(Dwarf, Surly) True Prettier Dynamic Role Assignment -------------------------------- An early draft of this PEP included a separate mechanism for dynamically assigning a role to a class. 
This was spelled :: >>> now_perform(Dwarf, GoldMiner) This same functionality already exists by unpacking the syntactic sugar provided by decorators: :: >>> perform_role(GoldMiner)(Dwarf) At issue is whether dynamic role assignment is sufficiently important to warrant a dedicated spelling. Syntax Support -------------- Though the phrasings laid out in this PEP are designed so that the roles system could be shipped as a stand-alone package, it may be desirable to add special syntax for defining, assigning and querying roles. One example might be a role keyword, which would translate :: class MyRole(metaclass=Role): ... into :: role MyRole: ... Assigning a role could take advantage of the class definition arguments proposed in PEP 3115: :: class MyClass(performs=MyRole): ... Implementation ============== A reference implementation is forthcoming. Acknowledgements ================ Thanks to Jeffery Yasskin, Talin and Guido van Rossum for several hours of in-person discussion to iron out the differences, overlap and finer points of roles and abstract base classes. References ========== .. [#aibo] http://en.wikipedia.org/wiki/AIBO .. [#roles-examples] http://www.perlmonks.org/?node_id=384858 .. [#perl6-s12] http://dev.perl.org/perl6/doc/design/syn/S12.html .. [#traits-paper] http://www.iam.unibe.ch/~scg/Archive/Papers/Scha03aTraits.pdf .. [#proposal] http://mail.python.org/pipermail/python-3000/2007-April/007026.html Copyright ========= This document has been placed in the public domain. .. 
Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 coding: utf-8 End: From steven.bethard at gmail.com Mon May 14 08:08:08 2007 From: steven.bethard at gmail.com (Steven Bethard) Date: Mon, 14 May 2007 00:08:08 -0600 Subject: [Python-3000] PEP 3133: Introducing Roles In-Reply-To: <43aa6ff70705132236w6d2dbfc4r8d1fadede753a6ca@mail.gmail.com> References: <43aa6ff70705132236w6d2dbfc4r8d1fadede753a6ca@mail.gmail.com> Message-ID: On 5/13/07, Collin Winter wrote: > PEP: 3133 > Title: Introducing Roles [snip] > * Roles provide a way of indicating a object's semantics and abstract > capabilities. A role may define abstract methods, but only as a > way of delineating an interface through which a particular set of > semantics are accessed. [snip] > * Abstract base classes, by contrast, are a way of reusing common, > discrete units of implementation. [snip] > Using this abstract base class - more properly, a concrete > mixin - allows a programmer to define a limited set of operators > and let the mixin in effect "derive" the others. So what's the difference between a role and an abstract base class that used @abstractmethod on all of its methods? Isn't such an ABC just "delineating an interface"? > since the ``OrderingMixin`` class above satisfies the interface > and semantics expressed in the ``Ordering`` role, we say the mixin > performs the role: :: > > @perform_role(Ordering) > class OrderingMixin: > def __ge__(self, other): > return self > other or self == other > > def __le__(self, other): > return self < other or self == other > > def __ne__(self, other): > return not self == other > > # ...and so on > > Now, any class that uses the mixin will automatically -- that is, > without further programmer effort -- be tagged as performing the > ``Ordering`` role. But why is:: performs(obj, Ordering) any better than:: isinstance(obj, Ordering) if Ordering is just an appropriately registered ABC? 
(BTW, Ordering is a bad example since the ABC PEP no longer proposes that. Maybe Sequence or Mapping instead?) STeVe -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy From tcdelaney at optusnet.com.au Mon May 14 09:23:52 2007 From: tcdelaney at optusnet.com.au (Tim Delaney) Date: Mon, 14 May 2007 17:23:52 +1000 Subject: [Python-3000] PEP 367: New Super Message-ID: <003001c795f8$d5275060$0201a8c0@mshome.net> Here is my modified version of PEP 367. The reference implementation in it is pretty long, and should probably be split out to somewhere else (esp. since it can't fully implement the semantics). Cheers, Tim Delaney PEP: 367 Title: New Super Version: $Revision$ Last-Modified: $Date$ Author: Calvin Spealman Author: Tim Delaney Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 28-Apr-2007 Python-Version: 2.6 Post-History: 28-Apr-2007, 29-Apr-2007 (1), 29-Apr-2007 (2), 14-May-2007 Abstract ======== This PEP proposes syntactic sugar for use of the ``super`` type to automatically construct instances of the super type binding to the class that a method was defined in, and the instance (or class object for classmethods) that the method is currently acting upon. The premise of the new super usage suggested is as follows:: super.foo(1, 2) to replace the old:: super(Foo, self).foo(1, 2) and the current ``__builtin__.super`` be aliased to ``__builtin__.__super__`` (with ``__builtin__.super`` to be removed in Python 3.0). It is further proposed that assignment to ``super`` become a ``SyntaxError``, similar to the behaviour of ``None``. Rationale ========= The current usage of super requires an explicit passing of both the class and instance it must operate from, requiring a breaking of the DRY (Don't Repeat Yourself) rule. This hinders any change in class name, and is often considered a wart by many. 
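The DRY problem the rationale describes is easy to see in code. As a point of comparison, the zero-argument ``super()`` that Python 3 eventually shipped (via PEP 3135, the successor to this PEP) removes the repetition, though with call syntax ``super().meth()`` rather than the ``super.meth(...)`` attribute syntax proposed here:

```python
class Base:
    def greet(self):
        return 'Base'

class Classic(Base):
    def greet(self):
        # Pre-PEP style: the defining class is named explicitly, so
        # renaming Classic silently breaks this call.
        return 'Classic/' + super(Classic, self).greet()

class Modern(Base):
    def greet(self):
        # Zero-argument form (Python 3): the compiler supplies the
        # defining class and the instance, as this PEP's implicit
        # preamble was intended to do.
        return 'Modern/' + super().greet()
```

Both forms dispatch identically; only the explicit one repeats the class name.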
Specification ============= Within the specification section, some special terminology will be used to distinguish similar and closely related concepts. "super type" will refer to the actual builtin type named "super". A "super instance" is simply an instance of the super type, which is associated with a class and possibly with an instance of that class. Because the new ``super`` semantics are not backwards compatible with Python 2.5, the new semantics will require a ``__future__`` import:: from __future__ import new_super The current ``__builtin__.super`` will be aliased to ``__builtin__.__super__``. This will occur regardless of whether the new ``super`` semantics are active. It is not possible to simply rename ``__builtin__.super``, as that would affect modules that do not use the new ``super`` semantics. In Python 3.0 it is proposed that the name ``__builtin__.super`` will be removed. Replacing the old usage of super, calls to the next class in the MRO (method resolution order) can be made without explicitly creating a ``super`` instance (although doing so will still be supported via ``__super__``). Every function will have an implicit local named ``super``. This name behaves identically to a normal local, including use by inner functions via a cell, with the following exceptions: 1. Assigning to the name ``super`` will raise a ``SyntaxError`` at compile time; 2. Calling a static method or normal function that accesses the name ``super`` will raise a ``TypeError`` at runtime. Every function that uses the name ``super``, or has an inner function that uses the name ``super``, will include a preamble that performs the equivalent of:: super = __builtin__.__super__(<class>, <instance>) where ``<class>`` is the class that the method was defined in, and ``<instance>`` is the first parameter of the method (normally ``self`` for instance methods, and ``cls`` for class methods). For static methods and normal functions, ``<class>`` will be ``None``, resulting in a ``TypeError`` being raised during the preamble.
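The "next class in the MRO" dispatch that the specification relies on can be seen with a cooperative diamond. This sketch uses the zero-argument ``super()`` that eventually shipped in Python 3 (PEP 3135) rather than the implicit ``super`` local proposed here, but the resolution order it demonstrates is the same:

```python
class A:
    def f(self):
        return 'A'

class B(A):
    def f(self):
        # Delegates to the next class *after B* in type(self).__mro__,
        # which for a D instance is C, not A.
        return 'B' + super().f()

class C(A):
    def f(self):
        return 'C' + super().f()

class D(B, C):
    def f(self):
        return 'D' + super().f()
```

``D().f()`` returns ``'DBCA'``, because ``D.__mro__`` is ``(D, B, C, A, object)``: each call proceeds to the next class in the MRO of the *instance's* type, not to a statically chosen base.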
Note: The relationship between ``super`` and ``__super__`` is similar to that between ``import`` and ``__import__``. Much of this was discussed in the thread of the python-dev list, "Fixing super anyone?" [1]_. Open Issues ----------- Determining the class object to use ''''''''''''''''''''''''''''''''''' The exact mechanism for associating the method with the defining class is not specified in this PEP, and should be chosen for maximum performance. For CPython, it is suggested that the class instance be held in a C-level variable on the function object which is bound to one of ``NULL`` (not part of a class), ``Py_None`` (static method) or a class object (instance or class method). Should ``super`` actually become a keyword? ''''''''''''''''''''''''''''''''''''''''''' With this proposal, ``super`` would become a keyword to the same extent that ``None`` is a keyword. It is possible that further restricting the ``super`` name may simplify implementation; however, some are against the actual keywordization of ``super``. The simplest solution is often the correct solution, and the simplest solution may well not be adding additional keywords to the language when they are not needed. Still, it may solve other open issues. Closed Issues ------------- super used with __call__ attributes ''''''''''''''''''''''''''''''''''' It was considered that instantiating super instances the classic way might be a problem, because calling such an instance would look up the ``__call__`` attribute and thus try to perform an automatic super lookup to the next class in the MRO. However, this was found to be false, because calling an object only looks up the ``__call__`` method directly on the object's type. The following example shows this in action.
:: class A(object): def __call__(self): return '__call__' def __getattribute__(self, attr): if attr == '__call__': return lambda: '__getattribute__' a = A() assert a() == '__call__' assert a.__call__() == '__getattribute__' In any case, with the renaming of ``__builtin__.super`` to ``__builtin__.__super__`` this issue goes away entirely. Reference Implementation ======================== It is impossible to implement the above specification entirely in Python. This reference implementation has the following differences to the specification: 1. New ``super`` semantics are implemented using bytecode hacking. 2. Assignment to ``super`` is not a ``SyntaxError``. Also see point #4. 3. Classes must either use the metaclass ``autosuper_meta`` or inherit from the base class ``autosuper`` to acquire the new ``super`` semantics. 4. ``super`` is not an implicit local variable. In particular, for inner functions to be able to use the super instance, there must be an assignment of the form ``super = super`` in the method. The reference implementation assumes that it is being run on Python 2.5+. 
:: #!/usr/bin/env python # # autosuper.py from array import array import dis import new import types import __builtin__ __builtin__.__super__ = __builtin__.super del __builtin__.super # We need these for modifying bytecode from opcode import opmap, HAVE_ARGUMENT, EXTENDED_ARG LOAD_GLOBAL = opmap['LOAD_GLOBAL'] LOAD_NAME = opmap['LOAD_NAME'] LOAD_CONST = opmap['LOAD_CONST'] LOAD_FAST = opmap['LOAD_FAST'] LOAD_ATTR = opmap['LOAD_ATTR'] STORE_FAST = opmap['STORE_FAST'] LOAD_DEREF = opmap['LOAD_DEREF'] STORE_DEREF = opmap['STORE_DEREF'] CALL_FUNCTION = opmap['CALL_FUNCTION'] STORE_GLOBAL = opmap['STORE_GLOBAL'] DUP_TOP = opmap['DUP_TOP'] POP_TOP = opmap['POP_TOP'] NOP = opmap['NOP'] JUMP_FORWARD = opmap['JUMP_FORWARD'] ABSOLUTE_TARGET = dis.hasjabs def _oparg(code, opcode_pos): return code[opcode_pos+1] + (code[opcode_pos+2] << 8) def _bind_autosuper(func, cls): co = func.func_code name = func.func_name newcode = array('B', co.co_code) codelen = len(newcode) newconsts = list(co.co_consts) newvarnames = list(co.co_varnames) # Check if the global 'super' keyword is already present try: sn_pos = list(co.co_names).index('super') except ValueError: sn_pos = None # Check if the varname 'super' keyword is already present try: sv_pos = newvarnames.index('super') except ValueError: sv_pos = None # Check if the callvar 'super' keyword is already present try: sc_pos = list(co.co_cellvars).index('super') except ValueError: sc_pos = None # If 'super' isn't used anywhere in the function, we don't have anything to do if sn_pos is None and sv_pos is None and sc_pos is None: return func c_pos = None s_pos = None n_pos = None # Check if the 'cls_name' and 'super' objects are already in the constants for pos, o in enumerate(newconsts): if o is cls: c_pos = pos if o is __super__: s_pos = pos if o == name: n_pos = pos # Add in any missing objects to constants and varnames if c_pos is None: c_pos = len(newconsts) newconsts.append(cls) if n_pos is None: n_pos = len(newconsts) 
newconsts.append(name) if s_pos is None: s_pos = len(newconsts) newconsts.append(__super__) if sv_pos is None: sv_pos = len(newvarnames) newvarnames.append('super') # This goes at the start of the function. It is: # # super = __super__(cls, self) # # If 'super' is a cell variable, we store to both the # local and cell variables (i.e. STORE_FAST and STORE_DEREF). # preamble = [ LOAD_CONST, s_pos & 0xFF, s_pos >> 8, LOAD_CONST, c_pos & 0xFF, c_pos >> 8, LOAD_FAST, 0, 0, CALL_FUNCTION, 2, 0, ] if sc_pos is None: # 'super' is not a cell variable - we can just use the local variable preamble += [ STORE_FAST, sv_pos & 0xFF, sv_pos >> 8, ] else: # If 'super' is a cell variable, we need to handle LOAD_DEREF. preamble += [ DUP_TOP, STORE_FAST, sv_pos & 0xFF, sv_pos >> 8, STORE_DEREF, sc_pos & 0xFF, sc_pos >> 8, ] preamble = array('B', preamble) # Bytecode for loading the local 'super' variable. load_super = array('B', [ LOAD_FAST, sv_pos & 0xFF, sv_pos >> 8, ]) preamble_len = len(preamble) need_preamble = False i = 0 while i < codelen: opcode = newcode[i] need_load = False remove_store = False if opcode == EXTENDED_ARG: raise TypeError("Cannot use 'super' in function with EXTENDED_ARG opcode") # If the opcode is an absolute target it needs to be adjusted # to take into account the preamble. elif opcode in ABSOLUTE_TARGET: oparg = _oparg(newcode, i) + preamble_len newcode[i+1] = oparg & 0xFF newcode[i+2] = oparg >> 8 # If LOAD_GLOBAL(super) or LOAD_NAME(super) then we want to change it into # LOAD_FAST(super) elif (opcode == LOAD_GLOBAL or opcode == LOAD_NAME) and _oparg(newcode, i) == sn_pos: need_preamble = need_load = True # If LOAD_FAST(super) then we just need to add the preamble elif opcode == LOAD_FAST and _oparg(newcode, i) == sv_pos: need_preamble = need_load = True # If LOAD_DEREF(super) then we change it into LOAD_FAST(super) because # it's slightly faster. 
elif opcode == LOAD_DEREF and _oparg(newcode, i) == sc_pos: need_preamble = need_load = True if need_load: newcode[i:i+3] = load_super i += 1 if opcode >= HAVE_ARGUMENT: i += 2 # No changes needed - get out. if not need_preamble: return func # Our preamble will have 3 things on the stack co_stacksize = max(3, co.co_stacksize) # Conceptually, our preamble is on the `def` line. co_lnotab = array('B', co.co_lnotab) if co_lnotab: co_lnotab[0] += preamble_len co_lnotab = co_lnotab.tostring() # Our code consists of the preamble and the modified code. codestr = (preamble + newcode).tostring() codeobj = new.code(co.co_argcount, len(newvarnames), co_stacksize, co.co_flags, codestr, tuple(newconsts), co.co_names, tuple(newvarnames), co.co_filename, co.co_name, co.co_firstlineno, co_lnotab, co.co_freevars, co.co_cellvars) func.func_code = codeobj func.func_class = cls return func class autosuper_meta(type): def __init__(cls, name, bases, clsdict): UnboundMethodType = types.UnboundMethodType for v in vars(cls): o = getattr(cls, v) if isinstance(o, UnboundMethodType): _bind_autosuper(o.im_func, cls) class autosuper(object): __metaclass__ = autosuper_meta if __name__ == '__main__': class A(autosuper): def f(self): return 'A' class B(A): def f(self): return 'B' + super.f() class C(A): def f(self): def inner(): return 'C' + super.f() # Needed to put 'super' into a cell super = super return inner() class D(B, C): def f(self, arg=None): var = None return 'D' + super.f() assert D().f() == 'DBCA' Disassembly of B.f and C.f reveals the different preambles used when ``super`` is simply a local variable compared to when it is used by an inner function. 
>>> dis.dis(B.f) 214 0 LOAD_CONST 4 () 3 LOAD_CONST 2 () 6 LOAD_FAST 0 (self) 9 CALL_FUNCTION 2 12 STORE_FAST 1 (super) 215 15 LOAD_CONST 1 ('B') 18 LOAD_FAST 1 (super) 21 LOAD_ATTR 1 (f) 24 CALL_FUNCTION 0 27 BINARY_ADD 28 RETURN_VALUE :: >>> dis.dis(C.f) 218 0 LOAD_CONST 4 () 3 LOAD_CONST 2 () 6 LOAD_FAST 0 (self) 9 CALL_FUNCTION 2 12 DUP_TOP 13 STORE_FAST 1 (super) 16 STORE_DEREF 0 (super) 219 19 LOAD_CLOSURE 0 (super) 22 LOAD_CONST 1 () 25 MAKE_CLOSURE 0 28 STORE_FAST 2 (inner) 223 31 LOAD_FAST 1 (super) 34 STORE_DEREF 0 (super) 224 37 LOAD_FAST 2 (inner) 40 CALL_FUNCTION 0 43 RETURN_VALUE Note that in the final implementation, the preamble would not be part of the bytecode of the method, but would occur immediately following unpacking of parameters. Alternative Proposals ===================== No Changes ---------- Although it's always attractive to just keep things how they are, people have sought a change in the usage of super calling for some time, and for good reason, all mentioned previously. - Decoupling from the class name (which might not even be bound to the right class anymore!) - Simpler looking, cleaner super calls would be better Dynamic attribute on super type ------------------------------- The proposal adds a dynamic attribute lookup to the super type, which will automatically determine the proper class and instance parameters. Each super attribute lookup identifies these parameters and performs the super lookup on the instance, as the current super implementation does with the explicit invocation of a super instance upon a class and instance. This proposal relies on sys._getframe(), which is not appropriate for anything except a prototype implementation. super(__this_class__, self) --------------------------- This is nearly an anti-proposal, as it basically relies on the acceptance of the __this_class__ PEP, which proposes a special name that would always be bound to the class within which it is used.
If that is accepted, ``__this_class__`` could simply be used instead of
the class' name explicitly, solving the name binding issues [2]_.

self.__super__.foo(\*args)
--------------------------

The ``__super__`` attribute is mentioned in this PEP in several places,
and could be a candidate for the complete solution, actually using it
explicitly instead of any super usage directly. However,
double-underscore names are usually an internal detail, and are
conventionally kept out of everyday code.

super(self, \*args) or __super__(self, \*args)
----------------------------------------------

This solution only solves the problem of the type indication, does not
handle differently named super methods, and is explicit about the name of
the instance. It is less flexible, since it cannot be applied to other
method names in cases where that is needed. One use case this fails is
where a base class has a factory classmethod and a subclass has two
factory classmethods, both of which need to properly make super calls to
the one in the base class.

super.foo(self, \*args)
-----------------------

This variation actually eliminates the problems with locating the proper
instance, and if any of the alternatives were pushed into the spotlight,
I would want it to be this one.

super or super()
----------------

This proposal leaves no room for different names, signatures, or
application to other classes or instances. A way to allow some similar
use alongside the normal proposal would be favorable, encouraging good
design of multiple inheritance trees and compatible methods.

super(\*p, \*\*kw)
------------------

There has been the proposal that directly calling ``super(*p, **kw)``
would be equivalent to calling the method on the ``super`` object with
the same name as the method currently being executed, i.e.
the following two methods would be equivalent: ::

    def f(self, *p, **kw):
        super.f(*p, **kw)

::

    def f(self, *p, **kw):
        super(*p, **kw)

There is strong sentiment for and against this, but implementation and
style concerns are obvious. Guido has suggested that this should be
excluded from this PEP on the principle of KISS (Keep It Simple Stupid).

History
=======

29-Apr-2007
    - Changed title from "Super As A Keyword" to "New Super"
    - Updated much of the language and added a terminology section for
      clarification in confusing places.
    - Added reference implementation and history sections.

06-May-2007
    - Updated by Tim Delaney to reflect discussions on the python-3000
      and python-dev mailing lists.

References
==========

.. [1] Fixing super anyone?
   (http://mail.python.org/pipermail/python-3000/2007-April/006667.html)

.. [2] PEP 3130: Access to Module/Class/Function Currently Being Defined (this)
   (http://mail.python.org/pipermail/python-ideas/2007-April/000542.html)

Copyright
=========

This document has been placed in the public domain.

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:

From talin at acm.org Mon May 14 09:48:23 2007
From: talin at acm.org (Talin)
Date: Mon, 14 May 2007 00:48:23 -0700
Subject: [Python-3000] PEP 3133: Introducing Roles
In-Reply-To: <43aa6ff70705132236w6d2dbfc4r8d1fadede753a6ca@mail.gmail.com>
References: <43aa6ff70705132236w6d2dbfc4r8d1fadede753a6ca@mail.gmail.com>
Message-ID: <46481447.7010100@acm.org>

Collin Winter wrote:
> PEP: 3133
[snip]

I'll probably have quite a few comments on this over the next few days.
First let me start off by saying I like the general approach of your PEP.
Let me kick off the bikeshed part of the discussion by saying that the
"role/performs" terminology is not my favorite - I kind of like the
terminology that was introduced by Jeff Shell in an earlier message,
specifically the terms "specifies", "provides" and "implements":

* An interface *specifies* a set of methods.
* An object can *provide* the services that are specified by an interface.
* A class can *implement* the services that are specified by an interface.

In other words, the difference between 'provides' and 'implements' is
that one is asking about the instance and another is asking about the
class. Thus, you can ask an object if it provides() an interface; you
can also ask a class if it implements() an interface.

Another question that I'd like to ask is: your PEP describes a mechanism
for defining roles and testing for them. What it doesn't define is what
roles will be defined in the standard library, and specifically what
roles will be defined for the built-in classes.

The third issue I want to raise is how the roles system interacts with
PJE's generic functions PEP. Let me give some background: in the most
general terms, a method of a generic function is a function with a set
of constraints on the arguments. These constraints can be types, but
they don't have to be. Depending on the actual calling arguments, the
dispatcher will attempt to find the method whose constraints most
closely match the calling arguments.

Clearly, in a system in which there are both roles and generics, we
would want to create overloads in which the constraints can be role
tests rather than type tests. So for example, if Guard is a role, we
want to be able to dispatch on it:

    @overload
    def idle( actor: Guard ):
        ...

We would also like to be able to define methods that contain both
type-tests and role-tests:

    @overload
    def watch( actor: Guard, treasure: list ):
        ...
In order for this to work, the dispatcher will need to know that the
first argument requires a role-test ("performs" or whatever), while the
second argument requires a type-test. I would like to see some more
detail on how this would work.

However, it's even more complicated than that. Generic function
dispatchers can be made to work efficiently if there is a way to compare
constraints with each other. Specifically, what you need to know is
this: given any two tests, are those tests completely disjoint, is one
test a subset of the other, or neither?

For example, suppose we have the following overloads:

    class MyList( list ):
        ...

    @overload
    def watch( a: list ):
        ...

    @overload
    def watch( a: tuple ):
        ...

    @overload
    def watch( a: MyList ):
        ...

The most efficient dispatch algorithm for this particular set of
overloads will first test to see if the argument is a list; if not, it
will test to see if it's a tuple, otherwise it will test to see if it's
a MyList. In other words, even though there are three possible tests, we
only need to perform two of them at most, because if it is a list, then
it can't possibly be a tuple, and if it's not a list then it can't
possibly be a MyList. As you add more overloads and more tests, this
kind of pruning becomes important, and there are some wonderful
algorithms for figuring this all out.

Now consider, however, the following situation, where you have a role, a
class which implements that role, and a subclass:

    class Worker( Role ):
        ...

    @perform_role( Worker )
    class Robot:
        ...

    class ShinyRobot( Robot ):
        ...

Now, suppose we have a number of overloads:

    @overload
    def work( actor: Worker ):
        ...

    @overload
    def work( actor: Robot ):
        ...

    @overload
    def work( actor: ShinyRobot ):
        ...

In this case, when dispatching on the first argument we are sometimes
doing type tests, and sometimes doing role tests. Furthermore, we have
an interaction between roles and types: the ShinyRobot test (a type
test) can never succeed unless the role test (Worker) also succeeds.
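One way to make role tests and type tests comparable to each other is to
route both through isinstance(), using the __instancecheck__ hook that the
ABC PEP (3119) proposes. The following is only an illustrative sketch in
modern syntax - the Role metaclass and register() decorator are names
invented here, not the roles PEP's actual machinery:

```python
class Role(type):
    """Metaclass whose instances are roles: isinstance() consults a
    registry of performing classes instead of the inheritance graph."""
    def __init__(cls, name, bases, ns):
        super().__init__(name, bases, ns)
        cls._performers = set()

    def register(cls, klass):
        # Record that instances of klass perform this role.
        cls._performers.add(klass)
        return klass  # usable as a class decorator

    def __instancecheck__(cls, obj):
        return any(isinstance(obj, k) for k in cls._performers)


Worker = Role('Worker', (), {})

@Worker.register
class Robot(object):
    pass

class ShinyRobot(Robot):
    pass

# The ShinyRobot/Robot type test is a subset of the Worker role test:
# any object passing the former necessarily passes the latter.
shiny = ShinyRobot()
assert isinstance(shiny, Robot)          # type test
assert isinstance(shiny, Worker)         # role test also succeeds
assert not isinstance(object(), Worker)  # unregistered classes fail
```

A dispatcher that sees both kinds of tests through the same isinstance()
interface is then free to prune them against each other, as described
above.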
For purposes of dispatching efficiency, we want to be able to allow the
dispatcher to know that the "ShinyRobot" test is a subset of "Worker",
even though the two tests are different kinds of tests. Thus, the
generic function dispatcher will need to be able to take two tests,
which might both be type tests, or both role tests, or one of each - and
compare them to see if one is a subset of the other, or if they overlap
at all.

-- Talin

From arvind1.singh at gmail.com Mon May 14 12:06:32 2007
From: arvind1.singh at gmail.com (Arvind Singh)
Date: Mon, 14 May 2007 15:36:32 +0530
Subject: [Python-3000] PEP 3133: Introducing Roles
In-Reply-To: <43aa6ff70705132236w6d2dbfc4r8d1fadede753a6ca@mail.gmail.com>
References: <43aa6ff70705132236w6d2dbfc4r8d1fadede753a6ca@mail.gmail.com>
Message-ID: 

> Asking Questions About Roles

Shouldn't there be some way to ``revoke'' roles? How can we get a list
of all roles played by an object?

Should there be a way to check ``loosely'' whether an object can
potentially play a given role? (i.e., checking whether an object
provides a given interface, at least syntactically)

I understand that this can be achieved via:

    try:
        now_performs(instance.__class__, [role.RoleToCheck])
    except:
        print("can't play role")
    else:
        print("maybe plays role")

But such an approach will be error prone (``revoking'' roles later, and
such; destructive checks are a bad idea, anyway). Better would be to
have::

    if performs(instance, [role.RoleToCheck], loose=True):
        print("maybe plays role")

> Assigning Roles at Runtime

Maybe it should be suggested that dynamic role assignment should not be
made without knowing the implementation (with a reminder about tree's
bark() and dog's bark()).
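The ``loose'' check asked about here needs no registry at all - it can
simply probe the object for the methods the role specifies. A rough
sketch (the loose_performs helper and the __methods__ attribute are
invented for illustration; neither is part of the PEP):

```python
def loose_performs(obj, role):
    """True if obj offers a callable for every method the role names."""
    return all(callable(getattr(obj, name, None))
               for name in role.__methods__)


class Thieving(object):
    # Hypothetical role description: just the required method names.
    __methods__ = ('pick_pocket', 'sneak')


class Elf(object):
    def pick_pocket(self):
        return 'lifted a purse'

    def sneak(self):
        return 'vanished into the shadows'


assert loose_performs(Elf(), Thieving)         # structurally compatible
assert not loose_performs(object(), Thieving)  # lacks the methods
```

Being purely syntactic, such a check is non-destructive and cannot be
invalidated by a later role revocation - which is the property asked for.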
Regards,
Arvind

From p.f.moore at gmail.com Mon May 14 12:56:47 2007
From: p.f.moore at gmail.com (Paul Moore)
Date: Mon, 14 May 2007 11:56:47 +0100
Subject: [Python-3000] Revised PEP 3119 (Abstract Base Classes)
In-Reply-To: 
References: 
Message-ID: <79990c6b0705140356i2cfe4dccx9410534e211d8c94@mail.gmail.com>

On 12/05/07, Guido van Rossum wrote:
> Here's a new version of the ABC PEP. A lot has changed; a lot remains.
> I can't give a detailed overview of all the changes, and a diff would
> show too many spurious changes, but some of the highlights are:

As a general comment, I like the direction this has moved in. In
particular, I like the fact that ABCs can be registered after the fact
(as Talin describes it, "post-hoc classification").

> ABCs vs. Duck Typing

This remains my key concern. The PEP nicely addresses the issue as far
as core Python is concerned, but I'd be happier with some style
recommendations for 3rd party frameworks clarifying that they should
also avoid taking the "stick" approach. OTOH, we've had interface
implementations, and heavy users of them (e.g. zope.interface and
Twisted) for ages now, and the world hasn't ended, so I guess there's
no reason to assume that people won't use ABCs sensibly, too.

Overall, then, I'm moving towards a +1 (or at least a +0.5...)

Paul.

From exarkun at divmod.com Mon May 14 13:32:40 2007
From: exarkun at divmod.com (Jean-Paul Calderone)
Date: Mon, 14 May 2007 07:32:40 -0400
Subject: [Python-3000] Unicode strings, identifiers, and import
In-Reply-To: 
Message-ID: <20070514113240.19381.1581980774.divmod.quotient.32995@ohm>

On Sun, 13 May 2007 22:03:26 -0500, Michael Urman wrote:
>On 5/13/07, Guido van Rossum wrote:
>> The answer to all of this is the filesystem encoding, which is already
>> supported. Doesn't appear particularly difficult to me.
>
>Okay, that's fair. It seems reasonable to accept the limitations of
>following the filesystem encoding for module names.
>I should probably
>test py3k to make sure it already has updated __import__ to use the
>filesystem encoding instead of the default encoding, but instead I'll
>just feebly imply the question here.

It's harder for this, actually. Even if you know the encoding, you'll
still run into problems when you don't know the normalization. Consider
the case where a developer creates a module with a non-ASCII name on OS
X and then distributes it. There is a fair to strong chance that their
source code will use NFC for the module name. During development, this
will work just fine, as OS X normalizes all filename access to NFD.
When someone on another platform attempts to use the module though, they
will mysteriously find that it cannot be found. Their NFC spelling of
the module name won't find the NFD file in the filesystem, and they will
likely be completely baffled by the failure.

This is, of course, an existing difficulty with dealing with unicode
filenames in Python, but at least the interpreter itself doesn't yet
have to concern itself with it, as no language features require it.
I suspect that if non-ASCII module names are allowed, a lot of people
will be running into this.

Jean-Paul

From p.f.moore at gmail.com Mon May 14 16:00:45 2007
From: p.f.moore at gmail.com (Paul Moore)
Date: Mon, 14 May 2007 15:00:45 +0100
Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions,
	Interfaces, etc.
In-Reply-To: 
References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com>
	<20070510012202.EB0A83A4061@sparrow.telecommunity.com>
	<464289C8.4080004@canterbury.ac.nz>
	<20070510161417.192943A4061@sparrow.telecommunity.com>
	<464395AB.6040505@canterbury.ac.nz>
	<20070510231845.9C98C3A4061@sparrow.telecommunity.com>
	<4643C4F4.30708@canterbury.ac.nz>
	<79990c6b0705110140u59c6d46euddc59b919f55b4e8@mail.gmail.com>
Message-ID: <79990c6b0705140700x11409b5eje2305d5baca09b62@mail.gmail.com>

On 11/05/07, Neal Norwitz wrote:
> On 5/11/07, Paul Moore wrote:
> > Hmm.
> > My view is that it *is* simple to explain, but unfortunately
> > Phillip's explanation in the PEP is not that simple explanation :-(
> >
> [snip]
> >
> > I would argue that the PEP could be *very* simple if it restricted
> > itself to the basic idea.
>
> Could you write up the simple version that you would use instead?

I'd have liked to, but unfortunately, I haven't had the time to do so
(and I probably won't in the near future). However, it looks like
there's a general feeling emerging that snipping certain sections would
be enough. I'd agree with that - my personal feeling is that it'd be OK
to remove all of the following sections:

* "Before" and "After" Methods (as per Steven Bethard's suggestion)
* "Around" Methods (as per Steven Bethard's suggestion)
* Custom Combinations (as per Steven Bethard's suggestion)
* Interfaces and Adaptation (doesn't feel like a core aspect of the proposal)
* Aspects (as per Steven Bethard's suggestion)
* Extension API (currently empty, and that hasn't hampered the discussions!!)

I'd be OK with them going into an additional PEP, but to be honest, it
wouldn't bother me to see them left out of the PEP process
altogether[1]. (I don't feel that I have enough experience with *using*
GFs to comment meaningfully, so I'd be willing to defer to Phillip's
judgement here).

Paul.

[1] But I'd like to see them documented in the final implementation -
I'm not suggesting they be undocumented features.

From guido at python.org Mon May 14 17:22:48 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 14 May 2007 08:22:48 -0700
Subject: [Python-3000] Revised PEP 3119 (Abstract Base Classes)
In-Reply-To: <79990c6b0705140356i2cfe4dccx9410534e211d8c94@mail.gmail.com>
References: <79990c6b0705140356i2cfe4dccx9410534e211d8c94@mail.gmail.com>
Message-ID: 

On 5/14/07, Paul Moore wrote:
> > ABCs vs. Duck Typing
>
> This remains my key concern.
> The PEP nicely addresses the issue as far
> as core Python is concerned, but I'd be happier with some style
> recommendations for 3rd party frameworks clarifying that they should
> also avoid taking the "stick" approach. OTOH, we've had interface
> implementations, and heavy users of them (e.g. zope.interface and
> Twisted) for ages now, and the world hasn't ended, so I guess there's
> no reason to assume that people won't use ABCs sensibly, too.

I'm not sure what language you would specifically like to see added to
the PEP. "Recommendation for 3rd party frameworks: please don't use the
stick approach." sounds a little strange. What's the point you're trying
to get across?

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org Mon May 14 17:25:02 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 14 May 2007 08:25:02 -0700
Subject: [Python-3000] Unicode strings, identifiers, and import
In-Reply-To: <20070514113240.19381.1581980774.divmod.quotient.32995@ohm>
References: <20070514113240.19381.1581980774.divmod.quotient.32995@ohm>
Message-ID: 

On 5/14/07, Jean-Paul Calderone wrote:
> On Sun, 13 May 2007 22:03:26 -0500, Michael Urman wrote:
> >On 5/13/07, Guido van Rossum wrote:
> >> The answer to all of this is the filesystem encoding, which is already
> >> supported. Doesn't appear particularly difficult to me.
> >
> >Okay, that's fair. It seems reasonable to accept the limitations of
> >following the filesystem encoding for module names. I should probably
> >test py3k to make sure it already has updated __import__ to use the
> >filesystem encoding instead of the default encoding, but instead I'll
> >just feebly imply the question here.
>
> It's harder for this, actually. Even if you know the encoding, you'll
> still run into problems when you don't know the normalization. Consider
> the case where a developer creates a module with a non-ASCII name on OS X
> and then distributes it.
> There is a fair to strong chance that their
> source code will use NFC for the module name. During development, this
> will work just fine, as OS X normalizes all filename access to NFD. When
> someone on another platform attempts to use the module though, they will
> mysteriously find that it cannot be found. Their NFC spelling of the
> module name won't find the NFD file in the filesystem, and they will likely
> be completely baffled by the failure.
>
> This is, of course, an existing difficulty with dealing with unicode
> filenames in Python, but at least the interpreter itself doesn't yet
> have to concern itself with it, as no language features require it.
> I suspect that if non-ASCII module names are allowed, a lot of people
> will be running into this.

Isn't normalization also going to be an issue with using non-ASCII in
general? Does it mean that Python will have to use a normalization
before comparing identifiers as equal? That's terrible, as it will
vastly increase the amount needed to hash a string, too.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org Mon May 14 17:29:12 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 14 May 2007 08:29:12 -0700
Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions,
	Interfaces, etc.
In-Reply-To: <79990c6b0705140700x11409b5eje2305d5baca09b62@mail.gmail.com>
References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com>
	<20070510012202.EB0A83A4061@sparrow.telecommunity.com>
	<464289C8.4080004@canterbury.ac.nz>
	<20070510161417.192943A4061@sparrow.telecommunity.com>
	<464395AB.6040505@canterbury.ac.nz>
	<20070510231845.9C98C3A4061@sparrow.telecommunity.com>
	<4643C4F4.30708@canterbury.ac.nz>
	<79990c6b0705110140u59c6d46euddc59b919f55b4e8@mail.gmail.com>
	<79990c6b0705140700x11409b5eje2305d5baca09b62@mail.gmail.com>
Message-ID: 

On 5/14/07, Paul Moore wrote:
> However, it looks like
> there's a general feeling emerging that snipping certain sections
> would be enough. I'd agree with that - my personal feeling is that
> it'd be OK to remove all of the following sections:
>
> * "Before" and "After" Methods (as per Steven Bethard's suggestion)
> * "Around" Methods (as per Steven Bethard's suggestion)
> * Custom Combinations (as per Steven Bethard's suggestion)
> * Interfaces and Adaptation (doesn't feel like a core aspect of the proposal)
> * Aspects (as per Steven Bethard's suggestion)
> * Extension API (currently empty, and that hasn't hampered the discussions!!)
>
> I'd be OK with them going into an additional PEP, but to be honest, it
> wouldn't bother me to see them left out of the PEP process
> altogether[1]. (I don't feel that I have enough experience with
> *using* GFs to comment meaningfully, so I'd be willing to defer to
> Phillip's judgement here).

That would suit me fine, since my inclination is to approve some form of
the basics of the PEP (with reservations I will explain in another
message) but to reject the second PEP.

> [1] But I'd like to see them documented in the final implementation -
> I'm not suggesting they be undocumented features.

I'm suggesting they aren't features at all, except for the extension
API. All the other stuff should be addable in a separate module using
the extension API.
--
--Guido van Rossum (home page: http://www.python.org/~guido/)

From jason.orendorff at gmail.com Mon May 14 17:42:24 2007
From: jason.orendorff at gmail.com (Jason Orendorff)
Date: Mon, 14 May 2007 11:42:24 -0400
Subject: [Python-3000] Unicode strings, identifiers, and import
In-Reply-To: 
References: <20070514113240.19381.1581980774.divmod.quotient.32995@ohm>
Message-ID: 

On 5/14/07, Guido van Rossum wrote:
> Isn't normalization also going to be an issue with using non-ASCII in
> general? Does it mean that Python will have to use a normalization
> before comparing identifiers as equal? That's terrible, as it will
> vastly increase the amount needed to hash a string, too.

PEP 3131 addresses this. The tokenizer would normalize identifier
tokens to NFC. Because this happens so early, the rest of Python would
be unaffected.

-j

From jason.orendorff at gmail.com Mon May 14 18:22:56 2007
From: jason.orendorff at gmail.com (Jason Orendorff)
Date: Mon, 14 May 2007 12:22:56 -0400
Subject: [Python-3000] Support for PEP 3131
In-Reply-To: <4647B15F.7040700@canterbury.ac.nz>
References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com>
	<19dd68ba0705120827s5415c4dcx12e5862f32cc3e06@mail.gmail.com>
	<4646A3CA.40705@acm.org>
	<4646FCAE.7090804@v.loewis.de>
	<4647B15F.7040700@canterbury.ac.nz>
Message-ID: 

On 5/13/07, Greg Ewing wrote:
> I don't think this scenario is all that unlikely. A
> program is initially written by a Russian programmer
> who uses his own version of "a" as a variable name.
> Later an English-speaking programmer makes some
> changes, and uses an ascii "a". Now there are two
> subtly different variables called "a" in different
> parts of the program.

Greg,

If this scenario were *not* unlikely, it would have happened to a Java
programmer somewhere, right? Has this *ever* happened? I wasn't able to
find a case.

-- Jason

From pje at telecommunity.com Mon May 14 18:34:15 2007
From: pje at telecommunity.com (Phillip J.
Eby)
Date: Mon, 14 May 2007 12:34:15 -0400
Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions,
	Interfaces, etc.
In-Reply-To: 
References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com>
	<20070510012202.EB0A83A4061@sparrow.telecommunity.com>
	<464289C8.4080004@canterbury.ac.nz>
	<20070510161417.192943A4061@sparrow.telecommunity.com>
	<464395AB.6040505@canterbury.ac.nz>
	<20070510231845.9C98C3A4061@sparrow.telecommunity.com>
	<4643C4F4.30708@canterbury.ac.nz>
	<79990c6b0705110140u59c6d46euddc59b919f55b4e8@mail.gmail.com>
	<79990c6b0705140700x11409b5eje2305d5baca09b62@mail.gmail.com>
Message-ID: <20070514163231.275CE3A4036@sparrow.telecommunity.com>

At 08:29 AM 5/14/2007 -0700, Guido van Rossum wrote:
>On 5/14/07, Paul Moore wrote:
> > However, it looks like
> > there's a general feeling emerging that snipping certain sections
> > would be enough. I'd agree with that - my personal feeling is that
> > it'd be OK to remove all of the following sections:
> >
> > * "Before" and "After" Methods (as per Steven Bethard's suggestion)
> > * "Around" Methods (as per Steven Bethard's suggestion)
> > * Custom Combinations (as per Steven Bethard's suggestion)
> > * Interfaces and Adaptation (doesn't feel like a core aspect of
> the proposal)
> > * Aspects (as per Steven Bethard's suggestion)
> > * Extension API (currently empty, and that hasn't hampered the
> discussions!!)
> >
> > I'd be OK with them going into an additional PEP, but to be honest, it
> > wouldn't bother me to see them left out of the PEP process
> > altogether[1]. (I don't feel that I have enough experience with
> > *using* GFs to comment meaningfully, so I'd be willing to defer to
> > Phillip's judgement here).
>
>That would suit me fine, since my inclination is to approve some form
>of the basics of the PEP (with reservations I will explain in another
>message) but to reject the second PEP.
FYI, wrt to Paul's list, my own list for the 2nd PEP doesn't include
interfaces and adaptation; they'd be squarely in the first PEP.

> > [1] But I'd like to see them documented in the final implementation -
> > I'm not suggesting they be undocumented features.
>
>I'm suggesting they aren't features at all, except for the extension
>API. All the other stuff should be addable in a separate module using
>the extension API.

I don't see what the benefit is of making people implement their own
versions of @before, @after, and @around, which then won't interoperate
properly with others' versions of the same thing. Even if we leave in
place the MethodList base class (which Before and After are subclasses
of), one of its limitations is that it can only combine methods of the
same type. There's no way for two different user-implemented "befores"
to merge at the same precedence level, without some fairly fancy
footwork on the implementer's part, or some kind of convention being
established as to how to tell whether a method intends to be a before or
after or whatever. (And this same-precedence merging is a critical
feature of @before/@after, as they are used mainly for "observer"-like
hooks, where multiple libraries may be observing the same thing.)

So, one of the reasons for including those features (along with Aspect)
in the stdlib is the standardization part. Really, standardization of a
lot of this stuff is the main point of having a PEP at all.

By the way, I'm not sure if I mentioned this before, but Ruby 2.0 is
supposed to include before/after/around qualifiers, except they're
called pre/post/wrap, and I'm not sure if the combination rules are 100%
the same as my before/after/around. And they're using Ruby's open
classes rather than standalone generic functions. But it's another data
point.

Note that in current Ruby, you can simulate generic functions
(single-dispatch only) via open classes as long as you use
sufficiently-unique method names.
The fact that Matz wants to add these qualifiers seems to suggest that
simple next-method chaining (i.e. super) isn't as expressive as they'd
like. Unfortunately, I haven't been able to find an RCR for this
feature, only references to RubyConf slide presentations, so I don't
know what their specific rationale is.

From collinw at gmail.com Mon May 14 18:35:14 2007
From: collinw at gmail.com (Collin Winter)
Date: Mon, 14 May 2007 09:35:14 -0700
Subject: [Python-3000] Support for PEP 3131
In-Reply-To: 
References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com>
	<19dd68ba0705120827s5415c4dcx12e5862f32cc3e06@mail.gmail.com>
	<4646A3CA.40705@acm.org>
	<4646FCAE.7090804@v.loewis.de>
	<4647B15F.7040700@canterbury.ac.nz>
Message-ID: <43aa6ff70705140935g573a26a2wd3aa88703aa0f485@mail.gmail.com>

On 5/14/07, Jason Orendorff wrote:
> On 5/13/07, Greg Ewing wrote:
> > I don't think this scenario is all that unlikely. A
> > program is initially written by a Russian programmer
> > who uses his own version of "a" as a variable name.
> > Later an English-speaking programmer makes some
> > changes, and uses an ascii "a". Now there are two
> > subtly different variables called "a" in different
> > parts of the program.
>
> Greg,
>
> If this scenario were *not* unlikely, it would have happened
> to a Java programmer somewhere, right? Has this *ever*
> happened? I wasn't able to find a case.

Well, it's not exactly the kind of thing that makes for a riveting blog
post. This is something the Perl 6 people debated for months on end when
deciding whether to support Unicode identifiers. They eventually came to
the conclusion that if your editor doesn't flag this kind of thing, it's
a bug in the editor. I don't know of any editors that actually do this,
but there you go.

Of course, one of the main motivations for including Unicode support in
Perl 6 was that they were running out of "meaningful" ASCII punctuation
combinations and were looking to things like the ?+? operator and the ?
operator for their salvation. Thankfully Python doesn't have this
problem.

Collin Winter

From jcarlson at uci.edu Mon May 14 18:40:08 2007
From: jcarlson at uci.edu (Josiah Carlson)
Date: Mon, 14 May 2007 09:40:08 -0700
Subject: [Python-3000] Support for PEP 3131
In-Reply-To: 
References: <4647B15F.7040700@canterbury.ac.nz>
Message-ID: <20070514093643.8559.JCARLSON@uci.edu>

"Jason Orendorff" wrote:
>
> On 5/13/07, Greg Ewing wrote:
> > I don't think this scenario is all that unlikely. A
> > program is initially written by a Russian programmer
> > who uses his own version of "a" as a variable name.
> > Later an English-speaking programmer makes some
> > changes, and uses an ascii "a". Now there are two
> > subtly different variables called "a" in different
> > parts of the program.
>
> If this scenario were *not* unlikely, it would have happened
> to a Java programmer somewhere, right? Has this *ever*
> happened? I wasn't able to find a case.

Have you been able to find substantial Java source in which non-ascii
identifiers were used? I have been curious about its prevalence, but
wouldn't even know how to start searching for such code.

- Josiah

From pje at telecommunity.com Mon May 14 18:42:27 2007
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon, 14 May 2007 12:42:27 -0400
Subject: [Python-3000] PEP 3133: Introducing Roles
In-Reply-To: <43aa6ff70705132236w6d2dbfc4r8d1fadede753a6ca@mail.gmail.com>
References: <43aa6ff70705132236w6d2dbfc4r8d1fadede753a6ca@mail.gmail.com>
Message-ID: <20070514164041.C8C033A4036@sparrow.telecommunity.com>

At 10:36 PM 5/13/2007 -0700, Collin Winter wrote:
>2. Querying instances ::
>
>     if performs(my_elf, Thieving):
>         ...

-1 on using any function other than isinstance() for this. Rationale:
isinstance() makes the code smell of inspection more obvious, where
another function name makes it seem like you are doing something
harmless.
In reality, performs() testing (or any other kind of interface testing)
using if-then is always harmful in library code.

> The second argument to ``performs()`` may also be anything with a
> ``__contains__()`` method, meaning the following is legal: ::
>
>     if performs(my_elf, set([Thieving, Spying, BoyScout])):
>         ...
>
> Like ``isinstance()``, the object needs only to perform a single
> role out of the set in order for the expression to be true.

Right, so let's just use isinstance(). Likewise, issubclass() for
checking whether instances of a class perform a role. (And if
issubclass() works, then roles will also be usable by PEP 3124 generic
functions without any additional effort.)

From guido at python.org Mon May 14 18:41:55 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 14 May 2007 09:41:55 -0700
Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions,
	Interfaces, etc.
In-Reply-To: <20070514163231.275CE3A4036@sparrow.telecommunity.com>
References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com>
	<20070510161417.192943A4061@sparrow.telecommunity.com>
	<464395AB.6040505@canterbury.ac.nz>
	<20070510231845.9C98C3A4061@sparrow.telecommunity.com>
	<4643C4F4.30708@canterbury.ac.nz>
	<79990c6b0705110140u59c6d46euddc59b919f55b4e8@mail.gmail.com>
	<79990c6b0705140700x11409b5eje2305d5baca09b62@mail.gmail.com>
	<20070514163231.275CE3A4036@sparrow.telecommunity.com>
Message-ID: 

On 5/14/07, Phillip J. Eby wrote:
> At 08:29 AM 5/14/2007 -0700, Guido van Rossum wrote:
> >On 5/14/07, Paul Moore wrote:
> > > However, it looks like
> > > there's a general feeling emerging that snipping certain sections
> > > would be enough.
> > > I'd agree with that - my personal feeling is that
> > > it'd be OK to remove all of the following sections:
> > >
> > > * "Before" and "After" Methods (as per Steven Bethard's suggestion)
> > > * "Around" Methods (as per Steven Bethard's suggestion)
> > > * Custom Combinations (as per Steven Bethard's suggestion)
> > > * Interfaces and Adaptation (doesn't feel like a core aspect of
> > the proposal)
> > > * Aspects (as per Steven Bethard's suggestion)
> > > * Extension API (currently empty, and that hasn't hampered the
> > discussions!!)
> > >
> > > I'd be OK with them going into an additional PEP, but to be honest, it
> > > wouldn't bother me to see them left out of the PEP process
> > > altogether[1]. (I don't feel that I have enough experience with
> > > *using* GFs to comment meaningfully, so I'd be willing to defer to
> > > Phillip's judgement here).
> >
> >That would suit me fine, since my inclination is to approve some form
> >of the basics of the PEP (with reservations I will explain in another
> >message) but to reject the second PEP.
>
> FYI, wrt to Paul's list, my own list for the 2nd PEP doesn't include
> interfaces and adaptation; they'd be squarely in the first PEP.
>
> > > [1] But I'd like to see them documented in the final implementation -
> > > I'm not suggesting they be undocumented features.
> >
> >I'm suggesting they aren't features at all, except for the extension
> >API. All the other stuff should be addable in a separate module using
> >the extension API.
>
> I don't see what the benefit is of making people implement their own
> versions of @before, @after, and @around, which then won't
> interoperate properly with others' versions of the same thing. Even
> if we leave in place the MethodList base class (which Before and
> After are subclasses of), one of its limitations is that it can only
> combine methods of the same type.
There's no way for two different > user-implemented "befores" to merge at the same precedence level, > without some fairly fancy footwork on the implementer's part, or some > kind of convention being established as to how to tell whether a > method intends to be a before or after or whatever. (And this > same-precedence merging is a critical feature of @before/@after, as > they are used mainly for "observer"-like hooks, where multiple > libraries may be observing the same thing.) > > So, one of the reasons for including those features (along with > Aspect) in the stdlib is the standardization part. Really, > standardization of a lot of this stuff is the main point to having a > PEP at all. OK, let me repeat this request then: real use cases! Point me to code that uses or could be dramatically simplified by adding all this. Until then, before/after and everything beyond it is solidly in YAGNI-land. > By the way, I'm not sure if I mentioned this before, but Ruby 2.0 is > supposed to include before/after/around qualifiers, except they're > called pre/post/wrap, and I'm not sure if the combination rules are > 100% the same as my before/after/around. And they're using Ruby's > open classes rather than standalone generic functions. But it's > another data point. > > Note that in current Ruby, you can simulate generic functions > (single-dispatch only) via open classes as long as you use > sufficiently-unique method names. The fact that Matz wants to add > these qualifiers seems to suggest that simple next-method chaining > (i.e. super) isn't as expressive as they'd like. Unfortunately, I > haven't been able to find an RCR for this feature, only references to > RubyConf slide presentations, so I don't know what their specific rationale is. So if Matz jumped off a cliff, would you recommend I jump too?
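[For readers unfamiliar with the method combination being debated, here is a minimal toy sketch — not the PEP 3124 or PEAK-Rules API — of "before"/"after" hooks that merge at the same precedence level by simply running in registration order, using Phillip's later business-rule example as the scenario:]

```python
class Generic:
    """Toy generic function: every registered "before" hook runs (in
    registration order, i.e. all at the same precedence), then the
    primary implementation, then every "after" hook."""
    def __init__(self, primary):
        self.primary = primary
        self.befores = []
        self.afters = []

    def before(self, func):
        self.befores.append(func)
        return func

    def after(self, func):
        self.afters.append(func)
        return func

    def __call__(self, *args, **kwargs):
        for hook in self.befores:
            hook(*args, **kwargs)
        result = self.primary(*args, **kwargs)
        for hook in self.afters:
            hook(*args, **kwargs)
        return result

trace = []

@Generic
def sell(cust, prod):
    trace.append("sell")
    return "sold"

# Two independent "observer" hooks, added by separate libraries,
# merge at the same precedence level without any coordination:
@sell.before
def check_stock(cust, prod):
    trace.append("check_stock")

@sell.after
def email_regional_sales_mgr(cust, prod):
    trace.append("email_mgr")

sell("GoldCustomer", "FooProduct")
# trace is now ["check_stock", "sell", "email_mgr"]
```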
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon May 14 18:43:22 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 14 May 2007 09:43:22 -0700 Subject: [Python-3000] Unicode strings, identifiers, and import In-Reply-To: References: <20070514113240.19381.1581980774.divmod.quotient.32995@ohm> Message-ID: On 5/14/07, Jason Orendorff wrote: > On 5/14/07, Guido van Rossum wrote: > > Isn't normalization also going to be an issue with using non-ASCII in > > general? Does it mean that Python will have to use a normalization > > before comparing identifiers as equal? That's terrible, as it will > > vastly increase the amount needed to hash a string, too. > > PEP 3131 addresses this. The tokenizer would normalize identifier > tokens to NFC. Because this happens so early, the rest of Python > would be unaffected. Does the tokenizer do this for all string literals, too? Otherwise you could still get surprises with things like x.foo vs. getattr(x, "foo"), if the name foo were normalized but the string "foo" were not. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From pje at telecommunity.com Mon May 14 18:58:51 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon, 14 May 2007 12:58:51 -0400 Subject: [Python-3000] PEP 367: New Super In-Reply-To: <003001c795f8$d5275060$0201a8c0@mshome.net> References: <003001c795f8$d5275060$0201a8c0@mshome.net> Message-ID: <20070514165704.4F8D23A4036@sparrow.telecommunity.com> At 05:23 PM 5/14/2007 +1000, Tim Delaney wrote: >Determining the class object to use >''''''''''''''''''''''''''''''''''' > >The exact mechanism for associating the method with the defining class is >not >specified in this PEP, and should be chosen for maximum performance. 
For >CPython, it is suggested that the class instance be held in a C-level >variable >on the function object which is bound to one of ``NULL`` (not part of a >class), >``Py_None`` (static method) or a class object (instance or class method). Another open issue here: is the decorated class used, or the undecorated class? From p.f.moore at gmail.com Mon May 14 19:01:34 2007 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 14 May 2007 18:01:34 +0100 Subject: [Python-3000] Revised PEP 3119 (Abstract Base Classes) In-Reply-To: References: <79990c6b0705140356i2cfe4dccx9410534e211d8c94@mail.gmail.com> Message-ID: <79990c6b0705141001h1b9cf648s791ccedd8175474c@mail.gmail.com> On 14/05/07, Guido van Rossum wrote: > I'm not sure what language you would specifically like to see added to > the PEP. "Recommendation for 3rd party frameworks: please don't use > the stick approach." sounds a little strange. What's the point you're > trying to get across? Something like: As a style issue, 3rd party code which wishes to use ABCs should follow the lead of the core and standard library, and be written in such a way as to allow, but not require, the use of ABCs. But as I said, I'm coming to the view that worrying about such things is FUD. So I'm happy enough to relegate this sort of thing to a possible PEP 8 amendment if such an issue really does become a problem. Paul. From pje at telecommunity.com Mon May 14 19:34:53 2007 From: pje at telecommunity.com (Phillip J.
Eby) Date: Mon, 14 May 2007 13:34:53 -0400 Subject: [Python-3000] Revised PEP 3119 (Abstract Base Classes) In-Reply-To: <79990c6b0705141001h1b9cf648s791ccedd8175474c@mail.gmail.com> References: <79990c6b0705140356i2cfe4dccx9410534e211d8c94@mail.gmail.com> <79990c6b0705141001h1b9cf648s791ccedd8175474c@mail.gmail.com> Message-ID: <20070514173307.E98933A4036@sparrow.telecommunity.com> At 06:01 PM 5/14/2007 +0100, Paul Moore wrote: >On 14/05/07, Guido van Rossum wrote: > > I'm not sure what language you would specifically like to see added to > > the PEP. "Recommendation for 3rd party frameworks: please don't use > > the stick approach." sounds a little strange. What's the point you're > > trying to get across? > >Something like: > >As a style issue, 3rd party code which wishes to use ABCs should >follow the lead of the core and standard library, and be written in >such a way as to allow, but not require, the use of ABCs. > >But as I said, I'm coming to the view that worrying about such things >is FUD. It's not FUD. It's a pitfall that everybody falls into, even "wizards". Realistically, warning people about it won't stop everyone from falling into it, but it will at least help some of them realize their mistake more quickly once they've made it. :) (That is, some will go, "oh, so *that's* why they said this was bad", instead of thinking their problems are one-time flukes.) However, the issue I'm talking about here is that of using if-then tests to select behavior based on some global type, which is a bit more specific than "don't require ABCs", so YMMV. :) From jason.orendorff at gmail.com Mon May 14 19:38:55 2007 From: jason.orendorff at gmail.com (Jason Orendorff) Date: Mon, 14 May 2007 13:38:55 -0400 Subject: [Python-3000] Unicode strings, identifiers, and import In-Reply-To: References: <20070514113240.19381.1581980774.divmod.quotient.32995@ohm> Message-ID: On 5/14/07, Guido van Rossum wrote: > Does the tokenizer do this for all string literals, too?
Otherwise you > could still get surprises with things like x.foo vs. getattr(x, > "foo"), if the name foo were normalized but the string "foo" were not. It does not; so yes, you could. -j From guido at python.org Mon May 14 20:25:53 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 14 May 2007 11:25:53 -0700 Subject: [Python-3000] PEP 3124 - more commentary Message-ID: First I'll try to explain why I don't like the sys._getframe() approach. Phillip's current syntax is roughly: def flatten(x): ... # this is the "base" function @overload def flatten(y: str): ... # this adds an overloaded version The implementation of @overload needs to use sys._getframe() to look up the name of the function ('flatten') in the surrounding namespace. I find this too fragile an approach; it means that I can't easily write another function that calls overload to get the same effect; in particular, I don't see how this code could work: def my_overload(func): "Shorthand for @some_decorator + @overload." return some_decorator(overload(func)) @my_overload def flatten(z: int): ... If the overload decorator simply looked in the calling scope, it would not find 'flatten' there, since that's the local scope of my_overload. (If it devised some clever scheme of descending down the stack, I would just have to create a more complicated example.) I find the semantics of things that use sys._getframe() muddy and would really much much rather avoid them. Using the approach in my old sandbox/overload/overloading.py code, this objection is removed: the function being overloaded is named explicitly in the decorator. I realize that @overload is only a shorthand for @when(function). But I'd much rather not have @overload at all -- the frame inspection makes it really hard for me to explain carefully what happens without just giving the code that uses sys._getframe(); and this makes it difficult to reason about code using @overload.
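[The fragility Guido describes can be reproduced with a toy frame-inspecting decorator — this is an illustration, not Phillip's actual implementation. Applied directly at module level it finds the base function; called through a wrapper, sys._getframe(1) sees the wrapper's local scope instead, and the lookup fails:]

```python
import sys

def overload(func):
    # Toy version: find the function being overloaded by looking up
    # its name in the *caller's* local scope (the fragile part).
    caller_locals = sys._getframe(1).f_locals
    base = caller_locals.get(func.__name__)
    if base is None:
        raise NameError("cannot find base function %r" % func.__name__)
    base.__overloads__ = getattr(base, "__overloads__", []) + [func]
    return base

def flatten(x): ...

@overload          # works: the module frame's locals contain 'flatten'
def flatten(y): ...

def my_overload(func):
    # Breaks: sys._getframe(1) now sees my_overload's frame, whose
    # locals bind only 'func', not 'flatten'.
    return overload(func)

try:
    @my_overload
    def flatten(z): ...
except NameError as exc:
    failure_msg = str(exc)
    print(failure_msg)
```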
My own preference for spelling this example would be @overloadable def flatten(x): ... @flatten.overload def _(y: str): ... And for the combined decorator: @my_overload(flatten) def _(z: int): ... ****************** I also really don't like approaches based on patching the function object's code in place. Again, it makes it hard to reason about innocent-looking code. It's one thing to say "we can prove property X assuming no-one assigns a different function to my global f" (since assigning to module globals from outside the module is an extremely rare practice). It's quite another thing to say "we can prove property X assuming no-one overloads my global f". This is why I really really really want to require flagging the overloadable function before it can be overloaded. (And that's why I propose @flatten.overload instead of @overload(flatten).) ****************** Next, I have a question about the __proceed__ magic argument. I can see why this is useful, and I can see why having this as a magic argument is preferable over other solutions (I couldn't come up with a better solution, and believe me I tried :-). However, I think making this the *first* argument would upset tools that haven't been taught about this yet. Is there any problem with making it a keyword argument with a default of None, by convention to be placed last? ****************** Finally, I looked at the example of overloading a method instead of a function. The little dance required to overload a method defined in a base class feels fragile, and so does the magic apparently required to special-case the first argument. This is unfortunate because I imagine this to be an important use case -- I certainly would expect that the pretty-printing example would need some state that's most conveniently stored on a "pretty-printer" object where one overloads the pprint method, not a pprint function. ****************** Forgive me if this is mentioned in the PEP, but what happens with keyword args?
Can I invoke an overloaded function with (some) keyword args, assuming they match the argument names given in the default implementation? Or are we restricted to positional argument passing only? (That would be a big step backwards.) ****************** Also, can we overload different-length signatures (like in C++ or Java)? This is very common in those languages; while Python typically uses default argument values, there are use cases that don't easily fit in that pattern (e.g. the signature of range()). -- --Guido van Rossum (home page: http://www.python.org/~guido/) From daniel at stutzbachenterprises.com Mon May 14 20:31:42 2007 From: daniel at stutzbachenterprises.com (Daniel Stutzbach) Date: Mon, 14 May 2007 13:31:42 -0500 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <19dd68ba0705131104r85531f3o12b7e1769d7b7140@mail.gmail.com> References: <1d85506f0705130833v1058b022re0597cf9f259320d@mail.gmail.com> <19dd68ba0705130925j1dd55f1boba9e1b6c036d0422@mail.gmail.com> <43aa6ff70705131009s7d5b177dmea7c790d670ac3c0@mail.gmail.com> <1d85506f0705131042q23270a91qa31ff2f3940019ed@mail.gmail.com> <19dd68ba0705131104r85531f3o12b7e1769d7b7140@mail.gmail.com> Message-ID: On 5/13/07, Guillaume Proux wrote: > Is this a bijective translation ? How good is most people latin > character reading ability among Hebrew speakers? 
From the beginning, I > can tell from experience that Japanese people have great difficulties > in reading english or even transliterated japanese (which is never > good anyway because of homonyms) Unicode identifiers have been proposed before: http://mail.python.org/pipermail/i18n-sig/2001-February/000741.html http://mail.python.org/pipermail/python-list/2002-May/143901.html Based on those threads, it seems that two empirical criteria that would sway many in the Python community are: 1) Evidence of positive use and results from languages that already support Unicode identifiers, such as Java, and/or 2) Support of Unicode identifiers in languages where the primary language author's native tongue is not based on Latin characters (notably Yukihiro Matsumoto's Ruby). -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises LLC From jeremy at alum.mit.edu Mon May 14 20:58:05 2007 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Mon, 14 May 2007 14:58:05 -0400 Subject: [Python-3000] getting compiler package failures In-Reply-To: References: Message-ID: On 5/13/07, Guido van Rossum wrote: > test_compiler and test_transformer have been broken for a couple of > months now I believe. > > Unless someone comes to the rescue of the compiler package soon, I'm > tempted to remove it from the p3yk branch -- it doesn't seem to serve > any particularly good purpose, especially now that the AST used by the > compiler written in C is exportable. We currently lack the ability to take an AST exported by the Python-C compiler and pass it back to the compiler to generate bytecode. It would be a lot more practical, however, to add this ability than to try to maintain two different compilers. So a qualified +1 from me. 
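[For context: the round-trip Jeremy describes — handing an exported AST back to the compiler to generate bytecode — is the ability that compile() later gained via the ast module (from Python 2.6 onward), which is what made maintaining the pure-Python compiler package unnecessary. A minimal sketch of that round-trip as it eventually worked:]

```python
import ast

# Parse source into the (C) compiler's exported AST...
tree = ast.parse("total = sum(n * n for n in range(4))")

# ...and pass the AST object straight back to compile() for bytecode.
code = compile(tree, "<ast>", "exec")

namespace = {}
exec(code, namespace)
print(namespace["total"])  # 0 + 1 + 4 + 9 = 14
```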
Jeremy > > --Guido > > On 5/13/07, Brett Cannon wrote: > > I just did a ``make distclean`` on a clean checkout (r55300) and > > test_compiler/test_transformer are failing: > > > > File > > "/Users/drifty/Dev/python/3.x/pristine/Lib/compiler/transformer.py", > > line 715, in atom > > return self._atom_dispatch[nodelist[0][0]](nodelist) > > KeyError: 322 > > > > or > > > > File > > "/Users/drifty/Dev/python/3.x/pristine/Lib/compiler/transformer.py", > > line 776, in lookup_node > > return self._dispatch[node[0]] > > KeyError: 331 > > > > or > > > > File > > "/Users/drifty/Dev/python/3.x/pristine/Lib/compiler/transformer.py", > > line 783, in com_node > > return self._dispatch[node[0]](node[1:]) > > KeyError: 339 > > > > > > I don't know the compiler package at all (which is why I am currently > stuck > > on Tony Lownds' PEP 3113 patch since I am getting a > > compiler.transformer.WalkerError) so I have no clue how to > > go about fixing this. Anyone happen to know what may have caused the > > breakage? > > > > -Brett > > > > _______________________________________________ > > Python-3000 mailing list > > Python-3000 at python.org > > http://mail.python.org/mailman/listinfo/python-3000 > > Unsubscribe: > > http://mail.python.org/mailman/options/python-3000/guido%40python.org > > > > > > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: > http://mail.python.org/mailman/options/python-3000/jeremy%40alum.mit.edu > From guido at python.org Mon May 14 21:00:28 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 14 May 2007 12:00:28 -0700 Subject: [Python-3000] getting compiler package failures In-Reply-To: References: Message-ID: OK Brett, let 'er rip. 
On 5/14/07, Jeremy Hylton wrote: [SNIP] -- --Guido van Rossum (home page: http://www.python.org/~guido/) From tomerfiliba at gmail.com Mon May 14 21:12:48 2007 From: tomerfiliba at gmail.com (tomer filiba) Date: Mon, 14 May 2007 21:12:48 +0200 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: References: <1d85506f0705130833v1058b022re0597cf9f259320d@mail.gmail.com> <19dd68ba0705130925j1dd55f1boba9e1b6c036d0422@mail.gmail.com> <43aa6ff70705131009s7d5b177dmea7c790d670ac3c0@mail.gmail.com> <1d85506f0705131042q23270a91qa31ff2f3940019ed@mail.gmail.com> <19dd68ba0705131104r85531f3o12b7e1769d7b7140@mail.gmail.com> Message-ID: <1d85506f0705141212m65b9ec37q5f685f507e394f01@mail.gmail.com> as an english-second-language programmer, i'd really like to be able to have unicode identifiers -- but my gut feeling is -- it will open the door for a tower of babel. once we have chinese, french and hindi function names, it'd be very difficult to interoperate with third party libs. imagine i wrote my code using twisted-he, while my client has installed twisted-fr... kaboom? so the next step would be localization files that would map standard names to locale-specific name? and then the interpreter would use locale-dependent importing? we'll never see the end of that. it would just grow more and more complicated. english, or latin at least, is sufficient for programming.
allowing for more languages effectively means the creation of small, close communities, rather than a global one. -1 from me. -tomer On 5/14/07, Daniel Stutzbach wrote: [SNIP] From pje at telecommunity.com Mon May 14 21:26:10 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon, 14 May 2007 15:26:10 -0400 Subject: [Python-3000] PEP 3124 - more commentary In-Reply-To: References: Message-ID: <20070514192423.624D63A4036@sparrow.telecommunity.com> At 11:25 AM 5/14/2007 -0700, Guido van Rossum wrote: >The implementation of @overload needs to use sys._getframe() to look >up the name of the function ('flatten') in the surrounding namespace. >I find this too fragile an approach; it means that I can't easily >write another function that calls overload to get the same effect; in >particular, I don't see how this code could work: > > def my_overload(func): > "Shorthand for @some_decorator + @overload."
> return some_decorator(overload(func)) > > @my_overload > def flatten(z: int): ... > >If the overload decorator simply looked in the calling scope, it would >not find 'flatten' there, since that's the local scope of my_overload. >(If it devised some clever scheme of descending down the stack, I >would just have to create a more complicated example.) Actually, your "my_overload" would just need to do its own getframe and call when() on the result, since overload is just sugar for when(). >I realize that @overload is only a shorthand for @when(function). But >I'd much rather not have @overload at all -- the frame inspection >makes it really hard for me to explain carefully what happens without >just giving the code that uses sys._getframe(); and this makes it >difficult to reason about code using @overload. This is why in the very earliest GF discussions here, I proposed a 'defop expr(...)' syntax, as it would eliminate the need for any getframe hackery. >My own preference for spelling this example would be > >@overloadable >def flatten(x): ... > >@flatten.overload >def _(y: str): ... Btw, this is similar to how RuleDispatch actually spells it, except that it's @flatten.when(). Later, I decided I preferred putting the *mode* of combination (e.g. when vs. around vs. whatever) first, both because it reads more naturally (e.g. "when flattening", "before flattening", etc.) and because it enabled one to retroactively extend existing functions. >Next, I have a question about the __proceed__ magic argument. I can >see why this is useful, and I can see why having this as a magic >argument is preferable over other solutions (I couldn't come up with a >better solution, and believe me I tried :-). However, I think making >this the *first* argument would upset tools that haven't been taught >about this yet. Is there any problem with making it a keyword argument >with a default of None, by convention to be placed last?
Actually, a pending revision to the PEP is to drop the special name and instead use a special annotation, e.g.: def whatever(nm:next_method, ...): (This idea came up in an early thread when some folks queried whether a better name than __proceed__ could be found.) Anyway, with this, it could also be placed as a keyword argument. The main reason for putting it in the first position is performance. Allowing it to be anywhere, however, would let the choice of where be a matter of style. >Finally, I looked at the example of overloading a method instead of a >function. The little dance required to overload a method defined in a >base class feels fragile, Note that a defop syntax would simplify this; i.e. : defop MyBaseClass.methodname(...): ... This doesn't help with the first-argument magic, however. However, since we're going to have to have some way for 'super' to know the class a function is defined in, ISTM that the same magic should be reusable for the first-argument rule. >Forgive me if this is mentioned in the PEP, but what happens with >keyword args? Can I invoke an overloaded function with (some) keyword >args, assuming they match the argument names given in the default >implementation? Yes. That's done with code generation; PEAK-Rules uses direct bytecode generation, but a sourcecode-based generation is also possible and would be used for the PEP implementation (it was also used in RuleDispatch). >Also, can we overload different-length signatures (like in C++ or >Java)? This is very common in those languages; while Python typically >uses default argument values, there are use cases that don't easily >fit in that pattern (e.g. the signature of range()). I see a couple different possibilities for this. Could you give an example of how you'd *like* it to work? From pje at telecommunity.com Mon May 14 21:40:39 2007 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Mon, 14 May 2007 15:40:39 -0400 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <20070510161417.192943A4061@sparrow.telecommunity.com> <464395AB.6040505@canterbury.ac.nz> <20070510231845.9C98C3A4061@sparrow.telecommunity.com> <4643C4F4.30708@canterbury.ac.nz> <79990c6b0705110140u59c6d46euddc59b919f55b4e8@mail.gmail.com> <79990c6b0705140700x11409b5eje2305d5baca09b62@mail.gmail.com> <20070514163231.275CE3A4036@sparrow.telecommunity.com> Message-ID: <20070514193852.7BE1E3A4036@sparrow.telecommunity.com> At 09:41 AM 5/14/2007 -0700, Guido van Rossum wrote: > > Note that in current Ruby, you can simulate generic functions > > (single-dispatch only) via open classes as long as you use > > sufficiently-unique method names. The fact that Matz wants to add > > these qualifiers seems to suggest that simple next-method chaining > > (i.e. super) isn't as expressive as they'd like. Unfortunately, I > > haven't been able to find an RCR for this feature, only references to > > RubyConf slide presentations, so I don't know what their specific > rationale is. > >So if Matz jumped off a cliff, would you recommend I jump too? If we're using cliff-diving as a metaphor for generic functions, I'd say that method combination is more comparable to saying that if Matz decided he'd like to have a swimsuit next time he went cliff-diving, then I would recommend that you consider whether you might like to take a swimsuit as well, were you planning your first such dive. :) In practice, however, I wasn't recommending blindly following Matz or anybody else. I simply said the plan for Ruby was suggestive that method combination is worth looking into further, because in the case of Ruby, they already had single-dispatch generic functions, so the addition suggests combination is no longer considered a YAGNI there. 
As I said, however, I unfortunately haven't been able to find any documented rationale for the proposal -- implying that I have no idea whether Matz' decision is more comparable to jumping off a cliff or packing a swimsuit, and thus cannot give any actual recommendation with respect to such. :) I simply mentioned the subject in case anybody else knew more about the rationale or where to look for the RCR (if one exists). That is, I think it would be useful to know why there was interest in adding such a feature there. Anyway, that's hardly the same as recommending you jump off a cliff. Indeed, it wasn't a recommendation of any sort at all, just a comment on my investigation into useful references for the PEP -- i.e., something that you asked for more of. As for actual code, I'm looking now for examples from people's code besides mine. The canonical use for me is separating business rules like "@after sell(cust:GoldCustomer, prod:FooProduct): email_regional_sales_mgr()" from reusable library code, so that "enterprise" developers can add business rules to an upgradeable core library maintained by a vendor or architect. Of course, I'll also use them to do debug prints or drop into the debugger. Meanwhile, I've been told repeatedly that TurboGears makes extensive use of RuleDispatch, and my quick look today showed they actually use a custom method combination, but I haven't yet tracked down where it gets used, or what the rationale for it is. It doesn't appear to be used in the core TurboGears package, so I suppose it must be in the various add-ons, which I haven't had time to go through yet. Their custom method combination does *support* before/after/around methods, but their core tests only tested "around" methods that I saw. From jimjjewett at gmail.com Mon May 14 21:43:22 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Mon, 14 May 2007 15:43:22 -0400 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. 
In-Reply-To: <20070514163231.275CE3A4036@sparrow.telecommunity.com> References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <20070510161417.192943A4061@sparrow.telecommunity.com> <464395AB.6040505@canterbury.ac.nz> <20070510231845.9C98C3A4061@sparrow.telecommunity.com> <4643C4F4.30708@canterbury.ac.nz> <79990c6b0705110140u59c6d46euddc59b919f55b4e8@mail.gmail.com> <79990c6b0705140700x11409b5eje2305d5baca09b62@mail.gmail.com> <20070514163231.275CE3A4036@sparrow.telecommunity.com> Message-ID: On 5/14/07, Phillip J. Eby wrote: > I don't see what the benefit is of making people implement their own > versions of @before, @after, and @around, which then won't > interoperate properly with others' versions of the same thing. Even > if we leave in place the MethodList base class (which Before and > After are subclasses of), one of its limitations is that it can only > combine methods of the same type. That sounds broken; could you use a numeric precedence with default levels, like the logging library does? -jJ From jason.orendorff at gmail.com Mon May 14 21:46:20 2007 From: jason.orendorff at gmail.com (Jason Orendorff) Date: Mon, 14 May 2007 15:46:20 -0400 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <20070514093643.8559.JCARLSON@uci.edu> References: <4647B15F.7040700@canterbury.ac.nz> <20070514093643.8559.JCARLSON@uci.edu> Message-ID: On 5/14/07, Josiah Carlson wrote: > Have you been able to find substantial Java source in which non-ascii > identifiers were used? I have been curious about its prevalence, but > wouldn't even know how to start searching for such code. No, I haven't. The most substantial use cases (if any) would have to be in closed source code, which is hard to find. I spent a little time looking for Java tutorials in a few languages: Spanish, Japanese, Chinese, Korean. Couldn't find anything in Chinese. (I don't know these languages. I have no idea if I was looking in the right places, etc.) 
- For identifiers, the Spanish-language tutorials mostly used Spanish words stripped down to ASCII (accents and tildes dropped). - The Korean and Japanese tutorials I found (3 total) used English identifiers exclusively. They did tend to use non-English characters freely in comments and (about half the time) in string literals. The Japanese tutorials had no comments at all in the code. -j From guido at python.org Mon May 14 21:47:26 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 14 May 2007 12:47:26 -0700 Subject: [Python-3000] PEP 3124 - more commentary In-Reply-To: <20070514192423.624D63A4036@sparrow.telecommunity.com> References: <20070514192423.624D63A4036@sparrow.telecommunity.com> Message-ID: On 5/14/07, Phillip J. Eby wrote: > At 11:25 AM 5/14/2007 -0700, Guido van Rossum wrote: > >The implementation of @overload needs to use sys._getframe() to look > >up the name of the function ('flatten') in the surrounding namespace. > >I find this too fragile an approach; it means that I can't easily > >write another function that calls overload to get the same effect; in > >particular, I don't see how this code could work: > > > > def my_overload(func): > > "Shorthand for @some_decorator + @overload." > > return some_decorator(overload(func)) > > > > @my_overload > > def flatten(z: int): ... > > > >If the overload decorator simply looked in the calling scope, it would > >not find 'flatten' there, since that's the local scope of my_overload. > >(If it devised some clever scheme of descending down the stack, I > >would just have to create a more complicated example.) > > Actually, your "my_overload" would just need to do its own getframe > and call when() on the result, since overload is just sugar for when(). That does nothing to address my abhorrence of sys._getframe(). To the contrary, it looks like knowledge of the implementation is required for proper use and understanding of @overload. A big fat -1 on that.
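[Guido's preferred spelling — explicitly flag the base function as overloadable, then register overloads on it by name, with no frame inspection — is essentially the design that later shipped in the standard library as functools.singledispatch (Python 3.4+; the annotation-based registration shown here needs 3.7+):]

```python
from functools import singledispatch

@singledispatch          # explicitly mark the base function as overloadable
def flatten(x):
    return [x]

@flatten.register        # register on 'flatten' by name -- no sys._getframe()
def _(y: str):
    return list(y)

@flatten.register
def _(z: int):
    return [z, z]

print(flatten(3.5))   # [3.5]  (no match; falls back to the base implementation)
print(flatten("ab"))  # ['a', 'b']
print(flatten(2))     # [2, 2]
```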
> >I realize that @overload is only a shorthand for @when(function). But > >I'd much rather not have @overload at all -- the frame inspection > >makes it really hard for me to explain carefully what happens without > >just giving the code that uses sys._getframe(); and this makes it > >difficult to reason about code using @overload. > > This is why in the very earliest GF discussions here, I proposed a > 'defop expr(...)' syntax, as it would eliminate the need for any > getframe hackery. But that would completely kill your "but it's all pure Python code so it's harmless and portable" argument. It seems that you're really not interested at all in compromising to accept mandatory marking of the base overloadable function. That's too bad, because I'm not compromising either *on that particular issue*. > >My own preference for spelling this example would be > > > >@overloadable > >def flatten(x): ... > > > >@flatten.overload > >def _(y: str): ... > > Btw, this is similar to how RuleDispatch actually spells it, except > that it's @flatten.when(). Later, I decided I preferred putting the > *mode* of combination (e.g. when vs. around vs. whatever) first, both > because it reads more naturally (e.g. "when flattening", "before > flattening", etc.) But the function name isn't "flattening" (and there are good reasons for that). This requires too much squinting to work. > and because it enabled one to retroactively extend > existing functions. Which as you know I don't like, so that argument doesn't hold. I find that "when" feels like a "condition" (albeit a temporal one) and I'd much rather read the descriptor in terms of what the action of the decorator is (i.e. some kind of registration) rather than trying to read like some vaguely declarative English phrase. > >Next, I have a question about the __proceed__ magic argument. 
I can > >see why this is useful, and I can see why having this as a magic > >argument is preferable over other solutions (I couldn't come up with a > >better solution, and believe me I tried :-). However, I think making > >this the *first* argument would upset tools that haven't been taught > >about this yet. Is there any problem with making it a keyword argument > >with a default of None, by convention to be placed last? > > Actually, a pending revision to the PEP is to drop the special name > and instead use a special annotation, e.g.: > > def whatever(nm:next_method, ...): > > (This idea came up in an early thread when some folks queried whether > a better name than __proceed__ could be found.) Cool, I agree that an annotation is better than a magic name. > Anyway, with this, it could also be placed as a keyword > argument. The main reason for putting it in the first position is > performance. Allowing it to be anywhere, however, would let the > choice of where be a matter of style. Right. What's the performance issue with the first argument? > >Finally, I looked at the example of overloading a method instead of a > >function. The little dance required to overload a method defined in a > >base class feels fragile, > > Note that a defop syntax would simplify this; i.e. : > > defop MyBaseClass.methodname(...): > ... > > This doesn't help with the first-argument magic, however. > > However, since we're going to have to have some way for 'super' to > know the class a function is defined in, ISTM that the same magic > should be reusable for the first-argument rule. Perhaps. Though super only needs to know it once the method is being called, while your decorator (presumably) needs to know when the method is being defined, i.e. before the class object is constructed. Also, the similarities between next-method and super are overwhelming. It would be great if you could work with Tim Delaney on a mechanism underlying all three issues, or at least two of the three. 
> >Forgive me if this is mentioned in the PEP, but what happens with > >keyword args? Can I invoke an overloaded function with (some) keyword > >args, assuming they match the argument names given in the default > >implementation? > > Yes. That's done with code generation; PEAK-Rules uses direct > bytecode generation, but a sourcecode-based generation is also > possible and would be used for the PEP implementation (it was also > used in RuleDispatch). There's currently no discussion of this. Without a good understanding of the implementation I cannot accept the PEP. > >Also, can we overload different-length signatures (like in C++ or > >Java)? This is very common in those languages; while Python typically > >uses default argument values, there are use cases that don't easily > >fit in that pattern (e.g. the signature of range()). > > I see a couple different possibilities for this. Could you give an > example of how you'd *like* it to work? In the simplest case (no default argument values) overloading two-arg functions and three-arg functions with the same name should act as if there were two completely separate functions, except for the base (default) function. Example: @overloadable def range(start:int, stop:int, step:int): ... # implement xrange @range.overload def range(x): return range(0, x, 1) @range.overload def range(x, y): return range(x, y, 1) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon May 14 21:51:23 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 14 May 2007 12:51:23 -0700 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. 
In-Reply-To: <20070514193852.7BE1E3A4036@sparrow.telecommunity.com> References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <20070510231845.9C98C3A4061@sparrow.telecommunity.com> <4643C4F4.30708@canterbury.ac.nz> <79990c6b0705110140u59c6d46euddc59b919f55b4e8@mail.gmail.com> <79990c6b0705140700x11409b5eje2305d5baca09b62@mail.gmail.com> <20070514163231.275CE3A4036@sparrow.telecommunity.com> <20070514193852.7BE1E3A4036@sparrow.telecommunity.com> Message-ID: On 5/14/07, Phillip J. Eby wrote: > I simply said the plan for Ruby was suggestive that method > combination is worth looking into further, because in the case of > Ruby, they already had single-dispatch generic functions, so the > addition suggests combination is no longer considered a YAGNI there. > > As I said, however, I unfortunately haven't been able to find any > documented rationale for the proposal -- implying that I have no idea > whether Matz' decision is more comparable to jumping off a cliff or > packing a swimsuit, and thus cannot give any actual recommendation > with respect to such. :) So how do you know what's going on there is the same as what's apparently going on here, i.e. some folks have fallen in love with CLOS or Haskell or whatever and are pushing for some theoretical ideal that has no practical applications? > Meanwhile, I've been told repeatedly that TurboGears makes extensive > use of RuleDispatch, and my quick look today showed they actually use > a custom method combination, but I haven't yet tracked down where it > gets used, or what the rationale for it is. > > It doesn't appear to be used in the core TurboGears package, so I > suppose it must be in the various add-ons, which I haven't had time > to go through yet. Their custom method combination does *support* > before/after/around methods, but their core tests only tested > "around" methods that I saw. I'm looking forward to a more complete examination of that use case. 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From collinw at gmail.com Mon May 14 21:52:22 2007 From: collinw at gmail.com (Collin Winter) Date: Mon, 14 May 2007 12:52:22 -0700 Subject: [Python-3000] PEP 3133: Introducing Roles In-Reply-To: References: <43aa6ff70705132236w6d2dbfc4r8d1fadede753a6ca@mail.gmail.com> Message-ID: <43aa6ff70705141252r4dbebaa3p1b4e5939719abdff@mail.gmail.com> On 5/14/07, Arvind Singh wrote: > > Asking Questions About Roles > > Shouldn't there be some way to ``revoke'' roles? No, roles are purely additive. Allowing role revocation is an easy recipe for race conditions where one bit of code says type X does a given role and another bit of code says it doesn't. > How can we get a list of all roles played by an object? Something like this could be trivially added. What use-case do you have in mind? > Should there be a way to check ``loosely'' whether an object can > potentially play a given role? (i.e., checking whether an object > provides a given interface, at least syntactically) This could be added, yes. Collin Winter From jcarlson at uci.edu Mon May 14 22:03:10 2007 From: jcarlson at uci.edu (Josiah Carlson) Date: Mon, 14 May 2007 13:03:10 -0700 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: References: <20070514093643.8559.JCARLSON@uci.edu> Message-ID: <20070514125306.8563.JCARLSON@uci.edu> "Jason Orendorff" wrote: > On 5/14/07, Josiah Carlson wrote: > > Have you been able to find substantial Java source in which non-ascii > > identifiers were used? I have been curious about its prevalence, but > > wouldn't even know how to start searching for such code. > > No, I haven't. > > The most substantial use cases (if any) would have to be > in closed source code, which is hard to find. [snip] > They did tend to use non-English characters freely in > comments and (about half the time) in string literals. > The Japanese tutorials had no comments at all in the > code.
Your findings seem to suggest (but not prove either way) that having unicode strings and comments (that Python already supports) may be sufficient for a majority of use-cases (assuming that people document and comment their code ;). It would be nice to be able to find more examples in Java. I guess the question is whether the potential for community fragmentation is worth trying to handle a (seemingly much) smaller set of use-cases than is (already arguably sufficiently) handled with ascii identifiers. - Josiah From collinw at gmail.com Mon May 14 22:03:44 2007 From: collinw at gmail.com (Collin Winter) Date: Mon, 14 May 2007 13:03:44 -0700 Subject: [Python-3000] PEP 3133: Introducing Roles In-Reply-To: References: <43aa6ff70705132236w6d2dbfc4r8d1fadede753a6ca@mail.gmail.com> Message-ID: <43aa6ff70705141303t63520c44g8f88f8ae56732137@mail.gmail.com> On 5/13/07, Steven Bethard wrote: > On 5/13/07, Collin Winter wrote: > > PEP: 3133 > > Title: Introducing Roles > [snip] > > * Roles provide a way of indicating a object's semantics and abstract > > capabilities. A role may define abstract methods, but only as a > > way of delineating an interface through which a particular set of > > semantics are accessed. > [snip] > > * Abstract base classes, by contrast, are a way of reusing common, > > discrete units of implementation. > [snip] > > Using this abstract base class - more properly, a concrete > > mixin - allows a programmer to define a limited set of operators > > and let the mixin in effect "derive" the others. > > So what's the difference between a role and an abstract base class > that used @abstractmethod on all of its methods? Isn't such an ABC > just "delineating an interface"? 
> > > since the ``OrderingMixin`` class above satisfies the interface > > and semantics expressed in the ``Ordering`` role, we say the mixin > > performs the role: :: > > > > @perform_role(Ordering) > > class OrderingMixin: > > def __ge__(self, other): > > return self > other or self == other > > > > def __le__(self, other): > > return self < other or self == other > > > > def __ne__(self, other): > > return not self == other > > > > # ...and so on > > > > Now, any class that uses the mixin will automatically -- that is, > > without further programmer effort -- be tagged as performing the > > ``Ordering`` role. > > But why is:: > > performs(obj, Ordering) > > any better than:: > > isinstance(obj, Ordering) > > if Ordering is just an appropriately registered ABC? There really is no difference between roles and all-@abstractmethod ABCs. From my point of view, though, roles win because they don't require any changes to the interpreter; they're a much simpler way of expressing the same concept. You may like adding the extra complexity and indirection to the VM necessary to support issubclass()/isinstance() overriding, but I don't. Collin Winter From steven.bethard at gmail.com Mon May 14 22:33:31 2007 From: steven.bethard at gmail.com (Steven Bethard) Date: Mon, 14 May 2007 14:33:31 -0600 Subject: [Python-3000] PEP 3133: Introducing Roles In-Reply-To: <43aa6ff70705141303t63520c44g8f88f8ae56732137@mail.gmail.com> References: <43aa6ff70705132236w6d2dbfc4r8d1fadede753a6ca@mail.gmail.com> <43aa6ff70705141303t63520c44g8f88f8ae56732137@mail.gmail.com> Message-ID: On 5/14/07, Collin Winter wrote: > There really is no difference between roles and all-@abstractmethod > ABCs. From my point of view, though, roles win because they don't > require any changes to the interpreter; they're a much simpler way of > expressing the same concept. Ok, you clearly have an implementation in mind, but I don't know what it is.
As far as I can tell: * metaclass=Role ~ metaclass=ABCMeta, except that all methods must be abstract * perform_role(role)(cls) ~ role.register(cls) * performs(obj, role) ~ isinstance(obj, role) And so, as far as I can see, without an Implementation section, all you're proposing is a different syntax for the same functionality. Was there a discussion of your implementation that I missed? > You may like adding the extra complexity > and indirection to the VM necessary to support > issubclass()/isinstance() overriding, but I don't. Have you looked at Guido's issubclass()/isinstance() patch (http://bugs.python.org/1708353)? I'd hardly say that 34 lines of C code is substantial "extra complexity". STeVe -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy From foom at fuhm.net Mon May 14 22:44:54 2007 From: foom at fuhm.net (James Y Knight) Date: Mon, 14 May 2007 16:44:54 -0400 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <20070510161417.192943A4061@sparrow.telecommunity.com> <464395AB.6040505@canterbury.ac.nz> <20070510231845.9C98C3A4061@sparrow.telecommunity.com> <4643C4F4.30708@canterbury.ac.nz> <79990c6b0705110140u59c6d46euddc59b919f55b4e8@mail.gmail.com> <79990c6b0705140700x11409b5eje2305d5baca09b62@mail.gmail.com> <20070514163231.275CE3A4036@sparrow.telecommunity.com> Message-ID: <7835189A-91EC-4582-8667-5DAC18DDF301@fuhm.net> On May 14, 2007, at 12:41 PM, Guido van Rossum wrote: > OK, let me repeat this request then: real use cases! Point me to code > that uses or could be dramatically simplified by adding all this. > Until then, before/after and everything beyond it is solidly in > YAGNI-land.
Excerpted from a recent post to scons-dev by Maciej Pasternacki : > COMMON ISSUES > > Automake uses -local and -hook rules to allow software author to > customize generated Makefile's behaviour without overriding it. Some > way of hinting what should be done before/after/around node is built > should be provided to make it possible also in SCons. API for this > might be slightly based on how Common Lisp Object System's method > combinations work > (http://www.lispworks.com/documentation/HyperSpec/Body/07_ffb.htm). > > General solution would allow -local/-hook-type customization for all > nodes, not just a few selected ones like Automake does. I'm not sure if the poster had seen this PEP already or not, but I pointed him towards it. (Note: this is regarding a proposal, not existing code). James From brett at python.org Mon May 14 23:10:31 2007 From: brett at python.org (Brett Cannon) Date: Mon, 14 May 2007 14:10:31 -0700 Subject: [Python-3000] getting compiler package failures In-Reply-To: References: Message-ID: On 5/14/07, Guido van Rossum wrote: > > OK Brett, let 'er rip. Ripped in revision 55322. -Brett On 5/14/07, Jeremy Hylton wrote: > > On 5/13/07, Guido van Rossum wrote: > > > test_compiler and test_transformer have been broken for a couple of > > > months now I believe. > > > > > > Unless someone comes to the rescue of the compiler package soon, I'm > > > tempted to remove it from the p3yk branch -- it doesn't seem to serve > > > any particularly good purpose, especially now that the AST used by the > > > compiler written in C is exportable. > > > > We currently lack the ability to take an AST exported by the Python-C > > compiler and pass it back to the compiler to generate bytecode. It > > would be a lot more practical, however, to add this ability than to > > try to maintain two different compilers. > > > > So a qualified +1 from me. 
> > > > Jeremy > > > > > > > > --Guido > > > > > > On 5/13/07, Brett Cannon wrote: > > > > I just did a ``make distclean`` on a clean checkout (r55300) and > > > > test_compiler/test_transformer are failing: > > > > > > > > File > > > > "/Users/drifty/Dev/python/3.x/pristine/Lib/compiler/transformer.py", > > > > line 715, in atom > > > > return self._atom_dispatch[nodelist[0][0]](nodelist) > > > > KeyError: 322 > > > > > > > > or > > > > > > > > File > > > > "/Users/drifty/Dev/python/3.x/pristine/Lib/compiler/transformer.py", > > > > line 776, in lookup_node > > > > return self._dispatch[node[0]] > > > > KeyError: 331 > > > > > > > > or > > > > > > > > File > > > > "/Users/drifty/Dev/python/3.x/pristine/Lib/compiler/transformer.py", > > > > line 783, in com_node > > > > return self._dispatch[node[0]](node[1:]) > > > > KeyError: 339 > > > > > > > > > > > > I don't know the compiler package at all (which is why I am > currently > > > stuck > > > > on Tony Lownds' PEP 3113 patch since I am getting a > > > > compiler.transformer.WalkerError) so I have no clue how to > > > > go about fixing this. Anyone happen to know what may have caused > the > > > > breakage? 
> > > > > > > > -Brett > > > > > > > > _______________________________________________ > > > > Python-3000 mailing list > > > > Python-3000 at python.org > > > > http://mail.python.org/mailman/listinfo/python-3000 > > > > Unsubscribe: > > > > > http://mail.python.org/mailman/options/python-3000/guido%40python.org > > > > > > > > > > > > > > > > > -- > > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > > _______________________________________________ > > > Python-3000 mailing list > > > Python-3000 at python.org > > > http://mail.python.org/mailman/listinfo/python-3000 > > > Unsubscribe: > > > > http://mail.python.org/mailman/options/python-3000/jeremy%40alum.mit.edu > > > > > > > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-3000/attachments/20070514/2004f7f4/attachment.htm From collinw at gmail.com Mon May 14 23:18:35 2007 From: collinw at gmail.com (Collin Winter) Date: Mon, 14 May 2007 14:18:35 -0700 Subject: [Python-3000] getting compiler package failures In-Reply-To: References: Message-ID: <43aa6ff70705141418k23664327h7416fd24bd0851cd@mail.gmail.com> On 5/14/07, Brett Cannon wrote: > > > On 5/14/07, Guido van Rossum wrote: > > OK Brett, let 'er rip. > > Ripped in revision 55322. Woohoo! From benji at benjiyork.com Mon May 14 23:35:34 2007 From: benji at benjiyork.com (Benji York) Date: Mon, 14 May 2007 17:35:34 -0400 Subject: [Python-3000] PEP 3133: Introducing Roles In-Reply-To: <43aa6ff70705132236w6d2dbfc4r8d1fadede753a6ca@mail.gmail.com> References: <43aa6ff70705132236w6d2dbfc4r8d1fadede753a6ca@mail.gmail.com> Message-ID: <4648D626.1030201@benjiyork.com> Collin Winter wrote: > PEP: 3133 > Title: Introducing Roles Everything included here is included in zope.interface. See in-line comments below for the analogs. 
[snip] > Performing Your Role > ==================== > > Static Role Assignment > ---------------------- > > Let's start out by defining ``Tree`` and ``Dog`` classes :: > > class Tree(Vegetable): > > def bark(self): > return self.is_rough() > > > class Dog(Animal): > > def bark(self): > return self.goes_ruff() > > While both implement a ``bark()`` method with the same signature, > they do wildly different things. We need some way of differentiating > what we're expecting. Relying on inheritance and a simple > ``isinstance()`` test will limit code reuse and/or force any dog-like > classes to inherit from ``Dog``, whether or not that makes sense. > Let's see if roles can help. :: > > @perform_role(Doglike) > class Dog(Animal): > ... class Dog(Animal): zope.interface.implements(Doglike) > @perform_role(Treelike) > class Tree(Vegetable): > ... class Tree(Vegetable): zope.interface.implements(Treelike) > @perform_role(SitThere) > class Rock(Mineral): > ... class Rock(Mineral): zope.interface.implements(SitThere) > We use class decorators from PEP 3129 to associate a particular role > or roles with a class. zope.interface.implements should be usable with the PEP 3129 syntax, but I showed the current class decorator syntax throughout. > Client code can now verify that an incoming > object performs the ``Doglike`` role, allowing it to handle ``Wolf``, > ``LaughingHyena`` and ``Aibo`` [#aibo]_ instances, too. > > Roles can be composed via normal inheritance: :: > > @perform_role(Guard, MummysLittleDarling) > class GermanShepherd(Dog): > > def guard(self, the_precious): > while True: > if intruder_near(the_precious): > self.growl() > > def get_petted(self): > self.swallow_pride() class GermanShepherd(Dog): zope.interface.implements(Guard, MummysLittleDarling) [rest of class definition is the same] > Here, ``GermanShepherd`` instances perform three roles: ``Guard`` and > ``MummysLittleDarling`` are applied directly, whereas ``Doglike`` > is inherited from ``Dog``. 
> > > Assigning Roles at Runtime > -------------------------- > > Roles can be assigned at runtime, too, by unpacking the syntactic > sugar provided by decorators. > > Say we import a ``Robot`` class from another module, and since we > know that ``Robot`` already implements our ``Guard`` interface, > we'd like it to play nicely with guard-related code, too. :: > > >>> perform(Guard)(Robot) > > This takes effect immediately and impacts all instances of ``Robot``. >>> zope.interface.classImplements(Robot, Guard) > Asking Questions About Roles > ---------------------------- > > Just because we've told our robot army that they're guards, we'd > like to check in on them occasionally and make sure they're still at > their task. :: > > >>> performs(our_robot, Guard) > True >>> Guard.providedBy(our_robot) True > What about that one robot over there? :: > > >>> performs(that_robot_over_there, Guard) > True >>> Guard.providedBy(that_robot_over_there) True > The ``performs()`` function is used to ask if a given object > fulfills a given role. It cannot be used, however, to ask a > class if its instances fulfill a role: :: > > >>> performs(Robot, Guard) > False >>> Guard.providedBy(Robot) False > This is because the ``Robot`` class is not interchangeable > with a ``Robot`` instance. But if you want to find out if a class creates instances that provide an interface you can:: >>> Guard.implementedBy(Robot) True > > Defining New Roles > ================== > > Empty Roles > ----------- > > Roles are defined like a normal class, but use the ``Role`` > metaclass. :: > > class Doglike(metaclass=Role): > ... Interfaces are defined like normal classes, but subclass zope.interface.Interface: class Doglike(zope.interface.Interface): pass > Metaclasses are used to indicate that ``Doglike`` is a ``Role`` in > the same way 5 is an ``int`` and ``tuple`` is a ``type``.
> > > Composing Roles via Inheritance > ------------------------------- > > Roles may inherit from other roles; this has the effect of composing > them. Here, instances of ``Dog`` will perform both the > ``Doglike`` and ``FourLegs`` roles. :: > > class FourLegs(metaclass=Role): > pass > > class Doglike(FourLegs, Carnivor): > pass > > @perform_role(Doglike) > class Dog(Mammal): > pass class FourLegs(zope.interface.Interface): pass class Doglike(FourLegs, Carnivore): pass class Dog(Mammal): zope.interface.implements(Doglike) > Requiring Concrete Methods > -------------------------- > > So far we've only defined empty roles -- not very useful things. > Let's now require that all classes that claim to fulfill the > ``Doglike`` role define a ``bark()`` method: :: > > class Doglike(FourLegs): > > def bark(self): > pass class Doglike(FourLegs): def bark(): pass > No decorators are required to flag the method as "abstract", and the > method will never be called, meaning whatever code it contains (if any) > is irrelevant. Roles provide *only* abstract methods; concrete > default implementations are left to other, better-suited mechanisms > like mixins. > > Once you have defined a role, and a class has claimed to perform that > role, it is essential that that claim be verified. Here, the > programmer has misspelled one of the methods required by the role. :: > > @perform_role(FourLegs) > class Horse(Mammal): > > def run_like_teh_wind(self) > ... > > This will cause the role system to raise an exception, complaining > that you're missing a ``run_like_the_wind()`` method. The role > system carries out these checks as soon as a class is flagged as > performing a given role. zope.interface does no runtime checking. It has a similar mechanism in zope.interface.verify:: >>> from zope.interface.verify import verifyObject >>> verifyObject(Guard, our_robot) True > Concrete methods are required to match exactly the signature demanded > by the role. 
Here, we've attempted to fulfill our role by defining a > concrete version of ``bark()``, but we've missed the mark a bit. :: > > @perform_role(Doglike) > class Coyote(Mammal): > > def bark(self, target=moon): > pass > > This method's signature doesn't match exactly with what the > ``Doglike`` role was expecting, so the role system will throw a bit > of a tantrum. zope.interface doesn't do anything like this. I suspect *args, and **kws make it impractical to do so (not mentioning whether or not it's a good idea). The rest of the PEP concerns implementation and other details, so eliding that. -- Benji York http://benjiyork.com From pje at telecommunity.com Mon May 14 23:50:56 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon, 14 May 2007 17:50:56 -0400 Subject: [Python-3000] PEP 3124 - more commentary In-Reply-To: References: <20070514192423.624D63A4036@sparrow.telecommunity.com> Message-ID: <20070514214915.C361C3A4036@sparrow.telecommunity.com> At 12:47 PM 5/14/2007 -0700, Guido van Rossum wrote: >> >I realize that @overload is only a shorthand for @when(function). But >> >I'd much rather not have @overload at all -- the frame inspection >> >makes it really hard for me to explain carefully what happens without >> >just giving the code that uses sys._getframe(); and this makes it >> >difficult to reason about code using @overload. >> >>This is why in the very earliest GF discussions here, I proposed a >>'defop expr(...)' syntax, as it would eliminate the need for any >>getframe hackery. > >But that would completely kill your "but it's all pure Python code so >it's harmless and portable" argument. Uh, wha? You lost me completely there. A 'defop' syntax simply eliminates the need to name the target function twice (once in the decorator, and again in the 'def'). I don't get what that has to do with stuff being harmless or portable or any of that. Are you perhaps conflating this with the issue of marking functions as overloadable? 
These are independent ideas, AFAICT. >It seems that you're really not interested at all in compromising to >accept mandatory marking of the base overloadable function. Uh, wha? I already agreed to that a couple of weeks ago: http://mail.python.org/pipermail/python-3000/2007-May/007205.html I just haven't updated the PEP yet -- any more than I've updated it with anything else that's been in these ongoing threads, like the :next_method annotation or splitting the PEP. >>Anyway, with this, it could also be placed as a keyword >>argument. The main reason for putting it in the first position is >>performance. Allowing it to be anywhere, however, would let the >>choice of where be a matter of style. > >Right. What's the performance issue with the first argument? Chaining using the first argument can be implemented using a bound method object, which gets performance bonuses from the C eval loop that partial() objects don't. (Of course, when RuleDispatch was written, partial() objects didn't exist, anyway.) >>However, since we're going to have to have some way for 'super' to >>know the class a function is defined in, ISTM that the same magic >>should be reusable for the first-argument rule. > >Perhaps. Though super only needs to know it once the method is being >called, while your decorator (presumably) needs to know when the >method is being defined, i.e. before the class object is constructed. Not really; at some point the class object has to be assigned and stored somewhere for super to use, so if the same process of "assigning" can be used to actually perform the registration, we're good to go. >Also, the similarities between next-method and super are overwhelming. >It would be great if you could work with Tim Delaney on a mechanism >underlying all three issues, or at least two of the three. I'm not sure I follow you. Do you mean, something like using :super as the annotation instead of next_method, or are you just talking about the implementation mechanics?
>> >Forgive me if this is mentioned in the PEP, but what happens with >> >keyword args? Can I invoke an overloaded function with (some) keyword >> >args, assuming they match the argument names given in the default >> >implementation? >> >>Yes. That's done with code generation; PEAK-Rules uses direct >>bytecode generation, but a sourcecode-based generation is also >>possible and would be used for the PEP implementation (it was also >>used in RuleDispatch). > >There's currently no discussion of this. Well, actually there's this bit: """The use of BytecodeAssembler can be replaced using an "exec" or "compile" workaround, given a reasonable effort. (It would be easier to do this if the ``func_closure`` attribute of function objects was writable.)""" But the closure bit is irrelevant if we're using @overloadable. >Without a good understanding >of the implementation I cannot accept the PEP. The mechanism is exec'ing of a string containing a function definition. The original function's signature is obtained using inspect.getargspec(), and the string is exec'd to obtain a new function whose signature matches, but whose body contains the generic function lookup code. In practice, the actual function definition has to be nested, so that argument defaults can be passed in without needing to convert them to strings, and so that the needed lookup tables can be seen via closure variables. A string template would look something like: def make_the_function(__defaults, __lookup): def $funcname($accept_signature): return __lookup($type_tuple)($call_signature) return $funcname The $type_tuple bit would expand to something like: type(firstargname), type(secondargname), ... And $accept_signature would expand to the original function's signature, with default values replaced by "__defaults[0]", "__defaults[1]", etc. in order to make the resulting function have the same default values. 
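That description can be condensed into a runnable miniature (heavily simplified: dispatch on the first argument's type only, no default-value or keyword handling; names like 'overloadable' are illustrative, and inspect.getfullargspec stands in for getargspec):

```python
import inspect

def overloadable(func):
    # Hypothetical sketch of the exec-based approach: build a dispatcher
    # with the same signature as 'func' from a source-code template,
    # dispatching on the type of the first argument.
    registry = {}

    def lookup(cls):
        # Walk the MRO so overloads apply to subclasses too.
        for klass in cls.__mro__:
            if klass in registry:
                return registry[klass]
        return func  # fall back to the original (default) implementation

    argspec = inspect.getfullargspec(func)
    signature = ", ".join(argspec.args)
    first = argspec.args[0]
    # The nested template: __lookup is passed in rather than stringified.
    source = (
        "def make_the_function(__lookup):\n"
        f"    def {func.__name__}({signature}):\n"
        f"        return __lookup(type({first}))({signature})\n"
        f"    return {func.__name__}\n"
    )
    namespace = {}
    exec(source, namespace)
    dispatcher = namespace["make_the_function"](lookup)

    def overload(cls):
        def register(impl):
            registry[cls] = impl
            return dispatcher
        return register

    dispatcher.overload = overload
    return dispatcher

@overloadable
def describe(obj):
    return "something"

@describe.overload(int)
def _(obj):
    return "an int"

assert describe(3) == "an int"
assert describe("x") == "something"
```

A real implementation would additionally reproduce default values (the __defaults sequence described above), keyword arguments, and multiple-argument dispatch.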
The function that would be returned from @overloadable would be the result of calling "make_the_function", passing in the original function's func_defaults and an appropriate value for __lookup. A similar approach is used in RuleDispatch currently. >> >Also, can we overload different-length signatures (like in C++ or >> >Java)? This is very common in those languages; while Python typically >> >uses default argument values, there are use cases that don't easily >> >fit in that pattern (e.g. the signature of range()). >> >>I see a couple different possibilities for this. Could you give an >>example of how you'd *like* it to work? > >In the simplest case (no default argument values) overloading two-arg >functions and three-arg functions with the same name should act as if >there were two completely separate functions, except for the base >(default) function. Example: > >@overloadable >def range(start:int, stop:int, step:int): > ... # implement xrange > >@range.overload >def range(x): return range(0, x, 1) > >@range.overload >def range(x, y): return range(x, y, 1) Hm. I'll need to give some thought to that, but it seems to me that it's sort of like having None defaults for the missing arguments, and then treating the missing-argument versions as requiring type(None) for those arguments. Except that we'd need something besides None, and that the overloads would need wrappers that drop the extra arguments. It certainly seems possible, anyway. I'm not sure I like it, though. It's not obvious from the first function's signature that you can call it with fewer arguments, or what that would mean. For example, shouldn't the later signatures be "range(stop)" and "range(start,stop)"? Hm. From pje at telecommunity.com Tue May 15 00:02:42 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon, 14 May 2007 18:02:42 -0400 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. 
In-Reply-To: References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <20070510161417.192943A4061@sparrow.telecommunity.com> <464395AB.6040505@canterbury.ac.nz> <20070510231845.9C98C3A4061@sparrow.telecommunity.com> <4643C4F4.30708@canterbury.ac.nz> <79990c6b0705110140u59c6d46euddc59b919f55b4e8@mail.gmail.com> <79990c6b0705140700x11409b5eje2305d5baca09b62@mail.gmail.com> <20070514163231.275CE3A4036@sparrow.telecommunity.com> Message-ID: <20070514220057.253A73A4036@sparrow.telecommunity.com> At 03:43 PM 5/14/2007 -0400, Jim Jewett wrote: >On 5/14/07, Phillip J. Eby wrote: >>I don't see what the benefit is of making people implement their own >>versions of @before, @after, and @around, which then won't >>interoperate properly with others' versions of the same thing. Even >>if we leave in place the MethodList base class (which Before and >>After are subclasses of), one of its limitations is that it can only >>combine methods of the same type. >That sounds broken; could you use a numeric precedence with default >levels, like the logging library does? There are lots of things that *could* be done, but I personally dislike numeric levels because they're arbitrary and it's too easy to just tweak a number rather than think through what you actually intend. However, nothing stops you from inventing a combination type or even a criterion type that uses a numeric precedence. At this point, though, just to prevent further head-exploding I've been leaving that part of the extension API vague. But the basic idea is that just like Interfaces or ABCs or Roles can be used to annotate arguments, so too could you add other types of criteria objects, and the precedence of those criteria could be used to disambiguate method precedence. In other words, you're not limited to using different combinators in order to extend the precedence system. That's just what we've been discussing. 
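[Editor's note: a minimal sketch of the numeric-precedence idea Jim Jewett raises above — logging-style priority levels used to disambiguate otherwise-equal methods. This is purely illustrative: nothing like `prioritized` or `when(priority=...)` exists in PEP 3124 or RuleDispatch; it only shows the kind of combination type Phillip says one could invent.]

```python
def prioritized(func):
    # Hypothetical combinator: methods registered with an explicit numeric
    # priority; highest priority is tried first, and a method can decline
    # by returning NotImplemented.
    methods = []  # list of (priority, implementation) pairs

    def when(priority=0):
        def register(impl):
            methods.append((priority, impl))
            methods.sort(key=lambda pair: -pair[0])  # highest first
            return call
        return register

    def call(*args):
        for _, impl in methods:
            result = impl(*args)
            if result is not NotImplemented:
                return result
        return func(*args)  # fall back to the default implementation

    call.when = when
    return call

@prioritized
def handle(x):
    return "default"

@handle.when(priority=10)
def handle(x):
    return "high" if x > 0 else NotImplemented

@handle.when(priority=5)
def handle(x):
    return "low"
```

Here `handle(1)` is claimed by the priority-10 method, while `handle(-1)` falls through to the priority-5 one — exactly the arbitrariness Phillip objects to, since nothing forces the two registrants' numbers to mean anything relative to each other.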
One of the reasons to have standard versions of when/before/after/around, however, is so that most code will never need to define any combinators. The standard ones should handle the vast majority of use cases. Admittedly, before/after/around are IMO 20% cases, not 80% cases. Probably basic overloading is 75-80% of use cases. But before/after/around covers another 20-25% or so, leaving maybe 5% or less for the custom combinator cases. From pje at telecommunity.com Tue May 15 00:34:20 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon, 14 May 2007 18:34:20 -0400 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <20070510231845.9C98C3A4061@sparrow.telecommunity.com> <4643C4F4.30708@canterbury.ac.nz> <79990c6b0705110140u59c6d46euddc59b919f55b4e8@mail.gmail.com> <79990c6b0705140700x11409b5eje2305d5baca09b62@mail.gmail.com> <20070514163231.275CE3A4036@sparrow.telecommunity.com> <20070514193852.7BE1E3A4036@sparrow.telecommunity.com> Message-ID: <20070514223422.9CEF23A4036@sparrow.telecommunity.com> At 12:51 PM 5/14/2007 -0700, Guido van Rossum wrote: >On 5/14/07, Phillip J. Eby wrote: > > I simply said the plan for Ruby was suggestive that method > > combination is worth looking into further, because in the case of > > Ruby, they already had single-dispatch generic functions, so the > > addition suggests combination is no longer considered a YAGNI there. > > > > As I said, however, I unfortunately haven't been able to find any > > documented rationale for the proposal -- implying that I have no idea > > whether Matz' decision is more comparable to jumping off a cliff or > > packing a swimsuit, and thus cannot give any actual recommendation > > with respect to such. :) > >So how do you know what's going on there is the same as what's >apparently going on here, i.e. 
some folks have fallen in love with >CLOS or Haskell or whatever and are pushing for some theoretical ideal >that has no practical applications? I don't, which is why I said I'm *looking for the RCR or other rationale document*. However, with respect, I didn't go to all the trouble of implementing method combination in RuleDispatch just for the heck of it. (And it was considerable trouble, doing it the way CLOS implements it, until I figured out an approach more suitable for Python and decorators.) But let me try to get closer to the issue that I have. I honestly don't see at this moment in time, how to split out most of the features you don't like (mainly before/after/around), in such a way that they can be put back in by a third-party module, without leading to other problems. For example, I fear that certain of those features (especially before/after/around) require a single "blessed" implementation in order to have a sane/stable base for library inter-op, even if they *could* be separated out and put back in. That is, even if it's possible to separate the "mechanism", I think that for "policy" reasons, they should have a canonical implementation. However, if we posit that I create some "third party" module that should be considered canonical or blessed for that purpose, then what is the difference from simply treating the entire thing as a third-party module to begin with? I'm not trying to cause a problem here, nor dictate to anybody (least of all you!) how it all should be. I'm just saying I don't know *how* to solve this bit in a way that works for everybody. I can go back and spend some more time on the problem of how to separate method combination from the core that I currently envision. But there's going to have to be at least *some* sort of hook there, to allow it to be added back in later. (Notice that if the core doesn't provide a facility to modify existing functions, then the core has to declare all its hooks in advance. 
But please don't confuse this statement of fact, with an argument for not doing something I've already agreed to do...) Anyway, perhaps you don't care if those features can be added back in, or perhaps you actively wish to discourage this. It would be good to know where you stand on this point. From guido at python.org Tue May 15 00:43:50 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 14 May 2007 15:43:50 -0700 Subject: [Python-3000] PEP 3124 - more commentary In-Reply-To: <20070514214915.C361C3A4036@sparrow.telecommunity.com> References: <20070514192423.624D63A4036@sparrow.telecommunity.com> <20070514214915.C361C3A4036@sparrow.telecommunity.com> Message-ID: On 5/14/07, Phillip J. Eby wrote: > At 12:47 PM 5/14/2007 -0700, Guido van Rossum wrote: > >> >I realize that @overload is only a shorthand for @when(function). But > >> >I'd much rather not have @overload at all -- the frame inspection > >> >makes it really hard for me to explain carefully what happens without > >> >just giving the code that uses sys._getframe(); and this makes it > >> >difficult to reason about code using @overload. > >> > >>This is why in the very earliest GF discussions here, I proposed a > >>'defop expr(...)' syntax, as it would eliminate the need for any > >>getframe hackery. > > > >But that would completely kill your "but it's all pure Python code so > >it's harmless and portable" argument. > > Uh, wha? You lost me completely there. A 'defop' syntax simply > eliminates the need to name the target function twice (once in the > decorator, and again in the 'def'). I don't get what that has to do > with stuff being harmless or portable or any of that. > > Are you perhaps conflating this with the issue of marking functions > as overloadable? These are independent ideas, AFAICT. > > >It seems that you're really not interested at all in compromising to > >accept mandatory marking of the base overloadable function. > > Uh, wha? 
I already agreed to that a couple of weeks ago: > > http://mail.python.org/pipermail/python-3000/2007-May/007205.html > > I just haven't updated the PEP yet -- any more than I've updated it > with anything else that's been in these ongoing threads, like the > :next_method annotation or splitting the PEP. Ah, sorry. The way this misunderstanding probably originated was that I read your "this is why I originally proposed defop, to avoid getframe hackery" as a maintaining the current need for getframe, instead of a historical fact. > >>Anyway, with this, it could also be placed as a keyword > >>argument. The main reason for putting it in the first position is > >>performance. Allowing it to be anywhere, however, would let the > >>choice of where be a matter of style. > > > >Right. What's the performance issue with the first argument? > > Chaining using the first argument can be implemented using a bound > method object, which gets performance bonuses from the C eval loop > that partial() objects don't. (Of course, when RuleDispatch was > written, partial() objects didn't exist, anyway.) Sounds like premature optimization to me. We can find a way to do it fast later; let's first make it right. > >>However, since we're going to have to have some way for 'super' to > >>know the class a function is defined in, ISTM that the same magic > >>should be reusable for the first-argument rule. > > > >Perhaps. Though super only needs to know it once the method is being > >called, while your decorator (presumably) needs to know when the > >method is being defined, i.e. before the class object is constructed. > > Not really; at some point the class object has to be assigned and > stored somewhere for super to use, so if same process of "assigning" > can be used to actually perform the registration, we're good to go. True. So are you working with Tim Delaney on this? Otherwise he may propose a simpler mechanism that won't allow this re-use of the mechanism. 
> >Also, the similarities between next-method and super are overwhelming. > >It would be great if you could work with Tim Delaney on a mechanism > >underlying all three issues, or at least two of the three. > > I'm not sure I follow you. Do you mean, something like using :super > as the annotation instead of next_method, or are you just talking > about the implementation mechanics? super is going to be a keyword with magic properties. Wouldn't it be great if instead of @when(...) def flatten(x: Mapping, nm: next_method): ... nm(x) we could write @when(...) def flatten(x: Mapping): ... super.flatten(x) # or super(x) or some other permutation of super? Or do you see the need to call both next-method and super from the same code? > >> >Forgive me if this is mentioned in the PEP, but what happens with > >> >keyword args? Can I invoke an overloaded function with (some) keyword > >> >args, assuming they match the argument names given in the default > >> >implementation? > >> > >>Yes. That's done with code generation; PEAK-Rules uses direct > >>bytecode generation, but a sourcecode-based generation is also > >>possible and would be used for the PEP implementation (it was also > >>used in RuleDispatch). > > > >There's currently no discussion of this. > > Well, actually there's this bit: > > """The use of BytecodeAssembler can be replaced using an "exec" or "compile" > workaround, given a reasonable effort. (It would be easier to do this > if the ``func_closure`` attribute of function objects was writable.)""" > > But the closure bit is irrelevant if we're using @overloadable. Thanks. > >Without a good understanding > >of the implementation I cannot accept the PEP. > > The mechanism is exec'ing of a string containing a function > definition. The original function's signature is obtained using > inspect.getargspec(), and the string is exec'd to obtain a new > function whose signature matches, but whose body contains the generic > function lookup code. Do note that e.g. 
in IronPython (and maybe also in Jython?) exec/eval/compile are 10-50x slower (relative to the rest of the system) than in CPython. It does look like a clever approach though. > In practice, the actual function definition has to be nested, so that > argument defaults can be passed in without needing to convert them to > strings, and so that the needed lookup tables can be seen via closure > variables. A string template would look something like: > > def make_the_function(__defaults, __lookup): > def $funcname($accept_signature): > return __lookup($type_tuple)($call_signature) > return $funcname > > The $type_tuple bit would expand to something like: > > type(firstargname), type(secondargname), ... > > And $accept_signature would expand to the original function's > signature, with default values replaced by "__defaults[0]", > "__defaults[1]", etc. in order to make the resulting function have > the same default values. > > The function that would be returned from @overloadable would be the > result of calling "make_the_function", passing in the original > function's func_defaults and an appropriate value for __lookup. > > A similar approach is used in RuleDispatch currently. > > > > >> >Also, can we overload different-length signatures (like in C++ or > >> >Java)? This is very common in those languages; while Python typically > >> >uses default argument values, there are use cases that don't easily > >> >fit in that pattern (e.g. the signature of range()). > >> > >>I see a couple different possibilities for this. Could you give an > >>example of how you'd *like* it to work? > > > >In the simplest case (no default argument values) overloading two-arg > >functions and three-arg functions with the same name should act as if > >there were two completely separate functions, except for the base > >(default) function. Example: > > > >@overloadable > >def range(start:int, stop:int, step:int): > > ... 
# implement xrange > > > >@range.overload > >def range(x): return range(0, x, 1) > > > >@range.overload > >def range(x, y): return range(x, y, 1) > > Hm. I'll need to give some thought to that, but it seems to me that > it's sort of like having None defaults for the missing arguments, and > then treating the missing-argument versions as requiring type(None) > for those arguments. Except that we'd need something besides None, > and that the overloads would need wrappers that drop the extra > arguments. It certainly seems possible, anyway. > > I'm not sure I like it, though. C++ and Java users use it all the time though. > It's not obvious from the first > function's signature that you can call it with fewer arguments, or > what that would mean. For example, shouldn't the later signatures be > "range(stop)" and "range(start,stop)"? Hm. I don't know if the arg names for overloadings must match those of the default function or not -- is that specified by your PEP? My own trivially simple overloading code (sandbox/overload, and now also added as an experiment to sandbox/abc, with slightly different terminology and using issubclass exclusively, as you recommended over a year ago :-) has no problem with this. Of course it only handles positional arguments and completely ignores argument names except as keys into the annotations dict. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue May 15 01:19:06 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 14 May 2007 16:19:06 -0700 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. 
In-Reply-To: <20070514223422.9CEF23A4036@sparrow.telecommunity.com> References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <79990c6b0705110140u59c6d46euddc59b919f55b4e8@mail.gmail.com> <79990c6b0705140700x11409b5eje2305d5baca09b62@mail.gmail.com> <20070514163231.275CE3A4036@sparrow.telecommunity.com> <20070514193852.7BE1E3A4036@sparrow.telecommunity.com> <20070514223422.9CEF23A4036@sparrow.telecommunity.com> Message-ID: On 5/14/07, Phillip J. Eby wrote: > However, with respect, I didn't go to all the trouble of implementing > method combination in RuleDispatch just for the heck of it. (And it > was considerable trouble, doing it the way CLOS implements it, until > I figured out an approach more suitable for Python and decorators.) So you owe us more motivating examples (in addition to the explanatory examples), showing how you had a particular problem, and you couldn't solve it cleanly using the usual suspects (subclassing, callbacks, etc.), and how method combining came to the rescue. Perhaps writing it up like a pattern description a la GoF might help. > But let me try to get closer to the issue that I have. I honestly > don't see at this moment in time, how to split out most of the > features you don't like (mainly before/after/around), in such a way > that they can be put back in by a third-party module, without leading > to other problems. For example, I fear that certain of those > features (especially before/after/around) require a single "blessed" > implementation in order to have a sane/stable base for library > inter-op, even if they *could* be separated out and put back > in. That is, even if it's possible to separate the "mechanism", I > think that for "policy" reasons, they should have a canonical implementation. Please share more details, so your readers can understand this too. Right now the whole discussion around this appears to be in your head only, and what you write is the conclusion *you* have drawn. 
That's not very helpful -- I have great respect for the powers of your mind, but not quite up to the point that I'll accept a feature because you say it has to be so. > However, if we posit that I create some "third party" module that > should be considered canonical or blessed for that purpose, then what > is the difference from simply treating the entire thing as a > third-party module to begin with? You're absolutely free to implement your entire proposal as a 3rd party library, and then eventually come back and point me to all the users who are clamoring for its inclusion into the standard library. > I'm not trying to cause a problem here, nor dictate to anybody (least > of all you!) how it all should be. I'm just saying I don't know > *how* to solve this bit in a way that works for everybody. But can you at least share enough of the problem so others can look at it and either suggest a solution or agree with your conclusion? > I can go back and spend some more time on the problem of how to > separate method combination from the core that I currently > envision. But there's going to have to be at least *some* sort of > hook there, to allow it to be added back in later. I'm all for hooks. They can take the form of a particular factoring into methods that make it easy to override some method; or using GF's recursively for some of the implementation, etc. > (Notice that if > the core doesn't provide a facility to modify existing functions, > then the core has to declare all its hooks in advance. But please > don't confuse this statement of fact, with an argument for not doing > something I've already agreed to do...) It should be easy though, because you know which hooks you'll need in order to add @before and friends... > Anyway, perhaps you don't care if those features can be added back > in, or perhaps you actively wish to discourage this. It would be > good to know where you stand on this point. 
Well, right now I don't care because you haven't shown me the use case. I could definitely be swayed by a detailed description of a large use case; much more so than by other arguments I've seen so far. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From pje at telecommunity.com Tue May 15 01:21:51 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon, 14 May 2007 19:21:51 -0400 Subject: [Python-3000] PEP 3124 - more commentary In-Reply-To: References: <20070514192423.624D63A4036@sparrow.telecommunity.com> <20070514214915.C361C3A4036@sparrow.telecommunity.com> Message-ID: <20070514232017.BA6A43A4036@sparrow.telecommunity.com> At 03:43 PM 5/14/2007 -0700, Guido van Rossum wrote: > > Chaining using the first argument can be implemented using a bound > > method object, which gets performance bonuses from the C eval loop > > that partial() objects don't. (Of course, when RuleDispatch was > > written, partial() objects didn't exist, anyway.) > >Sounds like premature optimization to me. We can find a way to do it >fast later; let's first make it right. As I said, when RuleDispatch was written, partial() didn't exist; it was less a matter of performance there than convenience. >True. So are you working with Tim Delaney on this? Otherwise he may >propose a simpler mechanism that won't allow this re-use of the >mechanism. PEP 367 doesn't currently propose a mechanism for the actual assignment; I was waiting to see what was proposed, to then suggest as minimal a tweak or generalization as necessary. Also, prior to now, you hadn't commented on the first-argument-class rule and I didn't know if you were going to reject it anyway. >super is going to be a keyword with magic properties. Wouldn't it be >great if instead of > >@when(...) >def flatten(x: Mapping, nm: next_method): > ... > nm(x) > >we could write > >@when(...) >def flatten(x: Mapping): > ... > super.flatten(x) # or super(x) > >or some other permutation of super? 
Well, either we'd have to implement it using a hidden parameter, or give up on the possibility of the same function being added more than once to the same function (e.g., for both Mapping and some specific types). There's no way for the code in the body of the overload to know in what context it was invoked. The current mechanism works by creating bound methods for each registration of the same function object, in each "applicability chain". That doesn't mean it's impossible, just that I haven't given the mechanism any thought, and at first glance it looks really hairy to implement -- even if it were done using a hidden parameter. >Or do you see the need to call >both next-method and super from the same code? Hm, that's a mind-bender. I can't think of a sensible use case for that, though. If you're a plain method, you'd just use super. If you're a generic function or overloaded method, you'd just call the next method. The only way I can see you doing that is if you needed to call the super of some *other* method, which doesn't make a lot of sense. In any case, we could probably use super(...) for next-method and super.methodname() for everything else, so I wouldn't worry about it. (Which means you'd have to use super.__call__() inside of a __call__ method, but I think that's OK.) >Do note that e.g. in IronPython (and maybe also in Jython?) >exec/eval/compile are 10-50x slower (relative to the rest of the >system) than in CPython. This would only get done by @overloadable, and never again thereafter. >It does look like a clever approach though. Does that mean you dislike it? ;-) > > Hm. I'll need to give some thought to that, but it seems to me that > > it's sort of like having None defaults for the missing arguments, and > > then treating the missing-argument versions as requiring type(None) > > for those arguments. Except that we'd need something besides None, > > and that the overloads would need wrappers that drop the extra > > arguments. 
It certainly seems possible, anyway. > > > > I'm not sure I like it, though. > >C++ and Java users use it all the time though. Right, but they don't have keyword arguments or defaults, either. The part I'm not sure about has to do with interaction with Python-specific things like those. When do you use each one? One Obvious Way seems to favor default arguments, especially since you can always use defaults of None and implement overloads for type(None) to catch the default cases. i.e., ISTM that cases like range() are more an exception than the rule. > > It's not obvious from the first > > function's signature that you can call it with fewer arguments, or > > what that would mean. For example, shouldn't the later signatures be > > "range(stop)" and "range(start,stop)"? Hm. > >I don't know if the arg names for overloadings must match those of the >default function or not -- is that specified by your PEP? It isn't currently, but that's because it's assumed that all the methods have the same signature. If we were going to allow subset-signatures (i.e, allow you to define methods whose signature omits portions of the main function's signature), ISTM that the argument names should have meaning. Of course, maybe a motivating example other than "range()" would help here, since not too many other functions have optional positional arguments in the middle of the argument list. :) >My own trivially simple overloading code (sandbox/overload, and now >also added as an experiment to sandbox/abc, with slightly different >terminology and using issubclass exclusively, as you recommended over >a year ago :-) has no problem with this. Of course it only handles >positional arguments and completely ignores argument names except as >keys into the annotations dict. Yeah, none of my GF implementations care about the target methods' signatures except for the next_method thingy. But with variable argument lists, I think we *should* care. 
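[Editor's note: for concreteness, Guido's different-length-signature `range` example earlier in the thread can be approximated with a registry keyed purely on argument count. A hypothetical sketch — `arity_overloadable` and `my_range` are invented names, and real overloading would of course also dispatch on types:]

```python
import inspect

def arity_overloadable(default):
    # Dispatch solely on the number of positional arguments; the
    # full-signature function serves as the default implementation.
    registry = {}

    def call(*args):
        impl = registry.get(len(args), default)
        return impl(*args)

    def overload(impl):
        registry[len(inspect.signature(impl).parameters)] = impl
        return call

    call.overload = overload
    return call

@arity_overloadable
def my_range(start, stop, step):
    return list(range(start, stop, step))

@my_range.overload
def my_range(x):
    return my_range(0, x, 1)

@my_range.overload
def my_range(x, y):
    return my_range(x, y, 1)
```

The one- and two-argument versions simply delegate to the three-argument default, mirroring Guido's example; note that this sketch sidesteps the naming question raised below (whether the short forms should be spelled `range(stop)` and `range(start, stop)`).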
Also, AFAIK, the languages that allow different-sized argument lists for the same function either don't have first class functions (e.g. Java) or else have special syntax to allow you to refer to the different variations, e.g. "x/1" and "x/2" to refer to the 1 and 2 argument versions of function x. That is, they really *are* different objects. (And Java and C++ of course have less comprehensible forms of name mangling internally.) Personally, though, I think that kind of overloading is a poor substitute for the parameter flexibility we already have in Python. That is, I think those other languages should be envying Python here, rather than the other way around. :) From guido at python.org Tue May 15 02:17:57 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 14 May 2007 17:17:57 -0700 Subject: [Python-3000] PEP 3124 - more commentary In-Reply-To: <20070514232017.BA6A43A4036@sparrow.telecommunity.com> References: <20070514192423.624D63A4036@sparrow.telecommunity.com> <20070514214915.C361C3A4036@sparrow.telecommunity.com> <20070514232017.BA6A43A4036@sparrow.telecommunity.com> Message-ID: On 5/14/07, Phillip J. Eby wrote: > At 03:43 PM 5/14/2007 -0700, Guido van Rossum wrote: > > > Chaining using the first argument can be implemented using a bound > > > method object, which gets performance bonuses from the C eval loop > > > that partial() objects don't. (Of course, when RuleDispatch was > > > written, partial() objects didn't exist, anyway.) > > > >Sounds like premature optimization to me. We can find a way to do it > >fast later; let's first make it right. > > As I said, when RuleDispatch was written, partial() didn't exist; it > was less a matter of performance there than convenience. > > > >True. So are you working with Tim Delaney on this? Otherwise he may > >propose a simpler mechanism that won't allow this re-use of the > >mechanism. 
> > PEP 367 doesn't currently propose a mechanism for the actual > assignment; I was waiting to see what was proposed, to then suggest > as minimal a tweak or generalization as necessary. Also, prior to > now, you hadn't commented on the first-argument-class rule and I > didn't know if you were going to reject it anyway. > > > >super is going to be a keyword with magic properties. Wouldn't it be > >great if instead of > > > >@when(...) > >def flatten(x: Mapping, nm: next_method): > > ... > > nm(x) > > > >we could write > > > >@when(...) > >def flatten(x: Mapping): > > ... > > super.flatten(x) # or super(x) > > > >or some other permutation of super? > > Well, either we'd have to implement it using a hidden parameter, or > give up on the possibility of the same function being added more than > once to the same function (e.g., for both Mapping and some specific > types). There's no way for the code in the body of the overload to > know in what context it was invoked. > > The current mechanism works by creating bound methods for each > registration of the same function object, in each "applicability chain". > > That doesn't mean it's impossible, just that I haven't given the > mechanism any thought, and at first glance it looks really hairy to > implement -- even if it were done using a hidden parameter. > > > >Or do you see the need to call > >both next-method and super from the same code? > > Hm, that's a mind-bender. I can't think of a sensible use case for > that, though. If you're a plain method, you'd just use super. If > you're a generic function or overloaded method, you'd just call the > next method. > > The only way I can see you doing that is if you needed to call the > super of some *other* method, which doesn't make a lot of sense. In > any case, we could probably use super(...) for next-method and > super.methodname() for everything else, so I wouldn't worry about > it. 
(Which means you'd have to use super.__call__() inside of a > __call__ method, but I think that's OK.) > > > >Do note that e.g. in IronPython (and maybe also in Jython?) > >exec/eval/compile are 10-50x slower (relative to the rest of the > >system) than in CPython. > > This would only get done by @overloadable, and never again thereafter. > > > >It does look like a clever approach though. > > Does that mean you dislike it? ;-) > > > > > Hm. I'll need to give some thought to that, but it seems to me that > > > it's sort of like having None defaults for the missing arguments, and > > > then treating the missing-argument versions as requiring type(None) > > > for those arguments. Except that we'd need something besides None, > > > and that the overloads would need wrappers that drop the extra > > > arguments. It certainly seems possible, anyway. > > > > > > I'm not sure I like it, though. > > > >C++ and Java users use it all the time though. > > Right, but they don't have keyword arguments or defaults, > either. The part I'm not sure about has to do with interaction with > Python-specific things like those. When do you use each one? One > Obvious Way seems to favor default arguments, especially since you > can always use defaults of None and implement overloads for > type(None) to catch the default cases. i.e., ISTM that cases like > range() are more an exception than the rule. > > > > > It's not obvious from the first > > > function's signature that you can call it with fewer arguments, or > > > what that would mean. For example, shouldn't the later signatures be > > > "range(stop)" and "range(start,stop)"? Hm. > > > >I don't know if the arg names for overloadings must match those of the > >default function or not -- is that specified by your PEP? > > It isn't currently, but that's because it's assumed that all the > methods have the same signature. 
If we were going to allow > subset-signatures (i.e, allow you to define methods whose signature > omits portions of the main function's signature), ISTM that the > argument names should have meaning. > > Of course, maybe a motivating example other than "range()" would help > here, since not too many other functions have optional positional > arguments in the middle of the argument list. :) > > > >My own trivially simple overloading code (sandbox/overload, and now > >also added as an experiment to sandbox/abc, with slightly different > >terminology and using issubclass exclusively, as you recommended over > >a year ago :-) has no problem with this. Of course it only handles > >positional arguments and completely ignores argument names except as > >keys into the annotations dict. > > Yeah, none of my GF implementations care about the target methods' > signatures except for the next_method thingy. But with variable > argument lists, I think we *should* care. > > Also, AFAIK, the languages that allow different-sized argument lists > for the same function either don't have first class functions (e.g. > Java) or else have special syntax to allow you to refer to the > different variations, e.g. "x/1" and "x/2" to refer to the 1 and 2 > argument versions of function x. That is, they really *are* > different objects. (And Java and C++ of course have less > comprehensible forms of name mangling internally.) > > Personally, though, I think that kind of overloading is a poor > substitute for the parameter flexibility we already have in > Python. That is, I think those other languages should be envying > Python here, rather than the other way around. :) Perhaps. Though C++ *does* have argument default values. Other use cases that come to mind are e.g. APIs that you can pass either a Point object or two (or three!) floats. 
This is not a natural use case for argument default values, and it's not always convenient to require the user to pass a tuple of floats (perhaps the three-floats API already existed and its signature cannot be changed for compatibility reasons). Or think of a networking function that takes either a "host:port" string or a host and port pair; thinking of this as having a default port is also slightly awkward, as you don't know what to do when passed a "host:port" string and a port. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From pje at telecommunity.com Tue May 15 02:24:00 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon, 14 May 2007 20:24:00 -0400 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <79990c6b0705110140u59c6d46euddc59b919f55b4e8@mail.gmail.com> <79990c6b0705140700x11409b5eje2305d5baca09b62@mail.gmail.com> <20070514163231.275CE3A4036@sparrow.telecommunity.com> <20070514193852.7BE1E3A4036@sparrow.telecommunity.com> <20070514223422.9CEF23A4036@sparrow.telecommunity.com> Message-ID: <20070515002213.532623A4036@sparrow.telecommunity.com> At 04:19 PM 5/14/2007 -0700, Guido van Rossum wrote: >On 5/14/07, Phillip J. Eby wrote: >>However, with respect, I didn't go to all the trouble of implementing >>method combination in RuleDispatch just for the heck of it. (And it >>was considerable trouble, doing it the way CLOS implements it, until >>I figured out an approach more suitable for Python and decorators.) > >So you owe us more motivating examples (in addition to the explanatory >examples), showing how you had a particular problem, and you couldn't >solve it cleanly using the usual suspects (subclassing, callbacks, >etc.), and how method combining came to the rescue. Perhaps writing it >up like a pattern description a la GoF might help. It's really not that complicated. 
If you have only strict precedence (i.e., methods with the same signature are ambiguous), you wind up in practice needing a way to disambiguate methods when you don't really care what order they're executed in (because they're being registered independently). Before and After methods give you that escape, because they're assumed to be independent, and thus any number of libraries can register a before or after method for any given signature, without conflicting with each other. So the "particular problem" I had is simply that when you are using GF methods as "observer"-like hooks, you need a way to specify them that doesn't result in ambiguities between code that's watching the same thing (but is written by different people). And, the nature of these observer-ish use cases is that you sometimes need pre-observers, and sometimes you need post-observers. (For example, a pre-observer like "block the sale if there's a hold on the item by a more valuable customer" or a post-observer like "send an email to the sales manager if this is an account we got from FooCorp.") Can these use cases be handled with callbacks of some other sort? Sure! But then, we can and do also get by with implementing ad-hoc generic functions using __special__ methods and copy_reg and so on. The point of the PEP was to provide a standardized API for generic functions and method combination, so you don't need to reinvent or relearn new ways of doing it for every single Python library that uses something that follows these patterns. 
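[Editor's note: the "independent pre-/post-observers" idea above can be sketched in a few lines. This is a minimal, hypothetical illustration — `Observable`, `add_before`, `add_after`, and the `sell` example are invented names, not the PEAK-Rules or PEP 3124 API. The point it demonstrates: any number of before/after hooks can be registered for the same function without conflicting, because none of them calls a "next method".]

```python
class Observable:
    """Wrap a function so independent pre- and post-observers can attach."""
    def __init__(self, func):
        self.func = func
        self.before = []   # pre-observers: run before the primary, may veto
        self.after = []    # post-observers: see the args and the result

    def add_before(self, hook):
        self.before.append(hook)
        return hook

    def add_after(self, hook):
        self.after.append(hook)
        return hook

    def __call__(self, *args, **kw):
        for hook in self.before:       # order among befores doesn't matter:
            hook(*args, **kw)          # they don't see each other's results
        result = self.func(*args, **kw)
        for hook in self.after:
            hook(result, *args, **kw)
        return result

@Observable
def sell(item, customer):
    return "sold %s to %s" % (item, customer)

log = []

@sell.add_before
def check_hold(item, customer):
    # e.g. "block the sale if there's a hold on the item" (could raise)
    log.append("checked hold on %s" % item)

@sell.add_after
def notify(result, item, customer):
    # e.g. "send an email to the sales manager"
    log.append("notified about %s" % result)

print(sell("widget", "alice"))  # -> sold widget to alice
```

A second library could register its own before/after hooks on `sell` without knowing about these ones — that independence is what @before/@after buy you over strict-precedence primary methods.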
So, if the bar is that a feature has to be unsolvable using ad hoc techniques, it seems the entire PEP would fail on those grounds. We have plenty of ad hoc techniques for implementing GF's or quasi-GF's already, likewise for callbacks and the like. The point was for you to Pronounce on One Obvious API (to Rule Them All). >>But let me try to get closer to the issue that I have. I honestly >>don't see at this moment in time, how to split out most of the >>features you don't like (mainly before/after/around), in such a way >>that they can be put back in by a third-party module, without leading >>to other problems. For example, I fear that certain of those >>features (especially before/after/around) require a single "blessed" >>implementation in order to have a sane/stable base for library >>inter-op, even if they *could* be separated out and put back >>in. That is, even if it's possible to separate the "mechanism", I >>think that for "policy" reasons, they should have a canonical implementation. > >Please share more details, so your readers can understand this too. >Right now the whole discussion around this appears to be in your head >only, and what you write is the conclusion *you* have drawn. Actually, the discussion about method combination precedence has been ongoing in several threads here on Py3K, mostly with Greg Ewing and Jim Jewett. These discussions illustrate why having some basic operators of known precedence gives the system more stability when multiple libraries start playing together. >But can you at least share enough of the problem so others can look at >it and either suggest a solution or agree with your conclusion? Sure. 
Take a look at peak.rules.core (while keeping in mind all the bits that will be changed per your prior requests): http://svn.eby-sarna.com/PEAK-Rules/peak/rules/core.py?view=markup What you'll notice is that the method combination framework (Method, MethodList, combine_actions, always_overrides, and merge_by_default, if you don't count the places these things get called) is in fact most of the code, with relatively little of it being the actual implementation of Around, Before, or After (or even generic functions themselves!). In principle, I could pull that framework out and leave just a mechanism for adding it back in. But in practice, that framework lays down the principles of "governance" for method combination, as far as how to decide what things have precedence over what. Thus, I'm skeptical of how useful it is in this area to provide mechanism but no policy. It's always possible for someone to create their own independent policy within the mechanism -- even if there's a default policy. But One Obvious Way suggests that there should be *some* sort of policy in place by default, just like we have a standard set of descriptors that implement the conventional forms of properties and methods. You can subclass them or entirely replace them, but they cover all the typical use cases, and you can use them as examples to understand how to do more exotic things. Meanwhile, if we didn't have the examples of properties and methods, how would we know we were designing descriptor hooks correctly? If we are positing that I know enough to design the hooks correctly, we are implicitly positing that I know what the hooks will be used and useful *for*. :) However, by making various use cases (before, after, around, and the custom example) explicit in the PEP, I was attempting to provide the motivation and rationale for the design of the hooks. (Although in all fairness, the hooks are not actually documented in the PEP yet, aside from a listing of function names.) 
>I'm all for hooks. They can take the form of a particular factoring >into methods that make it easy to override some method; or using GF's >recursively for some of the implementation, etc. This is in fact how it works now; all the extension API functions in the PEP are either existing GF's in peak.rules.core, or proposed for addition. From pje at telecommunity.com Tue May 15 02:35:42 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon, 14 May 2007 20:35:42 -0400 Subject: [Python-3000] PEP 3124 - more commentary In-Reply-To: References: <20070514192423.624D63A4036@sparrow.telecommunity.com> <20070514214915.C361C3A4036@sparrow.telecommunity.com> <20070514232017.BA6A43A4036@sparrow.telecommunity.com> Message-ID: <20070515003354.194B83A4036@sparrow.telecommunity.com> At 05:17 PM 5/14/2007 -0700, Guido van Rossum wrote: >Other use cases that come to mind are e.g. APIs that you can pass >either a Point object or two (or three!) floats. This is not a natural >use case for argument default values, and it's not always convenient >to require the user to pass a tuple of floats (perhaps the >three-floats API already existed and its signature cannot be changed >for compatibility reasons). Or think of a networking function that >takes either a "host:port" string or a host and port pair; thinking of >this as having a default port is also slightly awkward, as you don't >know what to do when passed a "host:port" string and a port. How do people handle these in Python now? ISTM that idiomatic Python for these cases would either use tuples, or else different method names. Or is the intention here to make it easier for people porting code over from Java and C++? Anyway, as I said, I think it's *possible* to do this. It just strikes me as more complex than existing ways of handling it in Python. 
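[Editor's note: the "existing ways of handling it in Python" mentioned above — type-checking inside a single function rather than overloading — look roughly like this for the host/port example. A sketch only; the function name and behavior are invented for illustration.]

```python
def connect(host, port=None):
    """Accept either connect("host:port") or connect(host, port)."""
    if port is None:
        if isinstance(host, str) and ":" in host:
            # split on the last colon so the host part may contain colons
            host, _, port_text = host.rpartition(":")
            port = int(port_text)
        else:
            raise TypeError("no port given")
    return (host, port)

print(connect("example.com:80"))    # -> ('example.com', 80)
print(connect("example.com", 8080)) # -> ('example.com', 8080)
```

Note that this sketch silently accepts the ambiguous call Guido points out — `connect("example.com:80", 99)` returns `("example.com:80", 99)` without parsing the string — which is exactly the awkwardness of modeling this as a default argument rather than an overload.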
More importantly, it seems to go against the grain of at least my mental concept of Python call signatures, in which arguments are inherently *named* (and can be passed using explicit names), with only rare exceptions like range(). In contrast, the languages that have this sort of positional thing only allow arguments to be specified by position, IIRC. That's what makes me uncomfortable with it. That having been said, if you want it, there's probably a way to make it work. I just think we should try to preserve the "nameness" of arguments in the process -- and consider whether the use cases you've listed here actually improve the code clarity any. From guido at python.org Tue May 15 02:45:42 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 14 May 2007 17:45:42 -0700 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: <20070515002213.532623A4036@sparrow.telecommunity.com> References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <79990c6b0705140700x11409b5eje2305d5baca09b62@mail.gmail.com> <20070514163231.275CE3A4036@sparrow.telecommunity.com> <20070514193852.7BE1E3A4036@sparrow.telecommunity.com> <20070514223422.9CEF23A4036@sparrow.telecommunity.com> <20070515002213.532623A4036@sparrow.telecommunity.com> Message-ID: I refuse to continue this discussion until the PEP has been rewritten. It's probably a much better use of your time to rewrite the PEP than to argue with me in email too. On 5/14/07, Phillip J. Eby wrote: > At 04:19 PM 5/14/2007 -0700, Guido van Rossum wrote: > >On 5/14/07, Phillip J. Eby wrote: > >>However, with respect, I didn't go to all the trouble of implementing > >>method combination in RuleDispatch just for the heck of it. (And it > >>was considerable trouble, doing it the way CLOS implements it, until > >>I figured out an approach more suitable for Python and decorators.) 
> > > >So you owe us more motivating examples (in addition to the explanatory > >examples), showing how you had a particular problem, and you couldn't > >solve it cleanly using the usual suspects (subclassing, callbacks, > >etc.), and how method combining came to the rescue. Perhaps writing it > >up like a pattern description a la GoF might help. > > It's really not that complicated. If you have only strict precedence > (i.e., methods with the same signature are ambiguous), you wind up in > practice needing a way to disambiguate methods when you don't really > care what order they're executed in (because they're being registered > independently). > > Before and After methods give you that escape, because they're > assumed to be independent, and thus any number of libraries can thus > register a before or after method for any given signature, without > conflicting with each other. > > So the "particular problem" I had is simply that when you are using > GF methods as "observer"-like hooks, you need a way to specify them > that doesn't result in ambiguities between code that's watching the > same thing (but is written by different people). And, the nature of > these observer-ish use cases is that you sometimes need > pre-observers, and sometimes you need post-observers. > > (For example, a pre-observer like "block the sale if there's a hold > on the item by a more valuable customer" or a post observer like, > "send an email to the sales manager if this is an account we got from > FooCorp.") > > Can these use cases be handled with callbacks of some other > sort? Sure! But then, we can and do also get by with implementing > ad-hoc generic functions using __special__ methods and copy_reg and > so on. The point of the PEP was to provide a standardized API for > generic functions and method combination, so you don't need to > reinvent or relearn new ways of doing it for every single Python > library that uses something that follows these patterns. 
> > Indeed, having yet another implementation of generic functions was > never the point of the PEP, as we already have several of them in the > language and stdlib, plus several more third-party modules that implement them! > > The point, instead, was to standardize an *API* for generic > functions, so that one need only learn that API once. A default GF > implementation is merely necessary for bootstrapping that API, and > useful for "batteries included"-ness. > > So, if the bar is that a feature has to be unsolvable using ad hoc > techniques, it seems the entire PEP would fail on those grounds. We > have plenty of ad hoc techniques for implementing GF's or quasi-GF's > already, likewise for callbacks and the like. The point was for you > to Pronounce on One Obvious API (to Rule Them All). > > > >>But let me try to get closer to the issue that I have. I honestly > >>don't see at this moment in time, how to split out most of the > >>features you don't like (mainly before/after/around), in such a way > >>that they can be put back in by a third-party module, without leading > >>to other problems. For example, I fear that certain of those > >>features (especially before/after/around) require a single "blessed" > >>implementation in order to have a sane/stable base for library > >>inter-op, even if they *could* be separated out and put back > >>in. That is, even if it's possible to separate the "mechanism", I > >>think that for "policy" reasons, they should have a canonical implementation. > > > >Please share more details, so your readers can understand this too. > >Right now the whole discussion around this appears to be in your head > >only, and what you write is the conclusion *you* have drawn. > > Actually, the discussion about method combination precedence has been > ongoing in several threads here on Py3K, mostly with Greg Ewing and > Jim Jewett. 
These discussions illustrate why having some basic > operators of known precedence gives the system more stability when > multiple libraries start playing together. > > > >But can you at least share enough of the problem so others can look at > >it and either suggest a solution or agree with your conclusion? > > Sure. Take a look at peak.rules.core (while keeping in mind all the > bits that will be changed per your prior requests): > > http://svn.eby-sarna.com/PEAK-Rules/peak/rules/core.py?view=markup > > What you'll notice is that the method combination framework (Method, > MethodList, combine_actions, always_overrides, and merge_by_default, > if you don't count the places these things get called) is in fact > most of the code, with relatively little of it being the actual > implementation of Around, Before, or After (or even generic functions > themselves!). > > In principle, I could pull that framework out and leave just a > mechanism for adding it back in. But in practice, that framework > lays down the principles of "governance" for method combination, as > far as how to decide what things have precedence over what. > > Thus, I'm skeptical of how useful it is in this area to provide > mechanism but no policy. It's always possible for someone to create > their own independent policy within the mechanism -- even if there's > a default policy. But One Obvious Way suggests that there should be > *some* sort of policy in place by default, just like we have a > standard set of descriptors that implement the conventional forms of > properties and methods. You can subclass them or entirely replace > them, but they cover all the typical use cases, and you can use them > as examples to understand how to do more exotic things. > > Meanwhile, if we didn't have the examples of properties and methods, > how would we know we were designing descriptor hooks correctly? 
If > we are positing that I know enough to design the hooks correctly, we > are implicitly positing that I know what the hooks will be used and > useful *for*. :) However, by making various use cases (before, > after, around, and the custom example) explicit in the PEP, I was > attempting to provide the motivation and rationale for the design of > the hooks. (Although in all fairness, the hooks are not actually > documented in the PEP yet, aside from a listing of function names.) > > > >I'm all for hooks. They can take the form of a particular factoring > >into methods that make it easy to override some method; or using GF's > >recursively for some of the implementation, etc. > > This is in fact how it works now; all the extension API functions in > the PEP are either existing GF's in peak.rules.core, or proposed for addition. > > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue May 15 02:51:19 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 14 May 2007 17:51:19 -0700 Subject: [Python-3000] PEP 3124 - more commentary In-Reply-To: <20070515003354.194B83A4036@sparrow.telecommunity.com> References: <20070514192423.624D63A4036@sparrow.telecommunity.com> <20070514214915.C361C3A4036@sparrow.telecommunity.com> <20070514232017.BA6A43A4036@sparrow.telecommunity.com> <20070515003354.194B83A4036@sparrow.telecommunity.com> Message-ID: On 5/14/07, Phillip J. Eby wrote: > At 05:17 PM 5/14/2007 -0700, Guido van Rossum wrote: > >Other use cases that come to mind are e.g. APIs that you can pass > >either a Point object or two (or three!) floats. This is not a natural > >use case for argument default values, and it's not always convenient > >to require the user to pass a tuple of floats (perhaps the > >three-floats API already existed and its signature cannot be changed > >for compatibility reasons). 
Or think of a networking function that > >takes either a "host:port" string or a host and port pair; thinking of > >this as having a default port is also slightly awkward, as you don't > >know what to do when passed a "host:port" string and a port. > > How do people handle these in Python now? ISTM that idiomatic Python > for these cases would either use tuples, or else different method names. Both of which are sub-optimal compared to the C++ and Java solutions. (Especially for constructors, where choosing different method names is even more effort as you'd need to switch to factory functions.) > Or is the intention here to make it easier for people porting code > over from Java and C++? No, my observation is that they have something that would be useful for us. > Anyway, as I said, I think it's *possible* to do this. It just > strikes me as more complex than existing ways of handling it in Python. > > More importantly, it seems to go against the grain of at least my > mental concept of Python call signatures, in which arguments are > inherently *named* (and can be passed using explicit names), with > only rare exceptions like range(). In contrast, the languages that > have this sort of positional thing only allow arguments to be > specified by position, IIRC. That's what makes me uncomfortable with it. Well, in *my* mental model the argument names are just as often irrelevant as they are useful. I'd be taken aback if I saw this in someone's code: open(filename="/etc/passwd", mode="r"). Perhaps it's too bad that Python cannot express the notion of "these parameters are positional-only" except very clumsily. > That having been said, if you want it, there's probably a way to make > it work. I just think we should try to preserve the "nameness" of > arguments in the process -- and consider whether the use cases you've > listed here actually improve the code clarity any. There seems to be a stalemate. 
It seems I cannot convince you that this type of overloading is useful. And it seems you cannot explain to me why I need a framework for method combining. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From steven.bethard at gmail.com Tue May 15 03:13:01 2007 From: steven.bethard at gmail.com (Steven Bethard) Date: Mon, 14 May 2007 19:13:01 -0600 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: <20070515002213.532623A4036@sparrow.telecommunity.com> References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <79990c6b0705140700x11409b5eje2305d5baca09b62@mail.gmail.com> <20070514163231.275CE3A4036@sparrow.telecommunity.com> <20070514193852.7BE1E3A4036@sparrow.telecommunity.com> <20070514223422.9CEF23A4036@sparrow.telecommunity.com> <20070515002213.532623A4036@sparrow.telecommunity.com> Message-ID: On 5/14/07, Phillip J. Eby wrote: > At 04:19 PM 5/14/2007 -0700, Guido van Rossum wrote: > >But can you at least share enough of the problem so others can look at > >it and either suggest a solution or agree with your conclusion? > > Sure. Take a look at peak.rules.core (while keeping in mind all the > bits that will be changed per your prior requests): > > http://svn.eby-sarna.com/PEAK-Rules/peak/rules/core.py?view=markup > > What you'll notice is that the method combination framework (Method, > MethodList, combine_actions, always_overrides, and merge_by_default, > if you don't count the places these things get called) is in fact > most of the code, with relatively little of it being the actual > implementation of Around, Before, or After (or even generic functions > themselves!). Seems to me that from this link what we're missing is a good explanation of how "Method" works since that is the base for Before, After, etc. Thus I'd suggest ripping out the Before, After, etc. sections in the PEP, and replacing them with a section on how Method works. 
You can use Before and After as examples of how to extend Method. (I'm fine with Before and After being in the module. It's just confusing that they take such a prominent role in the PEP without the mechanism behind them being explained enough.) STeVe -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy From gproux+py3000 at gmail.com Tue May 15 03:18:09 2007 From: gproux+py3000 at gmail.com (Guillaume Proux) Date: Tue, 15 May 2007 10:18:09 +0900 Subject: [Python-3000] Support for PEP 3131 (some links to evidence of usage within communities) Message-ID: <19dd68ba0705141818w62c942b7g576016fcd3cc0ac1@mail.gmail.com> Found some evidence of usage of identifiers in Japanese while doing a quick Google search. All links below are in Japanese. * Ruby has support for Japanese identifiers (which is not unexpected when you know the origin country of Ruby) http://www.ruby-lang.org/ja/man/?cmd=view;name=%CA%D1%BF%F4%A4%C8%C4%EA%BF%F4 Notice that it says that this is only supported on a local basis. (probably because Ruby cannot handle Unicode natively). I also found other people discussing their usage patterns of identifiers in Japanese, and they also report this is tremendously useful for beginners, especially when you need to read a stacktrace while debugging. * Java has strong supporters of Japanese characters within identifiers. ??http://java-house.jp/ml/archive/j-h-b/032664.html#body They comment that using Japanese improves readability unless used in an extreme way (like changing a *for* loop to use ????? instead of i) One example they give is ------------------------------------------- i = revised(i); ??? i = RevisedByMarubatuMethod(i); ??? i = revised_by_marubatu_method(i); ??????? i = ????????????(i); ------------------------------------------- And of course they think the last one is the best..... 
Table of contents of "Visual J++ Applet Programming book" http://www.hir-net.com/book/book18/contents.html see "Chapter 2.2: You can use Japanese Identifiers !!!" Discussion about variable naming and how being able to use Japanese would solve many naming issues: http://www.atmarkit.co.jp/bbs/phpBB/viewtopic.php?topic=13878&forum=3&start=8&15 Another one like this, where people explain that because it is difficult to come up with good names in English they end up calling everything : makeItem, doItem, addItem http://www.atmarkit.co.jp/bbs/phpBB/viewtopic.php?mode=viewtopic&topic=18616&forum=7&start=0 And for fun, there is this interesting link about a programming language "in Japanese", made for beginners (check this example... awesome!): http://nadesi.com/doc/cmd/doc.cgi?mode=cmd&id=200 I am sure you can find a lot more evidence like this for each and every language. Letting people use their own script and vocabulary to name things will make them better programmers in their own country/cultural reference point. This will increase the audience and support for Python worldwide. I will be contacting Japanese python user group and let them know of the current discussion. Regards, Guillaume From pje at telecommunity.com Tue May 15 03:45:25 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon, 14 May 2007 21:45:25 -0400 Subject: [Python-3000] PEP 3124 - more commentary In-Reply-To: References: <20070514192423.624D63A4036@sparrow.telecommunity.com> <20070514214915.C361C3A4036@sparrow.telecommunity.com> <20070514232017.BA6A43A4036@sparrow.telecommunity.com> <20070515003354.194B83A4036@sparrow.telecommunity.com> Message-ID: <20070515014338.8D8EA3A4036@sparrow.telecommunity.com> At 05:51 PM 5/14/2007 -0700, Guido van Rossum wrote: >On 5/14/07, Phillip J. Eby wrote: >>At 05:17 PM 5/14/2007 -0700, Guido van Rossum wrote: >> >Other use cases that come to mind are e.g. APIs that you can pass >> >either a Point object or two (or three!) floats. 
This is not a natural >> >use case for argument default values, and it's not always convenient >> >to require the user to pass a tuple of floats (perhaps the >> >three-floats API already existed and its signature cannot be changed >> >for compatibility reasons). Or think of a networking function that >> >takes either a "host:port" string or a host and port pair; thinking of >> >this as having a default port is also slightly awkward, as you don't >> >know what to do when passed a "host:port" string and a port. >> >>How do people handle these in Python now? ISTM that idiomatic Python >>for these cases would either use tuples, or else different method names. > >Both of which are sub-optimal compared to the C++ and Java solutions. C++ and Java don't have tuples, do they? The open(filename="...") example you gave doesn't bother me in the least, but when I see range()-style APIs, I cringe. However, since this is a matter of taste, I yield to the BDFL. >>That having been said, if you want it, there's probably a way to make >>it work. I just think we should try to preserve the "nameness" of >>arguments in the process -- and consider whether the use cases you've >>listed here actually improve the code clarity any. > >There seems to be a stalemate. It seems I cannot convince you that >this type of overloading is useful. And it seems you cannot explain to >me why I need a framework for method combining. And yet, the difference is that I'm not ruling your proposal out; I'm merely suggesting that we work a bit more on defining what the best way to implement your proposal would be, in order to avoid collateral damage. I also wanted to know more about your use cases; it's now clear that my previous thinking in terms of range() and named arguments as a typical use case is wrong; the things I'd want to do to handle that set of signatures are totally different from the thing you really want, which is to have truly positional arguments. 
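[Editor's note: one way to make "truly positional arguments" dispatchable is to key the overloads on arity. A hypothetical sketch, with invented names (`overloaded_by_arity`, `myrange`) — this is not the PEP 3124 implementation, which dispatches on types as well.]

```python
def overloaded_by_arity(abstract_func):
    """Turn a *args function into a dispatcher keyed on argument count.
    'abstract_func' only supplies the signature; it is never called."""
    table = {}
    def dispatch(*args):
        try:
            impl = table[len(args)]
        except KeyError:
            raise TypeError("no overload of %s for %d argument(s)"
                            % (abstract_func.__name__, len(args)))
        return impl(*args)
    def overload(impl):
        table[impl.__code__.co_argcount] = impl
        return dispatch        # rebind the overload's name to the dispatcher
    dispatch.overload = overload
    return dispatch

@overloaded_by_arity
def myrange(*args):
    """Signature only; implementations are registered below."""

@myrange.overload
def myrange(stop):
    return list(range(stop))

@myrange.overload
def myrange(start, stop, step):
    return list(range(start, stop, step))

print(myrange(3))        # -> [0, 1, 2]
print(myrange(1, 7, 2))  # -> [1, 3, 5]
```

Note what this sketch gives up: keyword arguments and defaults don't fit (an overload with a default counts as one fixed arity here), which is the tension with Python's named-argument model being debated above.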
Perhaps the best thing would be to first define a syntactic notion of purely-positional arguments? Then it would merely be a concept that overloading could respect, rather than being something that applies only to generic functions. Or perhaps we could just say that if the main function is defined with *args, we treat those arguments as positional? i.e.:

    @abstract
    def range(*args):
        """This just defines the signature; no implementation here"""

    @range.overload
    def range(stop):
        ...

    @range.overload
    def range(start, stop, step=None):
        ...

or:

    @abstract
    def draw(*coords):
        """This just defines the signature; no implementation here"""

    @draw.overload
    def draw(x:float, y:float, z:float):
        draw(Point(x,y,z))

    @draw.overload
    def draw(point:Point):
        ...

From greg.ewing at canterbury.ac.nz Tue May 15 03:42:19 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 15 May 2007 13:42:19 +1200 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: <20070515002213.532623A4036@sparrow.telecommunity.com> References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <79990c6b0705110140u59c6d46euddc59b919f55b4e8@mail.gmail.com> <79990c6b0705140700x11409b5eje2305d5baca09b62@mail.gmail.com> <20070514163231.275CE3A4036@sparrow.telecommunity.com> <20070514193852.7BE1E3A4036@sparrow.telecommunity.com> <20070514223422.9CEF23A4036@sparrow.telecommunity.com> <20070515002213.532623A4036@sparrow.telecommunity.com> Message-ID: <46490FFB.9050904@canterbury.ac.nz> Phillip J. Eby wrote: > If you have only strict precedence > (i.e., methods with the same signature are ambiguous), you wind up in > practice needing a way to disambiguate methods when you don't really > care what order they're executed in > ... > And, the nature of > these observer-ish use cases is that you sometimes need > pre-observers, and sometimes you need post-observers. This is by far the best explanation I've seen so far of the rationale behind @before/@after. 
It should definitely go in the PEP. Can you provide a similar justification for @around? Including why it should go around everything else rather than between the @before/@afters and the normal method. Also, why have three things (@before/@after/@around) instead of just one thing (@around with a next-method call). -- Greg From pje at telecommunity.com Tue May 15 03:56:33 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon, 14 May 2007 21:56:33 -0400 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: <46490FFB.9050904@canterbury.ac.nz> References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <79990c6b0705110140u59c6d46euddc59b919f55b4e8@mail.gmail.com> <79990c6b0705140700x11409b5eje2305d5baca09b62@mail.gmail.com> <20070514163231.275CE3A4036@sparrow.telecommunity.com> <20070514193852.7BE1E3A4036@sparrow.telecommunity.com> <20070514223422.9CEF23A4036@sparrow.telecommunity.com> <20070515002213.532623A4036@sparrow.telecommunity.com> <46490FFB.9050904@canterbury.ac.nz> Message-ID: <20070515015454.4F2563A4036@sparrow.telecommunity.com> At 01:42 PM 5/15/2007 +1200, Greg Ewing wrote: >Phillip J. Eby wrote: > > If you have only strict precedence > > (i.e., methods with the same signature are ambiguous), you wind up in > > practice needing a way to disambiguate methods when you don't really > > care what order they're executed in > > ... > > And, the nature of > > these observer-ish use cases is that you sometimes need > > pre-observers, and sometimes you need post-observers. > >This is by far the best explanation I've seen so far of >the rationale behind @before/@after. It should definitely >go in the PEP. > >Can you provide a similar justification for @around? @around is for applications to have the "last word" on how something should be handled, i.e. to replace or wrap everything else. >Including why it should go around everything else >rather than between the @before/@afters and the normal >method. 
> >Also, why have three things (@before/@after/@around) >instead of just one thing (@around with a next-method >call). Because "around" isn't additive, while before and after are. Any number of before and after methods can be registered for any signature, because they can't directly interfere with one another (since they don't directly call the "next" method). But primary and "around" methods *do* call the next method, so if they are applying any transformation to the arguments or return values, the ordering must be predictable and strict. Thus, methods that can call a next-method (i.e. primaries and arounds) must have a guaranteed unambiguous precedence. Imagine what would happen if the results of calling super() depended on what order your modules had been imported in! Thus, it's an ambiguity error to define two chainable methods for the same type signature. Whereas unchained methods like befores and afters can have as many registrations for the signature as you'd like to include. From talin at acm.org Tue May 15 04:47:32 2007 From: talin at acm.org (Talin) Date: Mon, 14 May 2007 19:47:32 -0700 Subject: [Python-3000] PEP 3124 - more commentary In-Reply-To: References: Message-ID: <46491F44.9030406@acm.org> Guido van Rossum wrote: > Next, I have a question about the __proceed__ magic argument. I can > see why this is useful, and I can see why having this as a magic > argument is preferable over other solutions (I couldn't come up with a > better solution, and believe me I tried :-). However, I think making > this the *first* argument would upset tools that haven't been taught > about this yet. Is there any problem with making it a keyword argument > with a default of None, by convention to be placed last? I earlier suggested that the __proceed__ functionality be implemented by a differently-named decorator, such as "overload_chained". Phillip objected to this on the basis that it would double the number of decorators.
However, I don't think that this is the case, since only a few of the decorators that he has defined support a __proceed__ argument - certainly 'before' and 'after' don't (since they *all* run), and around has it implicitly. Also, I believe having a separate code path for the two cases would be more efficient when dispatching. > Forgive me if this is mentioned in the PEP, but what happens with > keyword args? Can I invoke an overloaded function with (some) keyword > args, assuming they match the argument names given in the default > implementation? Or are we restricted to positional argument passing > only? (That would be a big step backwards.) > > ****************** > > Also, can we overload different-length signatures (like in C++ or > Java)? This is very common in those languages; while Python typically > uses default argument values, there are use cases that don't easily > fit in that pattern (e.g. the signature of range()). Well, from an algorithmic purity standpoint, I know exactly how it would work: You put all of the overloads, regardless of number of arguments, keywords, defaults, and everything else into a single bin. When you call that function, you search through every entry in that bin and throw out all the ones that don't fit, then sort the remaining ones by specificity. The problem of course is that I don't know how to build an efficient dispatch table to do that, and I'm not even sure that it's possible. -- Talin From talin at acm.org Tue May 15 04:53:00 2007 From: talin at acm.org (Talin) Date: Mon, 14 May 2007 19:53:00 -0700 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc.
In-Reply-To: <20070514193852.7BE1E3A4036@sparrow.telecommunity.com> References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <20070510161417.192943A4061@sparrow.telecommunity.com> <464395AB.6040505@canterbury.ac.nz> <20070510231845.9C98C3A4061@sparrow.telecommunity.com> <4643C4F4.30708@canterbury.ac.nz> <79990c6b0705110140u59c6d46euddc59b919f55b4e8@mail.gmail.com> <79990c6b0705140700x11409b5eje2305d5baca09b62@mail.gmail.com> <20070514163231.275CE3A4036@sparrow.telecommunity.com> <20070514193852.7BE1E3A4036@sparrow.telecommunity.com> Message-ID: <4649208C.4020801@acm.org> Phillip J. Eby wrote: > Meanwhile, I've been told repeatedly that TurboGears makes extensive > use of RuleDispatch, and my quick look today showed they actually use > a custom method combination, but I haven't yet tracked down where it > gets used, or what the rationale for it is. You want to look at the module TurboJSON: http://docs.turbogears.org/1.0/JsonifyDecorator http://trac.turbogears.org/browser/projects/TurboJson/trunk/turbojson/jsonify.py?rev=2200 -- Talin From mbk.lists at gmail.com Tue May 15 05:34:16 2007 From: mbk.lists at gmail.com (Mike Krell) Date: Mon, 14 May 2007 20:34:16 -0700 Subject: [Python-3000] Support for PEP 3131 (some links to evidence of usage within communities) In-Reply-To: <19dd68ba0705141818w62c942b7g576016fcd3cc0ac1@mail.gmail.com> References: <19dd68ba0705141818w62c942b7g576016fcd3cc0ac1@mail.gmail.com> Message-ID: > One example they give is > > ------------------------------------------- > i = revised(i); > ??? > i = RevisedByMarubatuMethod(i); > ??? > i = revised_by_marubatu_method(i); > > ??????? > > i = ????????????(i); > ------------------------------------------- > And of course think the last one is the best..... What, the one with all the question marks? :-) Sorry, couldn't resist. 
Mike From guido at python.org Tue May 15 06:37:27 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 14 May 2007 21:37:27 -0700 Subject: [Python-3000] PEP 3124 - more commentary In-Reply-To: <46491F44.9030406@acm.org> References: <46491F44.9030406@acm.org> Message-ID: On 5/14/07, Talin wrote: > Guido van Rossum wrote: > > Also, can we overload different-length signatures (like in C++ or > > Java)? This is very common in those languages; while Python typically > > uses default argument values, there are use cases that don't easily > > fit in that pattern (e.g. the signature of range()). > > Well, from an algorithmic purity standpoint, I know exactly how it would > work: You put all of the overloads, regardless of number of arguments, > keywords, defaults, and everything else into a single bin. When you call > that function, you search through ever entry in that bin and throw out > all the ones that don't fit, then sort the remaining ones by specificity. > > The problem of course is that I don't know how to build an efficient > dispatch table to do that, and I'm not even sure that it's possible. Have a look at sandbox/abc/abc.py, class overloadable (if you don't want to set up a svn workspace, see http://svn.python.org/view/sandbox/trunk/abc/abc.py). It doesn't handle keyword args or defaults, but it does handle positional argument lists of different sizes efficiently, by using a cache indexed with a tuple of the argument types. The first time a particular combination of argument types is seen it does an exhaustive search; the result is then cached. Performance is good assuming there are many calls but few distinct call signatures, per overloaded function. (At least, I think it's efficient; I once timed an earlier implementation of the same idea, and it wasn't too bad. That code is still in sandbox/overload/.) Argument default values could be added relatively easily by treating a function with a default argument value as multiple signatures; e.g. 
@foo.overload def _(a:str, b:int=1, c:Point=None): ... would register these three signatures: (str,) (str, int) (str, int, Point) Phillip suggested a clever idea to deal with keyword arguments, by compiling a synthesized function that has the expected signature and calls the dispatch machinery. I think it would need some adjustment to deal with variable-length signatures too, but I think it could be made to work as long as the problem isn't fundamentally ambiguous (which it may be when you combine different-sized positional signatures with defaults *and* keywords). The synthetic function is just a speed hack; the same thing can be done without synthesizing code, at the cost (considerable, and repeated per call) of decoding *args and **kwds explicitly. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue May 15 06:43:32 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 14 May 2007 21:43:32 -0700 Subject: [Python-3000] PEP 3124 - more commentary In-Reply-To: <20070515014338.8D8EA3A4036@sparrow.telecommunity.com> References: <20070514192423.624D63A4036@sparrow.telecommunity.com> <20070514214915.C361C3A4036@sparrow.telecommunity.com> <20070514232017.BA6A43A4036@sparrow.telecommunity.com> <20070515003354.194B83A4036@sparrow.telecommunity.com> <20070515014338.8D8EA3A4036@sparrow.telecommunity.com> Message-ID: On 5/14/07, Phillip J. Eby wrote: > Or perhaps we could just say that if the main function is defined > with *args, we treat those arguments as positional? i.e.: > > @abstract > def range(*args): > """This just defines the signature; no implementation here""" That sounds about right.
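[Editor's note: a minimal, hedged sketch of the two ideas Guido describes above -- a dispatch cache keyed by the tuple of argument types, and a default-bearing function registering one signature per callable prefix. All names here are invented for illustration; this is not the sandbox/abc/abc.py code, and it does exact type matching only.]

```python
class Overloadable:
    """Toy multiple dispatch on positional argument types (exact match only)."""

    def __init__(self):
        self.registry = {}  # (type, ...) -> implementation
        self.cache = {}     # lazily filled on first call per type combination

    def overload(self, types, func):
        self.registry[tuple(types)] = func

    def __call__(self, *args):
        key = tuple(type(a) for a in args)
        impl = self.cache.get(key)
        if impl is None:
            # Cache miss: search the registry. The real proposal would also
            # rank candidate signatures by specificity, not just exact-match.
            impl = self.registry.get(key)
            if impl is None:
                raise TypeError("no overload for %r" % (key,))
            self.cache[key] = impl
        return impl(*args)

foo = Overloadable()

# A function with defaults registers once per allowed prefix length,
# mirroring the (str,), (str, int), (str, int, Point) expansion above
# (NoneType stands in for Point to keep the sketch self-contained).
foo.overload((str,), lambda a: (a, 1, None))
foo.overload((str, int), lambda a, b: (a, b, None))
foo.overload((str, int, type(None)), lambda a, b, c: (a, b, c))

foo("x")     # -> ('x', 1, None)
foo("x", 2)  # -> ('x', 2, None)
```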
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From gproux at gmail.com Tue May 15 01:46:17 2007 From: gproux at gmail.com (Guillaume Proux) Date: Tue, 15 May 2007 08:46:17 +0900 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <20070514125306.8563.JCARLSON@uci.edu> References: <20070514093643.8559.JCARLSON@uci.edu> <20070514125306.8563.JCARLSON@uci.edu> Message-ID: <19dd68ba0705141646j67423b20m87d9e752913830ef@mail.gmail.com> Hello, On 5/15/07, Josiah Carlson wrote: > and comment their code ;). It would be nice to be able to find more > examples in Java. I believe that a lot of people do not know that you can use most Unicode characters in Java identifiers. I did not know myself until this discussion. Furthermore, to find some examples of those, you would have to search in the native language of each speaker. > I guess the question is whether the potential for community > fragmentation is worth trying to handle a (seemingly much) smaller set > of use-cases than is (already arguably sufficiently) handled with ascii > identifiers. Which fragmentation? People who can write ascii-restricted Python will not go away and form their own little sect. One of the big issues of this debate here in English is that the people who have a real stake (the ones who do not master English) cannot join in the discussion. I was trying to let you know of my experience with native Japanese speakers who have no or little English capability but people do not want to listen.
Regards, Guillaume From ncoghlan at gmail.com Tue May 15 11:28:54 2007 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 15 May 2007 19:28:54 +1000 Subject: [Python-3000] PEP 3124 - more commentary In-Reply-To: References: <20070514192423.624D63A4036@sparrow.telecommunity.com> <20070514214915.C361C3A4036@sparrow.telecommunity.com> <20070514232017.BA6A43A4036@sparrow.telecommunity.com> <20070515003354.194B83A4036@sparrow.telecommunity.com> Message-ID: <46497D56.3020101@gmail.com> Guido van Rossum wrote: > On 5/14/07, Phillip J. Eby wrote: >> More importantly, it seems to go against the grain of at least my >> mental concept of Python call signatures, in which arguments are >> inherently *named* (and can be passed using explicit names), with >> only rare exceptions like range(). In contrast, the languages that >> have this sort of positional thing only allow arguments to be >> specified by position, IIRC. That's what makes me uncomfortable with it. > > Well, in *my* mental model the argument names are just as often > irrelevant as they are useful. I'd be taken aback if I saw this in > someone's code: open(filename="/etc/passwd", mode="r"). Perhaps it's > too bad that Python cannot express the notion of "these parameters are > positional-only" except very clumsily. The idea of positional-only arguments came up during the PEP 3102 discussions. I believe the proposal was to allow a tuple of annotated names instead of a single name for the varargs parameter: @overloadable def range(*(start:int, stop:int, step:int)): ... # implement xrange @range.overload def range(*(stop:int,)): return range(0, stop, 1) @range.overload def range(*(start:int, stop:int)): return range(start, stop, 1) PJE's approach (using *args in the base signature, but allowing overloads to omit it) is probably cleaner, though. Cheers, Nick.
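[Editor's note: whichever spelling wins (Nick's tuple-annotated *args or PJE's bare *args), the range() example only needs dispatch on the argument count. A toy, hedged sketch of that piece alone -- the ByArity name and decorator protocol are invented for illustration, not PEP 3124's API:]

```python
class ByArity:
    """Toy dispatcher selecting an implementation by positional-argument count."""

    def __init__(self):
        self.impls = {}

    def overload(self, func):
        # Key each implementation by how many positional parameters it takes.
        self.impls[func.__code__.co_argcount] = func
        return func

    def __call__(self, *args):
        try:
            impl = self.impls[len(args)]
        except KeyError:
            raise TypeError("no overload taking %d arguments" % len(args))
        return impl(*args)

my_range = ByArity()

@my_range.overload
def _stop_only(stop):
    return my_range(0, stop, 1)

@my_range.overload
def _start_stop(start, stop):
    return my_range(start, stop, 1)

@my_range.overload
def _full(start, stop, step):
    return list(range(start, stop, step))  # defer to the builtin

my_range(3)     # -> [0, 1, 2]
my_range(1, 4)  # -> [1, 2, 3]
```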
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From greg.ewing at canterbury.ac.nz Tue May 15 13:20:10 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 15 May 2007 23:20:10 +1200 Subject: [Python-3000] PEP 3124 - more commentary In-Reply-To: <20070515014338.8D8EA3A4036@sparrow.telecommunity.com> References: <20070514192423.624D63A4036@sparrow.telecommunity.com> <20070514214915.C361C3A4036@sparrow.telecommunity.com> <20070514232017.BA6A43A4036@sparrow.telecommunity.com> <20070515003354.194B83A4036@sparrow.telecommunity.com> <20070515014338.8D8EA3A4036@sparrow.telecommunity.com> Message-ID: <4649976A.1030301@canterbury.ac.nz> Phillip J. Eby wrote: > C++ and Java don't have tuples, do they? No, but in C++ you could probably do something clever by overloading the comma operator if you were feeling perverse enough... -- Greg From greg.ewing at canterbury.ac.nz Tue May 15 13:25:36 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 15 May 2007 23:25:36 +1200 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: <20070515015454.4F2563A4036@sparrow.telecommunity.com> References: <5.1.1.6.0.20070430184628.02c9b280@sparrow.telecommunity.com> <79990c6b0705110140u59c6d46euddc59b919f55b4e8@mail.gmail.com> <79990c6b0705140700x11409b5eje2305d5baca09b62@mail.gmail.com> <20070514163231.275CE3A4036@sparrow.telecommunity.com> <20070514193852.7BE1E3A4036@sparrow.telecommunity.com> <20070514223422.9CEF23A4036@sparrow.telecommunity.com> <20070515002213.532623A4036@sparrow.telecommunity.com> <46490FFB.9050904@canterbury.ac.nz> <20070515015454.4F2563A4036@sparrow.telecommunity.com> Message-ID: <464998B0.6020906@canterbury.ac.nz> Phillip J. Eby wrote: > Imagine what would happen if the results of > calling super() depended on what order your modules had been imported in! 
Actually, something like this does happen with super. You can't be sure which method super() will call when you write it, because it depends on what other classes people inherit along with your class, and what order they're in. -- Greg From ajm at flonidan.dk Tue May 15 13:50:18 2007 From: ajm at flonidan.dk (Anders J. Munch) Date: Tue, 15 May 2007 13:50:18 +0200 Subject: [Python-3000] Support for PEP 3131 Message-ID: <9B1795C95533CA46A83BA1EAD4B01030031F92@flonidanmail.flonidan.net> tomer filiba wrote: > > once we have chinese, french and hindi function names, i'd be very > difficult to interoperate with third party libs. imagine i wrote my > code using twisted-he, while my client has installed twisted-fr... > kaboom? Indeed if the authors of twisted suddenly go insane and decide to produce multiple, incompatible versions, that would be bad. If you're afraid of that happening, rather than argue over PEP 3131, you should give Matthew Lefkowitz a big hug and buy him a beer. The Java community isn't fragmented by language barriers despite having had Unicode identifiers from the onset. There's no reason to think this will make the Python community spontaneously self-destruct either. - Anders From tanzer at swing.co.at Tue May 15 14:53:51 2007 From: tanzer at swing.co.at (Christian Tanzer) Date: Tue, 15 May 2007 14:53:51 +0200 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: Your message of "Tue, 15 May 2007 23:25:36 +1200." <464998B0.6020906@canterbury.ac.nz> Message-ID: Greg Ewing wrote: > Phillip J. Eby wrote: > > Imagine what would happen if the results of > > calling super() depended on what order your modules had been imported in! > > Actually, something like this does happen with super. No, it doesn't. The order of super-calls is always well-defined (and the only sane one)! 
> You can't be sure which method super() will call when > you write it, because it depends on what other classes > people inherit along with your class, and what order > they're in. This is true but doesn't matter (which is the beauty of super). -- Christian Tanzer http://www.c-tanzer.at/ From pje at telecommunity.com Tue May 15 16:52:28 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue, 15 May 2007 10:52:28 -0400 Subject: [Python-3000] PEP 3124 - more commentary In-Reply-To: <46491F44.9030406@acm.org> References: <46491F44.9030406@acm.org> Message-ID: <20070515145039.913B73A40A9@sparrow.telecommunity.com> At 07:47 PM 5/14/2007 -0700, Talin wrote: >Guido van Rossum wrote: >>Next, I have a question about the __proceed__ magic argument. I can >>see why this is useful, and I can see why having this as a magic >>argument is preferable over other solutions (I couldn't come up with a >>better solution, and believe me I tried :-). However, I think making >>this the *first* argument would upset tools that haven't been taught >>about this yet. Is there any problem with making it a keyword argument >>with a default of None, by convention to be placed last? > >I earlier suggested that the __proceed__ functionality be >implemented by a differently-named decorator, such as "overload_chained". > >Phillip objected to this on the basis that it would double the >number of decorators. However, I don't think that this is the case, >since only a few of the decorators that he has defined supports a >__proceed__ argument - certainly 'before' and 'after' don't (since >they *all* run), and around has it implicitly. > >Also, I believe having a separate code path for the two cases would >be more efficient when dispatching. This isn't so. Method combination only takes place when a particular combination of arguments hasn't been seen before, and the result of combination is a single object. That object can be a bound method chain, which is *very* efficient. 
In fact, CPython invokes bound methods almost as quickly as plain functions, as it has a C-level check for them. In any case, if a method does not have a next-method argument, the resulting "combined" method is just the function object, which is called directly. (PEAK-Rules, btw, doesn't incorporate this bound-method-or-function optimization at the moment, but it's built into RuleDispatch and is pretty darn trivial.) >The problem of course is that I don't know how to build an efficient >dispatch table to do that, and I'm not even sure that it's possible. Oh, it's possible all right. The only tricky bit with the proposal under discussion is that I need to know the maximum arity (number of arguments) that the function is ever dispatched on, in order to build a dispatch tuple of the correct length. Missing arguments get a "missing" class put in the corresponding tuple position. The rest works just like normal type-tuple dispatching, so it's really not that complex. From pje at telecommunity.com Tue May 15 17:07:44 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue, 15 May 2007 11:07:44 -0400 Subject: [Python-3000] PEP 3124 - more commentary In-Reply-To: References: <20070514192423.624D63A4036@sparrow.telecommunity.com> <20070514214915.C361C3A4036@sparrow.telecommunity.com> <20070514232017.BA6A43A4036@sparrow.telecommunity.com> <20070515003354.194B83A4036@sparrow.telecommunity.com> <20070515014338.8D8EA3A4036@sparrow.telecommunity.com> Message-ID: <20070515150556.9CE063A40A7@sparrow.telecommunity.com> At 09:43 PM 5/14/2007 -0700, Guido van Rossum wrote: >On 5/14/07, Phillip J. Eby wrote: > > Or perhaps we could just say that if the main function is defined > > with *args, we treat those arguments as positional? i.e.: > > > > @abstract > > def range(*args): > > """This just defines the signature; no implementation here""" > >That sounds about right. 
After thinking about the implementation some more, I believe it'll be necessary to know *in advance* the maximum size of *args that will be used by any subsequent overload, in order to both generate the correct code for the main function (which must construct a fixed-size lookup tuple containing special values for not-supplied arguments), and the correct type tuples for individual overloads (which must contain similar special values for the to-be-omitted arguments). So, if we could do something like this: @abstract def range(*args:3): ... then that would be best. I propose, therefore, that we require an integer annotation on the *args to enable positional dispatching. If there are more *args at call time than this defined amount, only methods that have more positional arguments (or a *args) will be selected. If the number is omitted (e.g. just *args with no annotation), the *args will not be used for method selection. Still good? From jjb5 at cornell.edu Tue May 15 17:31:05 2007 From: jjb5 at cornell.edu (Joel Bender) Date: Tue, 15 May 2007 11:31:05 -0400 Subject: [Python-3000] PEP 3124 - more commentary In-Reply-To: <20070515150556.9CE063A40A7@sparrow.telecommunity.com> References: <20070514192423.624D63A4036@sparrow.telecommunity.com> <20070514214915.C361C3A4036@sparrow.telecommunity.com> <20070514232017.BA6A43A4036@sparrow.telecommunity.com> <20070515003354.194B83A4036@sparrow.telecommunity.com> <20070515014338.8D8EA3A4036@sparrow.telecommunity.com> <20070515150556.9CE063A40A7@sparrow.telecommunity.com> Message-ID: <4649D239.5030009@cornell.edu> > @abstract > def range(*args:3): > ... > > then that would be best. I propose, therefore, that we require an > integer annotation on the *args to enable positional dispatching. I thought there was already a proposal to do something like this: @abstract def range(x, y, z, *): ... So there was a specific flag that there are no more positional arguments. 
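[Editor's note: a hedged illustration of the fixed-length dispatch tuple Phillip describes above -- argument types at the call site are padded out to the pre-declared maximum arity with a sentinel class, so every cache key has the same shape. Names are invented; this is not PEP 3124's actual machinery.]

```python
class Missing:
    """Sentinel type standing in for arguments not supplied at the call site."""

MAX_ARITY = 3  # the pre-declared maximum, as in the `*args:3` annotation above

def dispatch_key(args):
    # Build a tuple of argument types, always of length MAX_ARITY, so the
    # dispatch cache is keyed by fixed-length tuples (no per-call loop over
    # a dynamic-length *args is needed once the arity is known).
    types = [type(a) for a in args[:MAX_ARITY]]
    types += [Missing] * (MAX_ARITY - len(types))
    return tuple(types)

# dispatch_key((1,))       -> (int, Missing, Missing)
# dispatch_key((1, 2, 3))  -> (int, int, int)
```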
Even in an abstract function definition they would at least be labeled, which is a good thing. Joel From guido at python.org Tue May 15 17:32:02 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 15 May 2007 08:32:02 -0700 Subject: [Python-3000] PEP 3124 - more commentary In-Reply-To: <20070515150556.9CE063A40A7@sparrow.telecommunity.com> References: <20070514214915.C361C3A4036@sparrow.telecommunity.com> <20070514232017.BA6A43A4036@sparrow.telecommunity.com> <20070515003354.194B83A4036@sparrow.telecommunity.com> <20070515014338.8D8EA3A4036@sparrow.telecommunity.com> <20070515150556.9CE063A40A7@sparrow.telecommunity.com> Message-ID: On 5/15/07, Phillip J. Eby wrote: > At 09:43 PM 5/14/2007 -0700, Guido van Rossum wrote: > >On 5/14/07, Phillip J. Eby wrote: > > > Or perhaps we could just say that if the main function is defined > > > with *args, we treat those arguments as positional? i.e.: > > > > > > @abstract > > > def range(*args): > > > """This just defines the signature; no implementation here""" > > > >That sounds about right. > > After thinking about the implementation some more, I believe it'll be > necessary to know *in advance* the maximum size of *args that will be > used by any subsequent overload, in order to both generate the > correct code for the main function (which must construct a fixed-size > lookup tuple containing special values for not-supplied arguments), > and the correct type tuples for individual overloads (which must > contain similar special values for the to-be-omitted arguments). > > So, if we could do something like this: > > @abstract > def range(*args:3): > ... > > then that would be best. I propose, therefore, that we require an > integer annotation on the *args to enable positional dispatching. > > If there are more *args at call time than this defined amount, only > methods that have more positional arguments (or a *args) will be selected. > > If the number is omitted (e.g. 
just *args with no annotation), the > *args will not be used for method selection. > > Still good? Not so good; I expect the overloads could be written by different authors or at least at different times. Why can't you dynamically update the dispatcher when an overloading with more arguments comes along? -- --Guido van Rossum (home page: http://www.python.org/~guido/) From pje at telecommunity.com Tue May 15 18:25:24 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue, 15 May 2007 12:25:24 -0400 Subject: [Python-3000] PEP 3124 - more commentary In-Reply-To: References: <20070514214915.C361C3A4036@sparrow.telecommunity.com> <20070514232017.BA6A43A4036@sparrow.telecommunity.com> <20070515003354.194B83A4036@sparrow.telecommunity.com> <20070515014338.8D8EA3A4036@sparrow.telecommunity.com> <20070515150556.9CE063A40A7@sparrow.telecommunity.com> Message-ID: <20070515162336.AD9A23A4036@sparrow.telecommunity.com> At 08:32 AM 5/15/2007 -0700, Guido van Rossum wrote: >Not so good; I expect the overloads could be written by different >authors or at least at different times. Why can't you dynamically >update the dispatcher when an overloading with more arguments comes >along? You mean by changing its __code__? The code to generate the tuple goes in the original function object generated by @abstract or @overloadable. If we can't specify the count in advance, the remaining choices appear to be: * Require *args to be annotated :overloadable in order to enable dispatching on them, or * Only enable *args dispatching if the original function has no explicit positional arguments * Mutate the function Of these, I lean towards the third, but I imagine you'll like one of the other two better. :) If we don't do one of these things, the performance of functions that have *args but don't want to dispatch on them will suffer enormously due to the need to loop over *args and create a dynamic-length tuple. 
(As shown by the performance tests you did on your tuple dispatch prototype.) Conversely, if we mutate the function, then even dispatching over *args won't require a loop slowdown; the tuple can *always* be of a fixed length. From guido at python.org Tue May 15 18:40:28 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 15 May 2007 09:40:28 -0700 Subject: [Python-3000] PEP 3124 - more commentary In-Reply-To: <20070515162336.AD9A23A4036@sparrow.telecommunity.com> References: <20070514232017.BA6A43A4036@sparrow.telecommunity.com> <20070515003354.194B83A4036@sparrow.telecommunity.com> <20070515014338.8D8EA3A4036@sparrow.telecommunity.com> <20070515150556.9CE063A40A7@sparrow.telecommunity.com> <20070515162336.AD9A23A4036@sparrow.telecommunity.com> Message-ID: On 5/15/07, Phillip J. Eby wrote: > At 08:32 AM 5/15/2007 -0700, Guido van Rossum wrote: > >Not so good; I expect the overloads could be written by different > >authors or at least at different times. Why can't you dynamically > >update the dispatcher when an overloading with more arguments comes > >along? > > You mean by changing its __code__? The code to generate the tuple > goes in the original function object generated by @abstract or @overloadable. > > If we can't specify the count in advance, the remaining choices appear to be: > > * Require *args to be annotated :overloadable in order to enable > dispatching on them, or > > * Only enable *args dispatching if the original function has no > explicit positional arguments > > * Mutate the function > > Of these, I lean towards the third, but I imagine you'll like one of > the other two better. :) > > If we don't do one of these things, the performance of functions that > have *args but don't want to dispatch on them will suffer enormously > due to the need to loop over *args and create a dynamic-length > tuple. (As shown by the performance tests you did on your tuple > dispatch prototype.) 
> > Conversely, if we mutate the function, then even dispatching over > *args won't require a loop slowdown; the tuple can *always* be of a > fixed length. It looks like you're focused on an implementation that is both highly optimized and (technically) pure Python (using every trick in the book). Personally I would rather go for a slower but simpler pure Python implementation and eventually add C support to speed it up; that way the Python version can maintain more of the advantages of Python code like readability, maintainability, and evolvability (just in case we don't get it perfect on the first try -- even knowing that it's more like the 3rd try for you ;-). -- --Guido van Rossum (home page: http://www.python.org/~guido/) From pje at telecommunity.com Tue May 15 19:41:53 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue, 15 May 2007 13:41:53 -0400 Subject: [Python-3000] PEP 3124 - more commentary In-Reply-To: References: <20070514232017.BA6A43A4036@sparrow.telecommunity.com> <20070515003354.194B83A4036@sparrow.telecommunity.com> <20070515014338.8D8EA3A4036@sparrow.telecommunity.com> <20070515150556.9CE063A40A7@sparrow.telecommunity.com> <20070515162336.AD9A23A4036@sparrow.telecommunity.com> Message-ID: <20070515174006.49A063A4036@sparrow.telecommunity.com> At 09:40 AM 5/15/2007 -0700, Guido van Rossum wrote: >It looks like you're focused on an implementation that is both highly >optimized and (technically) pure Python (using every trick in the >book). Personally I would rather go for a slower but simpler pure >Python implementation Actually, the need to handle keyword arguments intelligently pretty much demands that you use a function object as a front-end. It's a *lot* easier to compile a function that does the right thing for one specific signature, than it is to write a single routine that interprets arbitrary function signatures correctly!
(Note that the "inspect" module does all the heavy lifting, including the formatting of all the argument strings. It even supports nested tuple arguments, though of course we won't be needing those for Py3K!) IOW, I originally started using functions as front-ends in RuleDispatch to support keyword arguments correctly, not to improve performance. It actually slows things down a little in RuleDispatch to do that, because it adds an extra calling level. But it's a correctness thing, not a performance thing. Anyway, I've figured out at least one way to handle *args efficiently, without any pre-declaration, by modifying the function template slightly when *args are in play: def make_function(__defaults, __lookup, __starcount): def $funcname(..., *args, ...): if args and __starcount: # code to make a type tuple using args[:__starcount] else: # fast code that doesn't use __starcount def __setcount(count): nonlocal __starcount __starcount = count return $funcname, __setcount This avoids any need to mutate the function later; instead, the dispatch engine can just call __setcount() when it encounters signatures that dispatch on the contents of *args. So, I think this will do everything you wanted. From collinw at gmail.com Wed May 16 00:30:13 2007 From: collinw at gmail.com (Collin Winter) Date: Tue, 15 May 2007 15:30:13 -0700 Subject: [Python-3000] PEP 3133: Introducing Roles In-Reply-To: References: <43aa6ff70705132236w6d2dbfc4r8d1fadede753a6ca@mail.gmail.com> <43aa6ff70705141303t63520c44g8f88f8ae56732137@mail.gmail.com> Message-ID: <43aa6ff70705151530m1f414acdlc4b70383ab2471d5@mail.gmail.com> On 5/14/07, Steven Bethard wrote: > On 5/14/07, Collin Winter wrote: > > There really is no difference between roles and all- at abstractmethod > > ABCs. From my point of view, though, roles win because they don't > > require any changes to the interpreter; they're a much simpler way of > > expressing the same concept. 
> > Ok, you clearly have an implementation in mind, but I don't know what > it is. As far as I can tell: > > * metaclass=Role ~ metaclass=ABCMeta, except that all methods must be abstract > * perform_role(role)(cls) ~ role.register(cls) > * performs(obj, role) ~ isinstance(obj, role) > > And so, as far as I can see, without an Implementation section, all > you're proposing is a different syntax for the same functionality. Was > there a discussion of your implementation that I missed? > > > You may like adding the extra complexity > > and indirection to the VM necessary to support > > issubclass()/isinstance() overriding, but I don't. > > Have you looked at Guido's issubclass()/isinstance() patch > (http://bugs.python.org/1708353)? I'd hardly say that 34 lines of C > code is substantial "extra complexity". This is what I don't understand: ABCs require changing the VM, roles don't; all that change buys you is the ability to spell "performs()" as "isinstance()". Why are ABCs preferable, again? Collin Winter From guido at python.org Wed May 16 00:39:59 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 15 May 2007 15:39:59 -0700 Subject: [Python-3000] PEP 3133: Introducing Roles In-Reply-To: <43aa6ff70705151530m1f414acdlc4b70383ab2471d5@mail.gmail.com> References: <43aa6ff70705132236w6d2dbfc4r8d1fadede753a6ca@mail.gmail.com> <43aa6ff70705141303t63520c44g8f88f8ae56732137@mail.gmail.com> <43aa6ff70705151530m1f414acdlc4b70383ab2471d5@mail.gmail.com> Message-ID: On 5/15/07, Collin Winter wrote: > This is what I don't understand: ABCs require changing the VM, roles > don't; all that change buys you is the ability to spell "performs()" > as "isinstance()". Why are ABCs preferable, again? Actually, if you didn't care about overloading isinstance(), you could have everything else in PEP 3119 by using a different spelling than isinstance().
Suppose the playing field were to be leveled like this -- IMO ABCs would still be preferable because they can *also* be subclassed directly and provide concrete or partially-implemented methods, acting as mix-in classes. You can also turn it around. If Roles were overloading isinstance() -- how would they be better than ABCs? But I *like* overloading isinstance(), because it means there's less to learn, and so does Phillip -- it means there can be a uniform way for the GF machinery to talk about the relationships between instances and the various things that can be used as argument annotations (even zope.interfaces could overload isinstance() to do the right thing). -- --Guido van Rossum (home page: http://www.python.org/~guido/) From steven.bethard at gmail.com Wed May 16 01:43:53 2007 From: steven.bethard at gmail.com (Steven Bethard) Date: Tue, 15 May 2007 17:43:53 -0600 Subject: [Python-3000] PEP 3133: Introducing Roles In-Reply-To: <43aa6ff70705151530m1f414acdlc4b70383ab2471d5@mail.gmail.com> References: <43aa6ff70705132236w6d2dbfc4r8d1fadede753a6ca@mail.gmail.com> <43aa6ff70705141303t63520c44g8f88f8ae56732137@mail.gmail.com> <43aa6ff70705151530m1f414acdlc4b70383ab2471d5@mail.gmail.com> Message-ID: On 5/15/07, Collin Winter wrote: > On 5/14/07, Steven Bethard wrote: > > On 5/14/07, Collin Winter wrote: > > > You may like adding the extra complexity > > > and indirection to the VM necessary to support > > > issubclass()/isinstance() overriding, but I don't. > > > > Have you looked at Guido's issubclass()/isinstance() patch > > (http://bugs.python.org/1708353)? I'd hardly say that 34 lines of C > > code is substantial "extra complexity". > > This is what I don't understand: ABCs require changing the VM, roles > don't; all that change buys you is the ability to spell "performs()" > as "isinstance()". Sorry, I can't really respond to this until you give me some idea what your implementation is. 
You keep saying that roles don't require changing the VM, but I don't know what they *do* involve changing. So I can't judge how different that is from allowing isinstance() to be overloaded. STeVe -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy From greg.ewing at canterbury.ac.nz Wed May 16 02:19:29 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 16 May 2007 12:19:29 +1200 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: References: Message-ID: <464A4E11.9030107@canterbury.ac.nz> Christian Tanzer wrote: > Greg Ewing wrote: > > > Phillip J. Eby wrote: > > > > > Imagine what would happen if the results of > > > calling super() depended on what order your modules had been imported in! > > > > Actually, something like this does happen with super. > > This is true but doesn't matter (which is the beauty of super). Only because super methods are written with this knowledge in mind, however. Seems to me you ought to have something similar in mind when overloading a generic function. -- Greg From collinw at gmail.com Wed May 16 02:38:42 2007 From: collinw at gmail.com (Collin Winter) Date: Tue, 15 May 2007 17:38:42 -0700 Subject: [Python-3000] Revised PEP 3119 (Abstract Base Classes) In-Reply-To: References: Message-ID: <43aa6ff70705151738x715c4616pbfaf1b085ffda1fa@mail.gmail.com> On 5/11/07, Guido van Rossum wrote: > - Overloading isinstance and issubclass is now a key mechanism rather > than an afterthought; it is also the only change to C code required > > - Built-in (and user-defined) types can be registered as "virtual > subclasses" (not related to virtual base classes in C++) of the > standard ABCs, e.g. Sequence.register(tuple) makes issubclass(tuple, > Sequence) true (but Sequence won't show up in __bases__ or __mro__). 
(The bit about "issubclass(tuple, Sequence)" currently isn't true with
the sandbox prototype, but let's assume that it is/will be.)

Given:

    class MyABC(metaclass=ABCMeta):
        def foo(self):  # A concrete method
            return 5

    class MyClass(MyABC):  # Mark as implementing the ABC's interface
        pass

    >>> a = MyClass()
    >>> isinstance(a, MyABC)
    True  # Good, I can call foo()
    >>> a.foo()
    5

    >>> MyABC.register(list)
    >>> isinstance([], MyABC)
    True  # Good, I can call foo()
    >>> [].foo()
    Traceback (most recent call last):
    AttributeError: 'list' object has no attribute 'foo'

Have I missed something? It would seem that when dealing with ABCs
that provide concrete methods, "isinstance(x, SomeABC) == True" is
useless.

Collin Winter

From pje at telecommunity.com Wed May 16 02:41:38 2007
From: pje at telecommunity.com (Phillip J. Eby)
Date: Tue, 15 May 2007 20:41:38 -0400
Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc.
In-Reply-To: <464A4E11.9030107@canterbury.ac.nz>
References: <464A4E11.9030107@canterbury.ac.nz>
Message-ID: <20070516003949.BE0733A4036@sparrow.telecommunity.com>

At 12:19 PM 5/16/2007 +1200, Greg Ewing wrote:
>Christian Tanzer wrote:
> > Greg Ewing wrote:
> >
> > > Phillip J. Eby wrote:
> > >
> > > > Imagine what would happen if the results of
> > > > calling super() depended on what order your modules had been
> imported in!
> > >
> > > Actually, something like this does happen with super.
> >
> > This is true but doesn't matter (which is the beauty of super).
>
>Only because super methods are written with this
>knowledge in mind, however. Seems to me you ought
>to have something similar in mind when overloading
>a generic function.

This discussion has wandered away from the point. Next-method calls are
in fact identical to super calls in the degenerate case of specializing
only on the first argument. However, the point of before/after methods
is that they don't follow this pattern.
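As a concrete (if drastically simplified) picture of the pattern, a generic function with before/after hooks can be modeled as three method lists. This sketch is a toy: real engines such as RuleDispatch order methods by signature specificity, which is ignored here, and all names are made up:

```python
class GenericFunction:
    """Toy generic function: before-hooks, one primary method, after-hooks."""
    def __init__(self, primary):
        self.primary = primary
        self.befores = []
        self.afters = []

    def before(self, func):
        self.befores.append(func)
        return func

    def after(self, func):
        self.afters.append(func)
        return func

    def __call__(self, *args):
        # Every before-hook runs before the primary, every after-hook
        # after it; hooks cannot reorder or skip the primary, which is
        # what lets independently-registered hooks stay independent.
        for hook in self.befores:
            hook(*args)
        result = self.primary(*args)
        for hook in self.afters:
            hook(*args)
        return result

calls = []
gf = GenericFunction(lambda x: calls.append("primary") or x * 2)

@gf.before
def log_entry(x):
    calls.append("before")

@gf.after
def log_exit(x):
    calls.append("after")

print(gf(21))    # 42
print(calls)     # ['before', 'primary', 'after']
```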
If only one person is writing all the methods in a generic function, they don't have much benefit from using before/after methods, because they could just code the desired behavior into the primary methods. The benefit of before/after, on the other hand, is that they allow any number of developers to "hook" the calling of the function. Any given developer can predict the calling order for *their* before and after methods, but does not necessarily know when other developers' before/after methods might be called. If you and I both define a @before(foo,(X,Y)) method, there is no way for either of us to predict whose method will be called first, even though we can each predict that our own (X,Y) method will be called before an (object,object) method that we also registered. Our methods are in parallel universes that do not overlap. This is the driving force for having before and after methods: allowing independent hooks to be registered, while ensuring that they can't mess anything up (as long as they stick to their own business). To put it another way, if you care about the *overall* (absolute?) order, you have to use primary or "around" methods. If you only care about the *relative* order, you want a before or after method. The fact that you do NOT have explicit control over the chaining is the very thing that makes them able to be independent. From greg.ewing at canterbury.ac.nz Wed May 16 03:01:19 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 16 May 2007 13:01:19 +1200 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: <20070516003949.BE0733A4036@sparrow.telecommunity.com> References: <464A4E11.9030107@canterbury.ac.nz> <20070516003949.BE0733A4036@sparrow.telecommunity.com> Message-ID: <464A57DF.90002@canterbury.ac.nz> Phillip J. 
Eby wrote: > This is the driving force for having before and after methods: allowing > independent hooks to be registered, while ensuring that they can't mess > anything up (as long as they stick to their own business). Some discipline is still required to make sure they do stick to their business. The same result could be achieved with only one order-independent decorator instead of two, and an additional discipline of always calling the next method. -- Greg From guido at python.org Wed May 16 03:25:48 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 15 May 2007 18:25:48 -0700 Subject: [Python-3000] PEP 3124 - Overloading, Generic Functions, Interfaces, etc. In-Reply-To: <464A4E11.9030107@canterbury.ac.nz> References: <464A4E11.9030107@canterbury.ac.nz> Message-ID: Note that Phillip's hypothetical was about it depending on *the order in which modules are imported*. Super has no such dependency -- it just depends on the inheritance graph, which is much more well-defined. --Guido On 5/15/07, Greg Ewing wrote: > Christian Tanzer wrote: > > Greg Ewing wrote: > > > > > Phillip J. Eby wrote: > > > > > > > Imagine what would happen if the results of > > > > calling super() depended on what order your modules had been imported in! > > > > > > Actually, something like this does happen with super. > > > > This is true but doesn't matter (which is the beauty of super). > > Only because super methods are written with this > knowledge in mind, however. Seems to me you ought > to have something similar in mind when overloading > a generic function. 
> > -- > Greg > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed May 16 03:34:52 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 15 May 2007 18:34:52 -0700 Subject: [Python-3000] Revised PEP 3119 (Abstract Base Classes) In-Reply-To: <43aa6ff70705151738x715c4616pbfaf1b085ffda1fa@mail.gmail.com> References: <43aa6ff70705151738x715c4616pbfaf1b085ffda1fa@mail.gmail.com> Message-ID: On 5/15/07, Collin Winter wrote: > On 5/11/07, Guido van Rossum wrote: > > - Overloading isinstance and issubclass is now a key mechanism rather > > than an afterthought; it is also the only change to C code required > > > > - Built-in (and user-defined) types can be registered as "virtual > > subclasses" (not related to virtual base classes in C++) of the > > standard ABCs, e.g. Sequence.register(tuple) makes issubclass(tuple, > > Sequence) true (but Sequence won't show up in __bases__ or __mro__). > > (The bit about "issubclass(tuple, Sequence)" currently isn't true with > the sandbox prototype, but let's assume that it is/will be.) Perhaps you tried it without the patch (reference [12] from PEP 3119) applied? It works for me: guido at pythonic:abc$ python3.0 Python 3.0x (p3yk, May 10 2007, 17:05:42) [GCC 4.0.3 (Ubuntu 4.0.3-1ubuntu5)] on linux2 Type "help", "copyright", "credits" or "license" for more information. 
>>> import abc
>>> isinstance((), abc.Sequence)
True
>>>

> Given:
>
>     class MyABC(metaclass=ABCMeta):
>         def foo(self):  # A concrete method
>             return 5
>
>     class MyClass(MyABC):  # Mark as implementing the ABC's interface
>         pass
>
>     >>> a = MyClass()
>     >>> isinstance(a, MyABC)
>     True  # Good, I can call foo()
>     >>> a.foo()
>     5
>
>     >>> MyABC.register(list)
>     >>> isinstance([], MyABC)
>     True  # Good, I can call foo()
>     >>> [].foo()
>     Traceback (most recent call last):
>     AttributeError: 'list' object has no attribute 'foo'
>
> Have I missed something? It would seem that when dealing with ABCs
> that provide concrete methods, "isinstance(x, SomeABC) == True" is
> useless.

The intention is that you shouldn't register such cases. This falls
under the consenting-adults rule.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From pje at telecommunity.com Wed May 16 03:54:56 2007
From: pje at telecommunity.com (Phillip J. Eby)
Date: Tue, 15 May 2007 21:54:56 -0400
Subject: [Python-3000] Revised PEP 3119 (Abstract Base Classes)
In-Reply-To: 
References: <43aa6ff70705151738x715c4616pbfaf1b085ffda1fa@mail.gmail.com>
Message-ID: <20070516015310.156A53A4036@sparrow.telecommunity.com>

At 06:34 PM 5/15/2007 -0700, Guido van Rossum wrote:
> > Have I missed something? It would seem that when dealing with ABCs
> > that provide concrete methods, "isinstance(x, SomeABC) == True" is
> > useless.
>
>The intention is that you shouldn't register such cases. This falls
>under the consenting-adults rule.

Not only that, but the presence of the isinstance()/issubclass() hooks
actually means that if you want to create your own "Role" or "Interface"
types that actually *verify* your requirements, you can do so!
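For instance, the proposed ``__instancecheck__`` hook is already enough to build a verifying check. A minimal sketch (``Verifier`` and ``required_methods`` are invented names, not part of any PEP) puts the hook on a metaclass so that isinstance() itself does the verification instead of consulting __bases__:

```python
class Verifier(type):
    """Metaclass whose isinstance() check verifies required methods
    instead of walking the inheritance graph."""
    def __instancecheck__(cls, obj):
        return all(callable(getattr(obj, name, None))
                   for name in cls.required_methods)

class Doglike(metaclass=Verifier):
    required_methods = ("bark",)

class Dog:
    def bark(self):
        return "woof"

print(isinstance(Dog(), Doglike))   # True: a callable bark() is present
print(isinstance(42, Doglike))      # False: no bark()
```

With this scheme a bogus registration like the MyABC/list example simply cannot happen, because there is no registration at all: the check inspects the object each time.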
From yi.codeplayer at gmail.com Wed May 16 05:06:29 2007
From: yi.codeplayer at gmail.com (=?UTF-8?B?6buE5q+F?=)
Date: Wed, 16 May 2007 11:06:29 +0800
Subject: [Python-3000] Support for PEP 3131
In-Reply-To: <20070514093643.8559.JCARLSON@uci.edu>
References: <4647B15F.7040700@canterbury.ac.nz>
	<20070514093643.8559.JCARLSON@uci.edu>
Message-ID: 

> > Have you been able to find substantial Java source in which non-ascii
> > identifiers were used? I have been curious about its prevalence, but
> > wouldn't even know how to start searching for such code.

I've seen many (and written some) Java and C# code bases that use Chinese
identifiers, and yes, most of that kind of code is closed source. And I'd
like to see this feature in Python, because as an English-second-language
programmer, it's hard to translate all the Chinese terms in my head to
English; sometimes it's even impossible to do, and then I have to make up
some strange words based on the pronunciation, and I hate that very much.

-- 
http://codeplayer.blogspot.com/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20070516/ac27a3fa/attachment.htm

From guido at python.org Wed May 16 06:46:48 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 15 May 2007 21:46:48 -0700
Subject: [Python-3000] PEP 3133: Introducing Roles
In-Reply-To: <4648D626.1030201@benjiyork.com>
References: <43aa6ff70705132236w6d2dbfc4r8d1fadede753a6ca@mail.gmail.com>
	<4648D626.1030201@benjiyork.com>
Message-ID: 

On 5/14/07, Benji York wrote:
> Collin Winter wrote:
> > PEP: 3133
> > Title: Introducing Roles
>
> Everything included here is included in zope.interface. See in-line
> comments below for the analogs.

Could you look at PEP 3119 and do a similar analysis? I expect that the
main thing missing there is that it (currently) has no way to claim that
a particular *object* has a certain behavior.
The overloading of isinstance() makes it possible to add this however --
if not as part of that PEP, then as part of a revamping of zope.interface
using isinstance()/issubclass() overloading and PEP 3129 style class
decorators. PEP 3119 currently also doesn't have a verification step --
but this could easily be added as an (optional) part of the registration
call.

If this is confirmed, I like the convergence that this suggests -- if
several designs (ABCs, Roles and zope.interface) mostly map onto each
other, we're probably on to an important concept, even if we can quibble
over the spelling of behavior checks and other details. It also all
appears to dovetail nicely with GFs.

BTW I think Collin made a mistake when he claimed that the Doglike role
should throw a tantrum just because the actual bark() implementation has
an optional extra argument; that would be like complaining that it also
has a poop() method which is not part of the Doglike role. :-)

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From gproux+py3000 at gmail.com Wed May 16 07:13:16 2007
From: gproux+py3000 at gmail.com (Guillaume Proux)
Date: Wed, 16 May 2007 14:13:16 +0900
Subject: [Python-3000] Support for PEP 3131 - discussion on python zope users group
Message-ID: <19dd68ba0705152213h7dc04e48qfc2f1ad4f5d61b99@mail.gmail.com>

Hello,

Just to let you know that a discussion on japanese python users group
is going on regarding this issue.

Most people feel like the PEP3131 would be a welcome addition.
-> some people point out the fact that special characters like the
greek letters would be great for all kind of maths calculation.
-> Many people think that this would enable them to make their own DSL
-> unittest - very useful to give a better overview of the result of
unit test.
People pointed us at a Visual C# MVP tutorial http://www.atmarkit.co.jp/fdotnet/nagile/nagile02/nagile02_03.html One person expressed the worry that mixing japanese and ascii would oblige them to change input mode too often but other posters said that this could be easily arranged by putting the right settings in the IME. Guillaume From ntoronto at cs.byu.edu Wed May 16 07:18:47 2007 From: ntoronto at cs.byu.edu (Neil Toronto) Date: Tue, 15 May 2007 23:18:47 -0600 Subject: [Python-3000] Support for PEP 3131 - discussion on python zope users group In-Reply-To: <19dd68ba0705152213h7dc04e48qfc2f1ad4f5d61b99@mail.gmail.com> References: <19dd68ba0705152213h7dc04e48qfc2f1ad4f5d61b99@mail.gmail.com> Message-ID: <464A9437.6030701@cs.byu.edu> Guillaume Proux wrote: > Hello, > > Just to let you know that a discussion on japanese python users group > is going on regarding this issue. > > Most people feel like the PEP3131 would be a welcome addition. > -> some people point out the fact that special characters like the > greek letters would be great for all kind of maths calculation. It could be nice for reading, for those who know the Greek alphabet. (Those who don't would see every Greek letter as just a squiggle.) Writing, though? I don't have a clue how to type Greek letters, so I'd end up copy-and-pasting variable names. Icky. Neil From gproux+py3000 at gmail.com Wed May 16 07:46:40 2007 From: gproux+py3000 at gmail.com (Guillaume Proux) Date: Wed, 16 May 2007 14:46:40 +0900 Subject: [Python-3000] Support for PEP 3131 - discussion on python zope users group In-Reply-To: <464A9437.6030701@cs.byu.edu> References: <19dd68ba0705152213h7dc04e48qfc2f1ad4f5d61b99@mail.gmail.com> <464A9437.6030701@cs.byu.edu> Message-ID: <19dd68ba0705152246n51d268acpcc90710157a6bca3@mail.gmail.com> One of the big advantage of japanese Input Methods. They can be extended easily to fit your need. I can type "siguma" on my laptop here and windows (same in Linux) gives me the following choices ??? ??? 
? ? 

cute no?

Guillaume

On 5/16/07, Neil Toronto wrote:
> Guillaume Proux wrote:
> > Hello,
> >
> > Just to let you know that a discussion on japanese python users group
> > is going on regarding this issue.
> >
> > Most people feel like the PEP3131 would be a welcome addition.
> > -> some people point out the fact that special characters like the
> > greek letters would be great for all kind of maths calculation.
>
> It could be nice for reading, for those who know the Greek alphabet.
> (Those who don't would see every Greek letter as just a squiggle.)
> Writing, though? I don't have a clue how to type Greek letters, so I'd
> end up copy-and-pasting variable names. Icky.
>
> Neil
>
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/gproux%2Bpy3000%40gmail.com
>

From pedronis at openendsystems.com Wed May 16 09:35:05 2007
From: pedronis at openendsystems.com (Samuele Pedroni)
Date: Wed, 16 May 2007 09:35:05 +0200
Subject: [Python-3000] Revised PEP 3119 (Abstract Base Classes)
In-Reply-To: 
References: 
Message-ID: <464AB429.5010305@openendsystems.com>

Guido van Rossum wrote:
> **Open issues:** Conceivably, instead of using the ABCMeta metaclass,
> these classes could override ``__instancecheck__`` and
> ``__subclasscheck__`` to check for the presence of the applicable
> special method; for example::
>
>     class Sized(metaclass=ABCMeta):
>         @abstractmethod
>         def __hash__(self):
>             return 0
>         @classmethod
>         def __instancecheck__(cls, x):
>             return hasattr(x, "__len__")
>         @classmethod
>         def __subclasscheck__(cls, C):
>             return hasattr(C, "__bases__") and hasattr(C, "__len__")
>
> This has the advantage of not requiring explicit registration.
> However, the semantics are hard to get exactly right given the confusing
> semantics of instance attributes vs.
class attributes, and that a
> class is an instance of its metaclass; the check for ``__bases__`` is
> only an approximation of the desired semantics. **Strawman:** Let's
> do it, but let's arrange it in such a way that the registration API
> also works.
>
>
just to say that I still think the strawman would be the right thing to do.

From ncoghlan at gmail.com Wed May 16 12:10:27 2007
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 16 May 2007 20:10:27 +1000
Subject: [Python-3000] Revised PEP 3119 (Abstract Base Classes)
In-Reply-To: <43aa6ff70705151738x715c4616pbfaf1b085ffda1fa@mail.gmail.com>
References: <43aa6ff70705151738x715c4616pbfaf1b085ffda1fa@mail.gmail.com>
Message-ID: <464AD893.8010702@gmail.com>

Collin Winter wrote:
>>>> MyABC.register(list)
>>>> isinstance([], MyABC)
> True  # Good, I can call foo()
>>>> [].foo()
> Traceback (most recent call last):
> AttributeError: 'list' object has no attribute 'foo'
>
> Have I missed something? It would seem that when dealing with ABCs
> that provide concrete methods, "isinstance(x, SomeABC) == True" is
> useless.

You've missed something - the declaration in your example that list is
compliant with the example ABC when that is not in fact the case is an
out-and-out bug that leads directly to the exception on the last line.

I can construct an identical example for PEP 3133:

    class MyRole(metaclass=Role):
        def foo(self):  # An abstract method
            pass

    @performs_role(MyRole)
    class MyRoleMixin(object):
        def foo(self):  # A concrete method
            return 5

    class MyClass(MyRoleMixin):  # Use Mixin to perform the Role
        pass

    .>>> a = MyClass()
    .>>> performs(a, MyRole)
    True  # Good, I can call foo()
    .>>> a.foo()
    5
    .>>> performs_role(MyRole)(list)  # This assertion is WRONG!
    .>>> performs([], MyRole)
    True  # Good, I can call foo()
    .>>> [].foo()
    Traceback (most recent call last):
    AttributeError: 'list' object has no attribute 'foo'

One of the key things that PEP 3119 does is to permit a single ABC to
handle both of the jobs that PEP 3133 assigns to separate Role and Mixin
classes. When implementing a PEP 3119 style interface you have two
options - inherit from the ABC and benefit from its mixin
characteristics, or else do a post-hoc registration and implement
everything yourself.

Enforcing an explicit Role/Mixin distinction the way that PEP 3133 does
just makes the interface developer repeat themselves - once to write the
Role and then again to write a Mixin that provides the concrete methods
which can be defined entirely in terms of other methods in the
interface. After that extra work, the user of the interface still has
the same two options - inherit from the Mixin and benefit from the
partial implementation, or do the post-hoc registration and full
implementation.

Equivalent expressiveness and significantly less typing gives me a
strong personal preference towards the PEP 3119 approach. The
improvements in proxy object support and keeping isinstance() as the one
obvious way to introspect interfaces are also nice bonuses.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
            http://www.boredomandlaziness.org

From guido at python.org Wed May 16 16:13:22 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 16 May 2007 07:13:22 -0700
Subject: [Python-3000] Support for PEP 3131 - discussion on python zope users group
In-Reply-To: <19dd68ba0705152213h7dc04e48qfc2f1ad4f5d61b99@mail.gmail.com>
References: <19dd68ba0705152213h7dc04e48qfc2f1ad4f5d61b99@mail.gmail.com>
Message-ID: 

After the warm user testimonials posted in the last few days I am now
warming up to this proposal.
Hearing about how it has been a positive influence in certain local Java communities was especially useful. --Guido On 5/15/07, Guillaume Proux wrote: > Hello, > > Just to let you know that a discussion on japanese python users group > is going on regarding this issue. > > Most people feel like the PEP3131 would be a welcome addition. > -> some people point out the fact that special characters like the > greek letters would be great for all kind of maths calculation. > -> Many people think that this would enable them to make their own DSL > -> unittest - very useful to give a better overview of the result of > unit test. People pointed us at a Visual C# MVP tutorial > http://www.atmarkit.co.jp/fdotnet/nagile/nagile02/nagile02_03.html > > One person expressed the worry that mixing japanese and ascii would > oblige them to change input mode too often but other posters said that > this could be easily arranged by putting the right settings in the > IME. > > Guillaume > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed May 16 16:48:24 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 16 May 2007 07:48:24 -0700 Subject: [Python-3000] Alternatives for __del__ Message-ID: Since no PEP has been submitted about eliminating __del__, __del__ remains, by default, in Python 3000. I am more comfortable with this anyway. 
However, I still welcome an informational PEP describing the "best
practices" for avoiding it by using weak references, including some
support code to be added to weakref.py (this could probably be added to
Python 2.6 as well; and for earlier releases it could be made available
as a 3rd party add-on or as a "recipe" in the online Python Cookbook
(http://aspn.activestate.com/ASPN/Python/Cookbook/)).

I am hoping that someone besides Raymond will volunteer to write such a
PEP; his busy schedule makes it unlikely that he will have the time
necessary to devote to this project.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org Wed May 16 17:46:13 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 16 May 2007 08:46:13 -0700
Subject: [Python-3000] Raw strings containing \u or \U
Message-ID: 

Walter Doerwald, in private mail, reminded me of a third use case for
raw strings: docstrings containing example code using backslashes. Here
it really seems wrong to interpolate \u and \U. So this is swaying me
towards changing this behavior: r"\u1234" will be a string of length 6,
and r"\U00012345" one of length 10.

I'm still on the fence about the trailing backslash; I personally prefer
to write Windows paths using regular strings and doubled backslashes.
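Under the proposed behavior, \u escapes simply survive as literal characters in a raw string; a quick check of the proposed semantics:

```python
# Proposed Py3K behavior: raw strings never interpolate \u or \U escapes.
s = r"\u1234"
assert len(s) == 6 and s[0] == "\\"   # backslash, u, 1, 2, 3, 4

t = r"\U00012345"
assert len(t) == 10                   # ten literal characters

# Non-raw strings still interpolate, of course:
assert len("\u1234") == 1
```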
> > Most people feel like the PEP3131 would be a welcome addition. > -> some people point out the fact that special characters like the > greek letters would be great for all kind of maths calculation. > -> Many people think that this would enable them to make their own DSL Oooh, and we could use actual lambdas instead of the lambda keyword. So now we've made the jump from "help (some) international users" to "I want to use unicode characters just for the hell of it". > -> unittest - very useful to give a better overview of the result of > unit test. People pointed us at a Visual C# MVP tutorial > http://www.atmarkit.co.jp/fdotnet/nagile/nagile02/nagile02_03.html I don't know what "a better overview of the result of unit test" means. Also, the linked page is in Japanese. Collin Winter From collinw at gmail.com Wed May 16 18:04:38 2007 From: collinw at gmail.com (Collin Winter) Date: Wed, 16 May 2007 09:04:38 -0700 Subject: [Python-3000] Support for PEP 3131 (some links to evidence of usage within communities) In-Reply-To: <19dd68ba0705141818w62c942b7g576016fcd3cc0ac1@mail.gmail.com> References: <19dd68ba0705141818w62c942b7g576016fcd3cc0ac1@mail.gmail.com> Message-ID: <43aa6ff70705160904p104962b3nb142a1e08bb68b78@mail.gmail.com> On 5/14/07, Guillaume Proux wrote: > Found some evidence of usage of identifiers in Japanese while doing a > quick google search > > All links below are in Japanese. I have absolutely no way of evaluating the content of these links. Testimonials that I can't read are less than interesting. 
Collin Winter From murman at gmail.com Wed May 16 18:15:00 2007 From: murman at gmail.com (Michael Urman) Date: Wed, 16 May 2007 11:15:00 -0500 Subject: [Python-3000] Support for PEP 3131 (some links to evidence of usage within communities) In-Reply-To: <43aa6ff70705160904p104962b3nb142a1e08bb68b78@mail.gmail.com> References: <19dd68ba0705141818w62c942b7g576016fcd3cc0ac1@mail.gmail.com> <43aa6ff70705160904p104962b3nb142a1e08bb68b78@mail.gmail.com> Message-ID: On 5/16/07, Collin Winter wrote: > On 5/14/07, Guillaume Proux wrote: > > Found some evidence of usage of identifiers in Japanese while doing a > > quick google search > > > > All links below are in Japanese. > > I have absolutely no way of evaluating the content of these links. > Testimonials that I can't read are less than interesting. See the web page option, Japanese to English BETA: http://translate.google.com/translate_t -- Michael Urman From collinw at gmail.com Wed May 16 18:26:29 2007 From: collinw at gmail.com (Collin Winter) Date: Wed, 16 May 2007 09:26:29 -0700 Subject: [Python-3000] Support for PEP 3131 (some links to evidence of usage within communities) In-Reply-To: <19dd68ba0705160909y47a1eb4cpcabc9d53a8581b6e@mail.gmail.com> References: <19dd68ba0705141818w62c942b7g576016fcd3cc0ac1@mail.gmail.com> <43aa6ff70705160904p104962b3nb142a1e08bb68b78@mail.gmail.com> <19dd68ba0705160909y47a1eb4cpcabc9d53a8581b6e@mail.gmail.com> Message-ID: <43aa6ff70705160926n59aa9c4el9b165b074e172e4c@mail.gmail.com> On 5/16/07, Guillaume Proux wrote: > Hi Collin, > > You express the same frustration than people who can't read English. > You feel the same than Japanese people faced with an impenetrable wall > of English... Presumably people who don't speak English aren't provided English-language reading materials as evidence during an otherwise-Japanese discussion. > > Testimonials that I can't read are less than interesting. > > You seem to be unable to open your mind to other cultures. I speak three languages. 
It's insulting to allege that my opposition to this proposal is somehow based in English-language cultural imperialism or some other politically-correct nonsense. Collin Winter From guido at python.org Wed May 16 18:39:07 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 16 May 2007 09:39:07 -0700 Subject: [Python-3000] Support for PEP 3131 - discussion on python zope users group In-Reply-To: <43aa6ff70705160855s8d2edb8k9212455f0696c6f8@mail.gmail.com> References: <19dd68ba0705152213h7dc04e48qfc2f1ad4f5d61b99@mail.gmail.com> <43aa6ff70705160855s8d2edb8k9212455f0696c6f8@mail.gmail.com> Message-ID: On 5/16/07, Collin Winter wrote: > On 5/15/07, Guillaume Proux wrote: > > Just to let you know that a discussion on japanese python users group > > is going on regarding this issue. > > > > Most people feel like the PEP3131 would be a welcome addition. > > -> some people point out the fact that special characters like the > > greek letters would be great for all kind of maths calculation. > > -> Many people think that this would enable them to make their own DSL > > Oooh, and we could use actual lambdas instead of the lambda keyword. Calm down, Collin. You know full well that that is not in the PEP and if it were I'd be the first to reject it. > So now we've made the jump from "help (some) international users" to > "I want to use unicode characters just for the hell of it". Down that road lies Perl 6. We need to give the world a sane alternative. > > -> unittest - very useful to give a better overview of the result of > > unit test. People pointed us at a Visual C# MVP tutorial > > http://www.atmarkit.co.jp/fdotnet/nagile/nagile02/nagile02_03.html > > I don't know what "a better overview of the result of unit test" > means. Also, the linked page is in Japanese. I've just ignored the pages in Japanese, except as proof that there *are* people out there who like to discuss programming in their native language which isn't English. I say more power to them. 
Just to clarify my position to those who might think I have gone soft: the standard library (with the exception of test modules specifically aimed at testing this feature) should continue to use ASCII exclusively for identifiers, English exclusively for comments and messages, and should limit the use of non-ASCII characters in comments and string literals to the names of contributors. Where names are written using an alphabet that is not the Latin alphabet, a Latin translation should be given alongside. I'd like to see this added to both PEP 3131 and, for good measure, to PEP 8, the style guide (which ought to be self-contained, and has a wider applicability than just the standard library). -- --Guido van Rossum (home page: http://www.python.org/~guido/) From jason.orendorff at gmail.com Wed May 16 18:44:50 2007 From: jason.orendorff at gmail.com (Jason Orendorff) Date: Wed, 16 May 2007 12:44:50 -0400 Subject: [Python-3000] Support for PEP 3131 - discussion on python zope users group In-Reply-To: <43aa6ff70705160855s8d2edb8k9212455f0696c6f8@mail.gmail.com> References: <19dd68ba0705152213h7dc04e48qfc2f1ad4f5d61b99@mail.gmail.com> <43aa6ff70705160855s8d2edb8k9212455f0696c6f8@mail.gmail.com> Message-ID: On 5/16/07, Collin Winter wrote: > > -> unittest - very useful to give a better overview of the result of > > unit test. People pointed us at a Visual C# MVP tutorial > > http://www.atmarkit.co.jp/fdotnet/nagile/nagile02/nagile02_03.html > > I don't know what "a better overview of the result of unit test" > means. Also, the linked page is in Japanese. The page illustrates how a unit test can serve as an executable specification. The third box of code is a TestFixture class with methods like this one: >> [Test][ExpectedException(typeof(ArgumentException))] >> public void ???????????() >> { >> Date date = new Date(0, 1, 1); >> } The name translates to something like "if the year is less than one, it's an error". Interesting. 
Kind of a weird thing to do; ordinarily you wouldn't want method names that take so long to type. But a unit test method is a special case. The mix of Japanese and English is not as visually jarring as I expected. It actually looks kinda cool. :) -j From guido at python.org Wed May 16 18:49:16 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 16 May 2007 09:49:16 -0700 Subject: [Python-3000] Support for PEP 3131 - discussion on python zope users group In-Reply-To: References: <19dd68ba0705152213h7dc04e48qfc2f1ad4f5d61b99@mail.gmail.com> <43aa6ff70705160855s8d2edb8k9212455f0696c6f8@mail.gmail.com> Message-ID: On 5/16/07, Jason Orendorff wrote: > On 5/16/07, Collin Winter wrote: > > > -> unittest - very useful to give a better overview of the result of > > > unit test. People pointed us at a Visual C# MVP tutorial > > > http://www.atmarkit.co.jp/fdotnet/nagile/nagile02/nagile02_03.html > > > > I don't know what "a better overview of the result of unit test" > > means. Also, the linked page is in Japanese. > > The page illustrates how a unit test can serve as an > executable specification. The third box of code is a > TestFixture class with methods like this one: > > >> [Test][ExpectedException(typeof(ArgumentException))] > >> public void ???????????() > >> { > >> Date date = new Date(0, 1, 1); > >> } > > The name translates to something like "if the year is > less than one, it's an error". Interesting. Kind of a weird > thing to do; ordinarily you wouldn't want method names > that take so long to type. But a unit test method is a > special case. > > The mix of Japanese and English is not as visually > jarring as I expected. It actually looks kinda cool. :) Thanks for the translation! This meshes nicely with a pattern I've only recently learned in unit testing -- using long descriptive names for tests so the name of the test indicates the tested behavior (as opposed to, say, the name of the class or method being tested). 
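Guido's observation maps directly onto Python's own unittest module: the method name itself carries the specification, and the test report reads like one. A minimal sketch of the pattern (the `make_date` helper is a hypothetical stand-in for the C# tutorial's Date class; only unittest itself is real):

```python
import unittest

def make_date(year, month, day):
    # Hypothetical stand-in for the Date class in the C# tutorial.
    if year < 1:
        raise ValueError("year must be 1 or greater")
    return (year, month, day)

class DateSpecification(unittest.TestCase):
    # The sentence-length name is the point: a verbose test listing
    # shows this name, so the report doubles as the specification.
    def test_a_year_less_than_one_is_an_error(self):
        self.assertRaises(ValueError, make_date, 0, 1, 1)

suite = unittest.defaultTestLoader.loadTestsFromTestCase(DateSpecification)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Under PEP 3131 the method name could just as well be the Japanese sentence from the linked tutorial; the mechanics are identical.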
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From hfoffani at gmail.com Wed May 16 18:51:51 2007 From: hfoffani at gmail.com (Hernan M Foffani) Date: Wed, 16 May 2007 18:51:51 +0200 Subject: [Python-3000] Support for PEP 3131 - discussion on python zope users group In-Reply-To: <43aa6ff70705160855s8d2edb8k9212455f0696c6f8@mail.gmail.com> References: <19dd68ba0705152213h7dc04e48qfc2f1ad4f5d61b99@mail.gmail.com> <43aa6ff70705160855s8d2edb8k9212455f0696c6f8@mail.gmail.com> Message-ID: <11fab4bc0705160951v3a0577eahdf00081ebd8a9032@mail.gmail.com> 2007/5/16, Collin Winter : > On 5/15/07, Guillaume Proux wrote: > > Just to let you know that a discussion on japanese python users group > > is going on regarding this issue. > > > > Most people feel like the PEP3131 would be a welcome addition. > > -> some people point out the fact that special characters like the > > greek letters would be great for all kind of maths calculation. > > -> Many people think that this would enable them to make their own DSL > > Oooh, and we could use actual lambdas instead of the lambda keyword. > > So now we've made the jump from "help (some) international users" to > "I want to use unicode characters just for the hell of it". I understand that the acronym DSL is not the right choice in this discussion because it already has a well known meaning. What I do believe is that the proposal will help users to implement their solutions using the same words they already use in their domain. > > -> unittest - very useful to give a better overview of the result of > > unit test. People pointed us at a Visual C# MVP tutorial > > http://www.atmarkit.co.jp/fdotnet/nagile/nagile02/nagile02_03.html > > I don't know what "a better overview of the result of unit test" > means. Also, the linked page is in Japanese. 
Please, Guillaume, correct me if I'm wrong, but my understanding is that one of the participants is proposing the use of natural language for test names, making the correspondence between the specification and the test case as close as possible. Thus, you can use current unittest tools to show, for instance, the state of your project. From steven.bethard at gmail.com Wed May 16 18:55:57 2007 From: steven.bethard at gmail.com (Steven Bethard) Date: Wed, 16 May 2007 10:55:57 -0600 Subject: [Python-3000] Raw strings containing \u or \U In-Reply-To: References: Message-ID: On 5/16/07, Guido van Rossum wrote: > Walter Doerwald, in private mail, reminded me of a third use case for > raw strings: docstrings containing example code using backslashes. > Here it really seems wrong to interpolate \u and \U. > > So this is swaying me towards changing this behavior: r"\u1234" will > be a string of length 6, and r"\U00012345" one of length 10. +1 for making raw strings truly raw (where backslashes don't escape anything) and teaching the re module about the necessary escapes (\u, \n, \r, etc.). > I'm still on the fence about the trailing backslash; I personally > prefer to write Windows paths using regular strings and doubled > backslashes. +1 for no escaping of quotes in raw strings. Python provides so many different ways to quote a string, the cases in which you can't just switch to another quoting style are vanishingly small. Examples from the stdlib and their translations:: '\'' --> "'" '("|\')' --> '''("|')''' 'Can\'t stat' --> "Can't stat" '(\'[^\']*\'|"[^"]*")?' --> '''('[^']*'|"[^"]*")?''' Note that allowing trailing backslashes could also clean up stuff in modules like ntpath:: path[-1] in "/\\" --> path[-1] in r"/\" firstTwo == '\\\\' --> firstTwo == r'\\' STeVe -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity.
--- Bucky Katt, Get Fuzzy From guido at python.org Wed May 16 19:05:45 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 16 May 2007 10:05:45 -0700 Subject: [Python-3000] Raw strings containing \u or \U In-Reply-To: References: Message-ID: On 5/16/07, Steven Bethard wrote: > On 5/16/07, Guido van Rossum wrote: > > Walter Doerwald, in private mail, reminded me of a third use case for > > raw strings: docstrings containing example code using backslashes. > > Here it really seems wrong to interpolate \u and \U. > > > > So this is swaying me towards changing this behavior: r"\u1234" will > > be a string of length 6, and r"\U00012345" one of length 10. > > +1 for making raw strings truly raw (where backslashes don't escape > anything) and teaching the re module about the necessary escapes (\u, > \n, \r, etc.). It already knows about all of those except \u and \U. Someone care to submit a patch? > > I'm still on the fence about the trailing backslash; I personally > > prefer to write Windows paths using regular strings and doubled > > backslashes. > > +1 for no escaping of quotes in raw strings. Python provides so many > different ways to quote a string, the cases in which you can't just > switch to another quoting style are vanishingly small. Examples from > the stdlib and their translations:: > > '\'' --> "'" > '("|\')' --> '''("|')''' > 'Can\'t stat' --> "Can't stat" > '(\'[^\']*\'|"[^"]*")?' --> '''('[^']*'|"[^"]*")?''' > > Note that allowing trailing backslashes could also clean up stuff in > modules like ntpath:: > > path[-1] in "/\\" --> path[-1] in r"/\" > firstTwo == '\\\\' --> firstTwo == r'\\' Can you also search for how often this feature is *used* (i.e. a raw string that has to be raw for other reasons also contains an escaped quote)? If that's rare or we can agree on easy fixes it would ease my mind about this part of the proposal. 
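For readers following along in a later Python: the \u half of this proposal is how things ended up, while the quote/trailing-backslash half never landed (a raw string still cannot end in an odd number of backslashes, so the `r"/\"` spelling above never became legal). A quick check of the behavior and of Steven's quote-switching translations, runnable today:

```python
# \u and \U are left alone in raw strings, as Guido proposes above:
assert len(r"\u1234") == 6
assert len(r"\U00012345") == 10

# A backslash-quote in a raw string keeps its backslash regardless of
# the enclosing quote style, exactly as Steven describes:
assert r"\"" == r'\"' == r"""\"""" == '\\"'

# ...so the translations work by switching quote style instead:
assert '\'' == "'"
assert '("|\')' == '''("|')'''
assert 'Can\'t stat' == "Can't stat"
```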
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From gproux+py3000 at gmail.com Wed May 16 19:14:03 2007 From: gproux+py3000 at gmail.com (Guillaume Proux) Date: Thu, 17 May 2007 02:14:03 +0900 Subject: [Python-3000] Support for PEP 3131 (some links to evidence of usage within communities) In-Reply-To: <43aa6ff70705160926n59aa9c4el9b165b074e172e4c@mail.gmail.com> References: <19dd68ba0705141818w62c942b7g576016fcd3cc0ac1@mail.gmail.com> <43aa6ff70705160904p104962b3nb142a1e08bb68b78@mail.gmail.com> <19dd68ba0705160909y47a1eb4cpcabc9d53a8581b6e@mail.gmail.com> <43aa6ff70705160926n59aa9c4el9b165b074e172e4c@mail.gmail.com> Message-ID: <19dd68ba0705161014x11beea5j679334204e211018@mail.gmail.com> Hi Collin, Sorry, I did not mean to hurt your feelings. On 5/17/07, Collin Winter wrote: > I speak three languages. It's insulting to allege that my opposition > to this proposal is somehow based in English-language cultural > imperialism or some other politically-correct nonsense. I was just trying to point out that the people most likely to be impacted by PEP 3131 are *exactly* the ones who will not write up pages in English or any other Latin-script language. I did not mean to attack your knowledge or capabilities. My point is: if you really want to understand the benefit of PEP 3131 for Japanese people as seen through their eyes, I believe that you have no choice but to either learn Japanese or get by with automatic Japanese-to-English translation tools, and that you should not complain about the links being in Japanese: this is exactly the reason why people would love Python being able to speak their own language.
Regards, Guillaume From benji at benjiyork.com Wed May 16 19:41:11 2007 From: benji at benjiyork.com (Benji York) Date: Wed, 16 May 2007 13:41:11 -0400 Subject: [Python-3000] PEP 3133: Introducing Roles In-Reply-To: References: <43aa6ff70705132236w6d2dbfc4r8d1fadede753a6ca@mail.gmail.com> <4648D626.1030201@benjiyork.com> Message-ID: <464B4237.4090802@benjiyork.com> Guido van Rossum wrote: > On 5/14/07, Benji York wrote: >> Collin Winter wrote: >>> PEP: 3133 >>> Title: Introducing Roles >> Everything included here is included in zope.interface. See in-line >> comments below for the analogs. > > Could you look at PEP 3119 and do a similar analysis? Sure. > I expect that > the main thing missing there is that it (currently) has no way to > claim that a particular *object* has a certain behavior. Is "it" in that sentence the ABC PEP or zope.interface? > PEP 3119 currently also doesn't have a > verification step -- but this could easily be added as an (optional) > part of the registration call. I don't care much for verification. People using zope.interface have found that writing good tests is superior to on-demand verification, and I suspect execution time verification is a non-starter because of the overhead (not to mention its actual desirability, or lack thereof). > BTW I think Collin made a mistake when he claimed that the Doglike > role should throw a tantrum just because the actual bark() > implementation has an optional extra argument Agreed. 
-- Benji York http://benjiyork.com From guido at python.org Wed May 16 19:50:07 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 16 May 2007 10:50:07 -0700 Subject: [Python-3000] PEP 3133: Introducing Roles In-Reply-To: <464B4237.4090802@benjiyork.com> References: <43aa6ff70705132236w6d2dbfc4r8d1fadede753a6ca@mail.gmail.com> <4648D626.1030201@benjiyork.com> <464B4237.4090802@benjiyork.com> Message-ID: On 5/16/07, Benji York wrote: > Guido van Rossum wrote: > > On 5/14/07, Benji York wrote: > >> Collin Winter wrote: > >>> PEP: 3133 > >>> Title: Introducing Roles > >> Everything included here is included in zope.interface. See in-line > >> comments below for the analogs. > > > > Could you look at PEP 3119 and do a similar analysis? > > Sure. > > > I expect that > > the main thing missing there is that it (currently) has no way to > > claim that a particular *object* has a certain behavior. > > Is "it" in that sentence the ABC PEP or zope.interface? The ABC PEP. > > PEP 3119 currently also doesn't have a > > verification step -- but this could easily be added as an (optional) > > part of the registration call. > > I don't care much for verification. People using zope.interface have > found that writing good tests is superior to on-demand verification, and > I suspect execution time verification is a non-starter because of the > overhead (not to mention its actual desirability, or lack thereof). I don't care much for it either. If zope.interface users don't care for it either, I'm happy to declare it a non-use case. I was just thinking of how to "sell" ABCs as an alternative to current happy users of zope.interface.
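For concreteness, here is what the registration idea under discussion turned into once PEP 3119's `abc` module shipped: `register()` lets a class claim conformance after the fact, with no inheritance and, as Benji and Guido agree is acceptable, no verification step. The `Doglike`/`bark` names echo Collin's role example; the sketch assumes only the standard `abc` module:

```python
from abc import ABCMeta, abstractmethod

class Doglike(metaclass=ABCMeta):
    @abstractmethod
    def bark(self):
        """Make a dog noise."""

class NoisyThing:
    # Note the extra optional argument: register() does not inspect
    # signatures at all, so nothing "throws a tantrum" here.
    def bark(self, volume=1):
        return "woof " * volume

Doglike.register(NoisyThing)  # claim the behavior for the whole class

assert issubclass(NoisyThing, Doglike)
assert isinstance(NoisyThing(), Doglike)
```

As Guido notes, registration is per-class; claiming that a particular *object* has a behavior (as zope.interface's `directlyProvides` does) has no analog here.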
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From steven.bethard at gmail.com Wed May 16 20:32:37 2007 From: steven.bethard at gmail.com (Steven Bethard) Date: Wed, 16 May 2007 12:32:37 -0600 Subject: [Python-3000] Raw strings containing \u or \U In-Reply-To: References: Message-ID: On 5/16/07, Guido van Rossum wrote: > On 5/16/07, Steven Bethard wrote: > > +1 for no escaping of quotes in raw strings. Python provides so many > > different ways to quote a string, the cases in which you can't just > > switch to another quoting style are vanishingly small. Examples from > > the stdlib and their translations:: > > > > '\'' --> "'" > > '("|\')' --> '''("|')''' > > 'Can\'t stat' --> "Can't stat" > > '(\'[^\']*\'|"[^"]*")?' --> '''('[^']*'|"[^"]*")?''' > > > > Note that allowing trailing backslashes could also clean up stuff in > > modules like ntpath:: > > > > path[-1] in "/\\" --> path[-1] in r"/\" > > firstTwo == '\\\\' --> firstTwo == r'\\' > > Can you also search for how often this feature is *used* (i.e. a raw > string that has to be raw for other reasons also contains an escaped > quote)? If that's rare or we can agree on easy fixes it would ease my > mind about this part of the proposal. Well, remembering that when you escape a quote in a raw string, the backslash is left in regardless of the enclosing quote type, e.g.:: r"\"" == r'\"' == r"""\"""" == r'''\"''' == '\\"' the question is then whether there are any situations where you can't just switch the quote type. The only things in the stdlib that I could find[1] where the string quotes and the escaped quote were of the same type were: r"^\s*=\s*\"([^\"\\]*(?:\\.[^\"\\]*)*)\"" r"([\"\\])" r'[^\\\'\"%s ]*' r'#\s*doctest:\s*([^\n\'"]*)$', r'(\'[^\']*\'|"[^"]*"|[-a-zA-Z0-9./,:;+*%?!&$\(\)_#=~@]*))?' r"([^.'\"\\#]\b|^)" r'(\'[^\']*\'|"[^"]*")\s*' r'((\\[\\abfnrtv\'"]|\\[0-9]..|\\x..|\\u....)+)', r'(\'[^\']*\'|"[^"]*"|[][\-a-zA-Z0-9./,:;+*%?!&$\(\)_#=~\'"@]*))?' 
r'(?<=[\w\!\"\'\&\.\,\?])-{2,}(?=\w))' r'[\"\']?' r'[ \(\)<>@,;:\\"/\[\]\?=]' r"[&<>\"\x80-\xff]+" I believe every one of these would continue to work if you simply replaced r'...' or r"..." with r'''...''', that is, if you used the triple-quoted version. Even some much nastier ones than what's in the stdlib (e.g. where the string starts and ends with different quote types) seem to work out okay when you switch to the appropriate triple quotes:: r'\'\"' == r'''\'\"''' r'"\'' == r""""\'""" I actually wasn't able to find something I couldn't translate. It would be helpful to have another set of eyes if anyone has the time. [1] I skipped the tests dir because I'm lazy. ;-) STeVe -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy From bwinton at latte.ca Wed May 16 20:51:54 2007 From: bwinton at latte.ca (Blake Winton) Date: Wed, 16 May 2007 14:51:54 -0400 Subject: [Python-3000] Support for PEP 3131 - discussion on python zope users group In-Reply-To: References: <19dd68ba0705152213h7dc04e48qfc2f1ad4f5d61b99@mail.gmail.com> <43aa6ff70705160855s8d2edb8k9212455f0696c6f8@mail.gmail.com> Message-ID: <464B52CA.9030001@latte.ca> Jason Orendorff wrote: > On 5/16/07, Collin Winter wrote: >>> [Test][ExpectedException(typeof(ArgumentException))] >>> public void ???????????() >>> { >>> Date date = new Date(0, 1, 1); >>> } > The mix of Japanese and English is not as visually > jarring as I expected. It actually looks kinda cool. :) I agree, but that particular example kind of worried me, since in my browser's font, ? looks a lot like ( followed by some other Japanese character. I spent a couple of minutes looking for the closing paren before realizing that it wasn't what I thought it was... Of course, I have the same problem in English, with "rn" looking a lot like "m" sometirnes. (In a related story, a friend of mine mentioned she was on the Pom-pom squad in high-school.) Later, Blake.
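Blake's look-alike worry got a partial technical answer in the version of PEP 3131 that was eventually accepted: identifiers are normalized with NFKC at compile time, so many compatibility look-alikes collapse into the same name. A small demonstration using only the standard library:

```python
import unicodedata

# NFKC folds compatibility variants together:
assert unicodedata.normalize("NFKC", "\uff41") == "a"  # FULLWIDTH 'a' -> 'a'

# Identifiers receive the same normalization when code is compiled, so
# a fullwidth spelling and the ASCII spelling are the *same* variable:
ns = {}
exec("\uff41 = 42", ns)  # assigns to an identifier spelled in fullwidth
assert ns["a"] == 42
```

This does nothing for look-alikes that are distinct code points in distinct scripts (Cyrillic "а" vs. Latin "a" remain different identifiers), which is the case Guido's "attack model" comment below addresses.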
From guido at python.org Wed May 16 21:14:36 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 16 May 2007 12:14:36 -0700 Subject: [Python-3000] Support for PEP 3131 - discussion on python zope users group In-Reply-To: <464B52CA.9030001@latte.ca> References: <19dd68ba0705152213h7dc04e48qfc2f1ad4f5d61b99@mail.gmail.com> <43aa6ff70705160855s8d2edb8k9212455f0696c6f8@mail.gmail.com> <464B52CA.9030001@latte.ca> Message-ID: Yeah, I've decided for myself that similar-looking characters are a non-issue. They are a real problem in domain names because spammers use them to fool users into believing they're going to the real ebay. But source code just doesn't have that attack model. There are lots of characters that look the same already -- 1/l/I, o/O/0, in some fonts {/( and )/}. We deal with them. --Guido On 5/16/07, Blake Winton wrote: > Jason Orendorff wrote: > > On 5/16/07, Collin Winter wrote: > >>> [Test][ExpectedException(typeof(ArgumentException))] > >>> public void ???????????() > >>> { > >>> Date date = new Date(0, 1, 1); > >>> } > > The mix of Japanese and English is not as visually > > jarring as I expected. It actually looks kinda cool. :) > > I agree, but that particular example kind of worried me, since in my > browser's font, ? looks a lot like ( followed by some other Japanese > character. I spent a couple of minutes looking for the closing paren > before realizing that it wasn't what I thought it was... Or course, I > have the same problem in English, with "rn" looking a lot like "m" > sometirnes. (In a related story, a friend of mine mentioned she was on > the Pom-pom squad in high-school.) > > Later, > Blake. 
> > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From rrr at ronadam.com Wed May 16 22:01:01 2007 From: rrr at ronadam.com (Ron Adam) Date: Wed, 16 May 2007 15:01:01 -0500 Subject: [Python-3000] Raw strings containing \u or \U In-Reply-To: References: Message-ID: <464B62FD.4070400@ronadam.com> Steven Bethard wrote: > I actually wasn't able to find something I couldn't translate. It > would be helpful to have another set of eyes if anyone has the time. I have a patch against (*) 2.6 tokenize.py that ignores '\' characters in raw strings. This has two effects. A matching quote, """, ''', ", ', of the type that started the string closes the string even if it is preceded by a back slash, and a back slash can end a raw string. No changes to regular string behavior were made. I'll try to make a patch against the python 3000 branch and upload it so it can be used for testing. (Unless of course someone else has already done it.) Ron * I didn't have the python 3000 branch on my computer at the time. From guido at python.org Wed May 16 22:29:04 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 16 May 2007 13:29:04 -0700 Subject: [Python-3000] Raw strings containing \u or \U In-Reply-To: <464B62FD.4070400@ronadam.com> References: <464B62FD.4070400@ronadam.com> Message-ID: That would be great! This will automatically turn \u1234 into 6 characters, right? Perhaps you could make the patch against the py3k-struni branch instead of against the regular p3yk (sic) branch? On 5/16/07, Ron Adam wrote: > Steven Bethard wrote: > > I actually wasn't able to find something I couldn't translate. It > > would be helpful to have another set of eyes if anyone has the time.
> > I have a patch against (*) 2.6 tokenize.py that ignores '\' characters in > raw strings. This has two effects. A matching quote, """, ''', ", ', of > the type that started the string closes the string even if it is preceded > by a back slash, and a back slash can end a raw string. No changes to > regular string behavior were made. > > I'll try to make a patch against the python 3000 branch and upload it so it > can be used for testing. (Unless of course someone else has already done it.) > > Ron > > > * I didn't have the python 3000 branch on my computer at the time. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From rrr at ronadam.com Wed May 16 23:05:57 2007 From: rrr at ronadam.com (Ron Adam) Date: Wed, 16 May 2007 16:05:57 -0500 Subject: [Python-3000] Raw strings containing \u or \U In-Reply-To: References: <464B62FD.4070400@ronadam.com> Message-ID: <464B7235.20500@ronadam.com> Guido van Rossum wrote: > That would be great! This will automatically turn \u1234 into 6 > characters, right? I'm not exactly clear when the '\uxxxx' characters get converted. There isn't any conversion done in tokenize.c that I can see. It's primarily only concerned with finding the beginning and ending of the string at that point. It looks like everything between the beginning and end is just passed along "as is" and it's translated further later in the chain. (I had said earlier tokenize.py, meant tokenize.c) > Perhaps you could make the patch against the py3k-struni branch > instead of against the regular p3yk (sic) branch? I can do that. :-) > On 5/16/07, Ron Adam wrote: >> Steven Bethard wrote: >> >> > I actually wasn't able to find something I couldn't translate.
It >> > would be helpful to have another set of eyes if anyone has the time. >> >> I have a patch against (*) 2.6 tokenize.py that ignores '\' characters in >> raw strings. This has two effects. A matching quote, """, ''', ", ', of >> the type that started the string closes the string even if it is preceded >> by a back slash, and a back slash can end a raw string. No changes to >> regular string behavior were made. >> >> I'll try to make a patch against the python 3000 branch and upload it >> so it >> can be used for testing. (Unless of course someone else has already >> done it.) >> >> Ron >> >> >> * I didn't have the python 3000 branch on my computer at the time. From guido at python.org Wed May 16 23:10:17 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 16 May 2007 14:10:17 -0700 Subject: [Python-3000] Raw strings containing \u or \U In-Reply-To: <464B7235.20500@ronadam.com> References: <464B62FD.4070400@ronadam.com> <464B7235.20500@ronadam.com> Message-ID: On 5/16/07, Ron Adam wrote: > Guido van Rossum wrote: > > That would be great! This will automatically turn \u1234 into 6 > > characters, right? > > I'm not exactly clear when the '\uxxxx' characters get converted. There > isn't any conversion done in tokenize.c that I can see. It's primarily > only concerned with finding the beginning and ending of the string at that > point. It looks like everything between the beginning and end is just > passed along "as is" and it's translated further later in the chain. OK, I think that happens in a totally different place. But it also needs to be fixed. :-) > (I had said earlier tokenize.py, meant tokenize.c) Well, actually, tokenize.py also needs adjustments to support this...
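For readers wondering where the raw-string rules live: the tokenizer's only job with a string literal is to find where it ends and hand the source text through verbatim, which is why both the C tokenizer and tokenize.py need the change while the escape interpretation happens later in the chain, as Ron describes. A sketch with today's tokenize module:

```python
import io
import tokenize

source = "x = r'\\n'\n"
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

# The tokenizer delivers the literal verbatim, prefix and all; deciding
# what \n *means* is done later, when the literal is evaluated.
strings = [tok.string for tok in tokens if tok.type == tokenize.STRING]
assert strings == ["r'\\n'"]
```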
> > Perhaps you could make the patch against the py3k-struni branch > > instead of against the regular p3yk (sic) branch? > > I can do that. :-) Great! -- --Guido van Rossum (home page: http://www.python.org/~guido/) From rrr at ronadam.com Thu May 17 00:42:24 2007 From: rrr at ronadam.com (Ron Adam) Date: Wed, 16 May 2007 17:42:24 -0500 Subject: [Python-3000] Raw strings containing \u or \U In-Reply-To: References: <464B62FD.4070400@ronadam.com> Message-ID: <464B88D0.6080309@ronadam.com> Guido van Rossum wrote: > That would be great! This will automatically turn \u1234 into 6 > characters, right? > > Perhaps you could make the patch against the py3k-struni branch > instead of against the regular p3yk (sic) branch? Done. Patch number 1720390 https://sourceforge.net/tracker/index.php?func=detail&aid=1720390&group_id=5470&atid=305470 This doesn't include the strings needing changes in the library to pass all the tests. That's mostly changing single quotes to triple quotes when a string contains both quote characters. I'll make a second patch that includes those. Cheers, Ron From rasky at develer.com Thu May 17 01:29:25 2007 From: rasky at develer.com (Giovanni Bajo) Date: Thu, 17 May 2007 01:29:25 +0200 Subject: [Python-3000] PEP 3124 - more commentary In-Reply-To: <4649976A.1030301@canterbury.ac.nz> References: <20070514192423.624D63A4036@sparrow.telecommunity.com> <20070514214915.C361C3A4036@sparrow.telecommunity.com> <20070514232017.BA6A43A4036@sparrow.telecommunity.com> <20070515003354.194B83A4036@sparrow.telecommunity.com> <20070515014338.8D8EA3A4036@sparrow.telecommunity.com> <4649976A.1030301@canterbury.ac.nz> Message-ID: On 15/05/2007 13.20, Greg Ewing wrote: >> C++ and Java don't have tuples, do they? > > No, but in C++ you could probably do something clever by > overloading the comma operator if you were feeling perverse > enough...
Well, there's also tr1::tuple :) -- Giovanni Bajo From jimjjewett at gmail.com Thu May 17 02:19:21 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Wed, 16 May 2007 20:19:21 -0400 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <19dd68ba0705120827s5415c4dcx12e5862f32cc3e06@mail.gmail.com> <4646A3CA.40705@acm.org> Message-ID: On 5/13/07, Jason Orendorff wrote: > I think the gesture alone is worth it, even if no one ever used the > feature productively. But people will. The cost to python-dev is low, > and the cost to English-speaking users is very likely zero. > What am I missing? Additional costs: (1) Security concerns. Offhand, I'm not sure how to exploit it, but I could imagine scenarios, such as if var References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <19dd68ba0705120827s5415c4dcx12e5862f32cc3e06@mail.gmail.com> <4646A3CA.40705@acm.org> Message-ID: As I mentioned before, I don't expect either of these will be much of a concern. I guess tools like pylint could optionally warn if non-ascii characters are used. On 5/16/07, Jim Jewett wrote: > On 5/13/07, Jason Orendorff wrote: > > I think the gesture alone is worth it, even if no one ever used the > > feature productively. But people will. The cost to python-dev is low, > > and the cost to English-speaking users is very likely zero. > > > What am I missing? > > Additional costs: > > (1) Security concerns. > > Offhand, I'm not sure how to exploit it, but I could imagine scenarios, such as > > if var > where "var that looked like "<") rather than a comparison. > > (2) Obscure bugs. > > I have seen code that did the wrong thing because a method override > (or global variable name) was misspelled. You can argue that it was > sloppy code, but that sort of thing would be more common when the > programmer couldn't tell the difference visually. 
(Just as today's > typos are more likely to involve "0" and "O" than "T" and "5") > > Guillaume has pointed out that people whose native language isn't > written in Latin characters already have this problem, but it is a > problem they already learn to deal with as part of learning to > program. > > -jJ > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From jimjjewett at gmail.com Thu May 17 02:26:27 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Wed, 16 May 2007 20:26:27 -0400 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <19dd68ba0705131104r85531f3o12b7e1769d7b7140@mail.gmail.com> References: <1d85506f0705130833v1058b022re0597cf9f259320d@mail.gmail.com> <19dd68ba0705130925j1dd55f1boba9e1b6c036d0422@mail.gmail.com> <43aa6ff70705131009s7d5b177dmea7c790d670ac3c0@mail.gmail.com> <1d85506f0705131042q23270a91qa31ff2f3940019ed@mail.gmail.com> <19dd68ba0705131104r85531f3o12b7e1769d7b7140@mail.gmail.com> Message-ID: On 5/13/07, Guillaume Proux wrote: > HI Tomer, > > if ??????.?????: > > pass > > which comes first? does it say bacon.eggs or eggs.bacon? > > and what happens if the editor uses a dot prefixed by LTR > > marker? the meaning is reversed, but it still looks the same! > All that is really a *presentation* issue. And as such, an editor > specialized in editing hebrew or arabic python should help you write > the code you want to write. How should I interpret: if ??????.spam: Even if we restricted identifiers to a single script, the combinations of identifiers would still have this issue. > Additionally,would a professional programmer choose to add LTR markers > to make the source code ambiguous? Maybe they're trying to inject a security breach? 
Unicode identifiers do make auditing by inspection harder. > > you can always translate or transliterate a word to english, like so: > > if beykon.beytzim: > Is this a bijective translation ? How good is most people latin > character reading ability among Hebrew speakers? From the beginning, I > can tell from experience that Japanese people have great difficulties > in reading english or even transliterated japanese (which is never > good anyway because of homonyms) It could be turned into one, using a custom "encoding" codec. -jJ From jimjjewett at gmail.com Thu May 17 02:31:41 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Wed, 16 May 2007 20:31:41 -0400 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <19dd68ba0705120827s5415c4dcx12e5862f32cc3e06@mail.gmail.com> <43aa6ff70705130822q32e3971bradf80fd90ac36578@mail.gmail.com> <46476291.6040502@jmunch.dk> Message-ID: On 5/13/07, Arvind Singh wrote: > On 5/14/07, Anders J. Munch <2007 at jmunch.dk > wrote: > This PEP talks about support for *identifiers*. If you need *extensive* > vocabulary for your *identifiers*, I'd assume that you're coding something > non-trivial (with ignorable exceptions). Such non-trivial code should be > sharable under a _common_ language that *others* can understand as well, > IMHO. But that common language might well be Japanese, particularly if you are writing for a specific customer which happens to be a Japanese company. > Further, if you are doing something non-trivial, I can also assume that > you'd be using third-party libraries. How would the code look if identifiers > were written in various encodings? The core of CPython prefixes its identifiers with Py_ to distinguish them from other libraries. 
I suspect that a Chinese character vs a Latin character would be almost as distinctive as whether or not the identifier starts with "Py_".

-jJ

From jyasskin at gmail.com Thu May 17 02:31:50 2007
From: jyasskin at gmail.com (Jeffrey Yasskin)
Date: Thu, 17 May 2007 02:31:50 +0200
Subject: [Python-3000] Updated and simplified PEP 3141: A Type Hierarchy for Numbers
Message-ID: <5d44f72f0705161731j4700bdb3h4e36e97757bd6a32@mail.gmail.com>

I've updated PEP 3141 to remove the algebraic classes and bring the numeric hierarchy much closer to Scheme's design. Let me know what you think. Feel free to send typographical and formatting problems just to me. My schedule's a little shaky the next couple of weeks, but I'll make updates as quickly as I can.

PEP: 3141
Title: A Type Hierarchy for Numbers
Version: $Revision: 54928 $
Last-Modified: $Date: 2007-04-23 16:37:29 -0700 (Mon, 23 Apr 2007) $
Author: Jeffrey Yasskin
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 23-Apr-2007
Post-History: Not yet posted

Abstract
========

This proposal defines a hierarchy of Abstract Base Classes (ABCs) (PEP 3119) to represent number-like classes. It proposes a hierarchy of ``Number :> Complex :> Real :> Rational :> Integer`` where ``A :> B`` means "A is a supertype of B", and a pair of ``Exact``/``Inexact`` classes to capture the difference between ``floats`` and ``ints``. These types are significantly inspired by Scheme's numeric tower [#schemetower]_.

Rationale
=========

Functions that take numbers as arguments should be able to determine the properties of those numbers, and if and when overloading based on types is added to the language, should be overloadable based on the types of the arguments. For example, slicing requires its arguments to be ``Integers``, and the functions in the ``math`` module require their arguments to be ``Real``.

Specification
=============

This PEP specifies a set of Abstract Base Classes with default implementations.
If the reader prefers to think in terms of Roles (PEP 3133), the default implementations for (for example) the Real ABC would be moved to a RealDefault class, with Real keeping just the method declarations. Although this PEP uses terminology from PEP 3119, the hierarchy is intended to be meaningful for any systematic method of defining sets of classes, including Interfaces. I'm also using the extra notation from PEP 3107 (Function Annotations) to specify some types.

Exact vs. Inexact Classes
-------------------------

Floating point values may not exactly obey several of the properties you would expect. For example, it is possible for ``(X + -X) + 3 == 3``, but ``X + (-X + 3) == 0``. On the range of values that most functions deal with this isn't a problem, but it is something to be aware of. Therefore, I define ``Exact`` and ``Inexact`` ABCs to mark whether types have this problem. Every instance of ``Integer`` and ``Rational`` should be Exact, but ``Reals`` and ``Complexes`` may or may not be. (Do we really only need one of these, and the other is defined as ``not`` the first?)::

    class Exact(metaclass=MetaABC): pass
    class Inexact(metaclass=MetaABC): pass

Numeric Classes
---------------

We begin with a Number class to make it easy for people to be fuzzy about what kind of number they expect. This class only helps with overloading; it doesn't provide any operations. **Open question:** Should it specify ``__add__``, ``__sub__``, ``__neg__``, ``__mul__``, and ``__abs__`` like Haskell's ``Num`` class?::

    class Number(metaclass=MetaABC): pass

Some types (primarily ``float``) define "Not a Number" (NaN) values that return false for any comparison, including equality with themselves, and are maintained through operations. Because this doesn't work well with the Reals (which are otherwise totally ordered by ``<``), Guido suggested we might put NaN in its own type.
It is conceivable that this can still be represented by C doubles but be included in a different ABC at runtime. **Open issue:** Is this a good idea?::

    class NotANumber(Number):
        """Implement IEEE 754 semantics."""
        def __lt__(self, other): return False
        def __eq__(self, other): return False
        ...
        def __add__(self, other): return self
        def __radd__(self, other): return self
        ...

Complex numbers are immutable and hashable. Implementors should be careful that they make equal numbers equal and hash them to the same values. This may be subtle if there are two different extensions of the real numbers::

    class Complex(Hashable, Number):
        """A ``Complex`` should define the operations that work on the
        Python ``complex`` type. If it is given heterogeneous arguments,
        it may fall back on this class's definition of the operations.

        These operators should never return a TypeError as long as both
        arguments are instances of Complex (or even just implement
        __complex__).
        """
        @abstractmethod
        def __complex__(self):
            """This operation gives the arithmetic operations a fallback."""
            return complex(self.real, self.imag)

        @property
        def real(self):
            return complex(self).real

        @property
        def imag(self):
            return complex(self).imag

I define the reversed operations here so that they serve as the final fallback for operations involving instances of Complex. **Open issue:** Should Complex's operations check for ``isinstance(other, Complex)``? Duck typing seems to imply that we should just try __complex__ and succeed if it works, but stronger typing might be justified for the operators.
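As a small sketch (not part of the PEP), the floating-point behaviors motivating the ``Inexact`` marker and the NaN discussion above can be checked directly against the built-in ``float``:

```python
import math

# Inexactness: float addition is not associative for large magnitudes.
X = 1e100
assert (X + -X) + 3 == 3.0   # X and -X cancel exactly, then + 3
assert X + (-X + 3) == 0.0   # the 3 is absorbed into -1e100 first

# NaN: compares false with everything, including itself,
# and is maintained through arithmetic operations.
nan = float("nan")
assert not (nan == nan)
assert not (nan < nan)
assert math.isnan(nan + 1.0)
```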
TODO: analyze the combinations of normal and reversed operations with real and virtual subclasses of Complex::

        def __radd__(self, other):
            """Should this catch any type errors and return
            NotImplemented instead?"""
            return complex(other) + complex(self)

        def __rsub__(self, other):
            return complex(other) - complex(self)

        def __neg__(self):
            return -complex(self)

        def __rmul__(self, other):
            return complex(other) * complex(self)

        def __rdiv__(self, other):
            return complex(other) / complex(self)

        def __abs__(self):
            return abs(complex(self))

        def conjugate(self):
            return complex(self).conjugate()

        def __hash__(self):
            """Two "equal" values of different complex types should
            hash in the same way."""
            return hash(complex(self))

The ``Real`` ABC indicates that the value is on the real line, and supports the operations of the ``float`` builtin. Real numbers are totally ordered. (NaNs were handled above.)::

    class Real(Complex, metaclass=TotallyOrderedABC):
        @abstractmethod
        def __float__(self):
            """Any Real can be converted to a native float object."""
            raise NotImplementedError

        def __complex__(self):
            """Which gives us an easy way to define the conversion to
            complex."""
            return complex(float(self))

        @property
        def real(self):
            return self

        @property
        def imag(self):
            return 0

        def __radd__(self, other):
            if isinstance(other, Real):
                return float(other) + float(self)
            else:
                return super(Real, self).__radd__(other)

        def __rsub__(self, other):
            if isinstance(other, Real):
                return float(other) - float(self)
            else:
                return super(Real, self).__rsub__(other)

        def __neg__(self):
            return -float(self)

        def __rmul__(self, other):
            if isinstance(other, Real):
                return float(other) * float(self)
            else:
                return super(Real, self).__rmul__(other)

        def __rdiv__(self, other):
            if isinstance(other, Real):
                return float(other) / float(self)
            else:
                return super(Real, self).__rdiv__(other)

        def __rdivmod__(self, other):
            """Implementing divmod() for your type is sufficient to get
            floordiv and mod too.
""" if isinstance(other, Real): return divmod(float(other), float(self)) else: return super(Real, self).__rdivmod__(other) def __rfloordiv__(self, other): return divmod(other, self)[0] def __rmod__(self, other): return divmod(other, self)[1] def __trunc__(self): """Do we want properfraction, floor, ceiling, and round?""" return trunc(float(self)) def __abs__(self): return abs(float(self)) There is no way to define only the reversed comparison operators, so these operations take precedence over any defined in the other type. :( :: def __lt__(self, other): """The comparison operators in Python seem to be more strict about their input types than other functions. I'm guessing here that we want types to be incompatible even if they define a __float__ operation, unless they also declare themselves to be Real numbers. """ if isinstance(other, Real): return float(self) < float(other) else: return NotImplemented def __le__(self, other): if isinstance(other, Real): return float(self) <= float(other) else: return NotImplemented def __eq__(self, other): if isinstance(other, Real): return float(self) == float(other) else: return NotImplemented There is no built-in rational type, but it's straightforward to write, so we provide an ABC for it:: class Rational(Real, Exact): """rational.numerator and rational.denominator should be in lowest terms. 
""" @abstractmethod @property def numerator(self): raise NotImplementedError @abstractmethod @property def denominator(self): raise NotImplementedError def __float__(self): return self.numerator / self.denominator class Integer(Rational): @abstractmethod def __int__(self): raise NotImplementedError def __float__(self): return float(int(self)) @property def numerator(self): return self @property def denominator(self): return 1 def __ror__(self, other): return int(other) | int(self) def __rxor__(self, other): return int(other) ^ int(self) def __rand__(self, other): return int(other) & int(self) def __rlshift__(self, other): return int(other) << int(self) def __rrshift__(self, other): return int(other) >> int(self) def __invert__(self): return ~int(self) def __radd__(self, other): """All of the Real methods need to be overridden here too in order to get a more exact type for their results. """ if isinstance(other, Integer): return int(other) + int(self) else: return super(Integer, self).__radd__(other) ... def __hash__(self): """Surprisingly, hash() needs to be overridden too, since there are integers that float can't represent.""" return hash(int(self)) Adding More Numeric ABCs ------------------------ There are, of course, more possible ABCs for numbers, and this would be a poor hierarchy if it precluded the possibility of adding those. You can add ``MyFoo`` between ``Complex`` and ``Real`` with:: class MyFoo(Complex): ... MyFoo.register(Real) TODO(jyasskin): Check this. Rejected Alternatives ===================== The initial version of this PEP defined an algebraic hierarchy inspired by a Haskell Numeric Prelude [#numericprelude]_ including MonoidUnderPlus, AdditiveGroup, Ring, and Field, and mentioned several other possible algebraic types before getting to the numbers. I had expected this to be useful to people using vectors and matrices, but the NumPy community really wasn't interested. 
The numbers then had a much more branching structure to include things like the Gaussian Integers and Z/nZ, which could be Complex but wouldn't necessarily support things like division. The community decided that this was too much complication for Python, so the proposal has been scaled back to resemble the Scheme numeric tower much more closely.

References
==========

.. [#pep3119] Introducing Abstract Base Classes
   (http://www.python.org/dev/peps/pep-3119/)

.. [#pep3107] Function Annotations
   (http://www.python.org/dev/peps/pep-3107/)

.. [3] Possible Python 3K Class Tree?, wiki page created by Bill Janssen
   (http://wiki.python.org/moin/AbstractBaseClasses)

.. [#numericprelude] NumericPrelude: An experimental alternative hierarchy
   of numeric type classes
   (http://darcs.haskell.org/numericprelude/docs/html/index.html)

.. [#schemetower] The Scheme numerical tower
   (http://www.swiss.ai.mit.edu/ftpdir/scheme-reports/r5rs-html/r5rs_8.html#SEC50)

Acknowledgements
================

Thanks to Neil Norwitz for encouraging me to write this PEP in the first place, to Travis Oliphant for pointing out that the numpy people didn't really care about the algebraic concepts, to Alan Isaac for reminding me that Scheme had already done this, and to Guido van Rossum and lots of other people on the mailing list for refining the concept.

Copyright
=========

This document has been placed in the public domain.

From greg.ewing at canterbury.ac.nz Thu May 17 02:48:11 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 17 May 2007 12:48:11 +1200
Subject: [Python-3000] Raw strings containing \u or \U
In-Reply-To:
References:
Message-ID: <464BA64B.4020007@canterbury.ac.nz>

Guido van Rossum wrote:
> I'm still on the fence about the trailing backslash; I personally
> prefer to write Windows paths using regular strings and doubled
> backslashes.

Maybe we should have a special w"..." string in the Windows version of Python for pathnames.
It would raise a SyntaxError in non-Windows Pythons, thus discouraging people trying to use Windows pathnames in cross-platform code. :-) -- Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | Carpe post meridiem! | Christchurch, New Zealand | (I'm not a morning person.) | greg.ewing at canterbury.ac.nz +--------------------------------------+ From jimjjewett at gmail.com Thu May 17 02:50:33 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Wed, 16 May 2007 20:50:33 -0400 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <1d85506f0705141212m65b9ec37q5f685f507e394f01@mail.gmail.com> References: <1d85506f0705130833v1058b022re0597cf9f259320d@mail.gmail.com> <19dd68ba0705130925j1dd55f1boba9e1b6c036d0422@mail.gmail.com> <43aa6ff70705131009s7d5b177dmea7c790d670ac3c0@mail.gmail.com> <1d85506f0705131042q23270a91qa31ff2f3940019ed@mail.gmail.com> <19dd68ba0705131104r85531f3o12b7e1769d7b7140@mail.gmail.com> <1d85506f0705141212m65b9ec37q5f685f507e394f01@mail.gmail.com> Message-ID: On 5/14/07, tomer filiba wrote: > as an english-second-language programmer, i'd really like to be able > to have unicode identifiers -- but my gut feeling is -- it will open the > door for a tower of babel. I don't think this happened in Lisp. I won't pretend there hasn't been a tower of babel there, but it isn't because you can use non-ascii symbols. You can use any character in a symbol (~= identifier), including (if your implementation supports such characters at all, even in comments) Hebrew or Chinese characters. On the other hand, you have to go out of your way to use unusual identifier characters (including latin characters, if you care about the case); this may have contributed to the strong tendency to stick with ascii. 
-jJ From greg.ewing at canterbury.ac.nz Thu May 17 02:51:58 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 17 May 2007 12:51:58 +1200 Subject: [Python-3000] Support for PEP 3131 - discussion on python zope users group In-Reply-To: <43aa6ff70705160855s8d2edb8k9212455f0696c6f8@mail.gmail.com> References: <19dd68ba0705152213h7dc04e48qfc2f1ad4f5d61b99@mail.gmail.com> <43aa6ff70705160855s8d2edb8k9212455f0696c6f8@mail.gmail.com> Message-ID: <464BA72E.9010204@canterbury.ac.nz> Collin Winter wrote: > So now we've made the jump from "help (some) international users" to > "I want to use unicode characters just for the hell of it". Seems to me it's more like "I want to express my algorithm in a way that other mathematicians can easily follow". Which isn't all that much different from "I want to express my algorithm in a way that other speakers of my native language can easily follow". -- Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | Carpe post meridiem! | Christchurch, New Zealand | (I'm not a morning person.) | greg.ewing at canterbury.ac.nz +--------------------------------------+ From tomerfiliba at gmail.com Thu May 17 03:06:13 2007 From: tomerfiliba at gmail.com (tomer filiba) Date: Thu, 17 May 2007 03:06:13 +0200 Subject: [Python-3000] pep 3131 again Message-ID: <1d85506f0705161806w19914adfid76e36c4151c336@mail.gmail.com> === RTL/LTR === i pointed out already that no existing editor can handle LTR-RTL representation correctly, which essentially renders all RTL languages out of the scope of this PEP. that doesn't bother me personally so much, as i'm not going to use this feature anyway, but that still leaves us with the "european imposed colonialism" :) the only practical way to use RTL languages in code is to have an RTL programming language, where "if" is spelled "??", "for" as "????", "in" as "????", and so on, and the entire program is RTL. having code like -- for ??? 
in ????(1,2,3) is only unreadable by all means (since the parentheses are LTR, while the name is RTL, etc.)

=== help people who can't type english ===
since the keywords remain ASCII, along with stdlib and all other major third party libs -- how does that help the english-illiterate programmer?

    import random
    ?? = range(100)
    random.shuffle(??)
    ? = ??.pop(7)
    if len(?) > 58:
        print "?????!!!"  # ?? ?? ??????? ???? ??? ?????

apart from excessive visual noise, the amount of *latin* identifiers and keywords is not negligible. if all you're trying to save is coming up with english names for your functions, then that's okay, but saying "japanese people have a hard time coding in the latin alphabet" does not withstand practical usage. the solution is an intermediate translator that lies between the programmer and the interpreter. that -- or learning latin (it's only *26* letters :) and transliterating japanese names with latin characters.

all in all, i'm still -1 on that. i would rather go halfway -- allow unicode comments. let people write docs in their native language, that's all fine with me (or is that already imposed by the UTF8 PEP?)

-tomer

From jcarlson at uci.edu Thu May 17 03:30:29 2007
From: jcarlson at uci.edu (Josiah Carlson)
Date: Wed, 16 May 2007 18:30:29 -0700
Subject: [Python-3000] pep 3131 again
In-Reply-To: <1d85506f0705161806w19914adfid76e36c4151c336@mail.gmail.com>
References: <1d85506f0705161806w19914adfid76e36c4151c336@mail.gmail.com>
Message-ID: <20070516182431.8591.JCARLSON@uci.edu>

"tomer filiba" wrote:
> all in all, i'm still -1 on that. i would rather go halfway -- allow unicode
> comments. let people write docs in their native language, that's all
> fine with me (or is that already imposed by the UTF8 PEP?)

I could have sworn that unicode comments and docstrings are already allowed with any sufficient encoding, with or without the UTF8 default encoding PEP. Testing on Python 2.3 seems to confirm this.
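Josiah's observation can be sketched concretely. This is an illustrative example, not from the thread (the thread's original non-ASCII text did not survive the archive); the function name and the Hebrew docstring text are made up here:

```python
# -*- coding: utf-8 -*-
# With a source-encoding declaration (PEP 263), non-ASCII text is
# already legal in comments and docstrings, even though identifiers
# themselves must stay ASCII in Python 2.x.

def beykon():  # transliterated (ASCII-only) identifier
    """A docstring may carry native-language text: ביצים ובייקון."""
    return "spam"

assert beykon() == "spam"
assert beykon.__doc__ is not None
```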
- Josiah From talin at acm.org Thu May 17 04:30:03 2007 From: talin at acm.org (Talin) Date: Wed, 16 May 2007 19:30:03 -0700 Subject: [Python-3000] PEP 3131 - the details Message-ID: <464BBE2B.1050201@acm.org> While there has been a lot of discussion as to whether to accept PEP 3131 as a whole, there has been little discussion as to the specific details of the PEP. In particular, is it generally agreed that the Unicode character classes listed in the PEP are the ones we want to include in identifiers? My preference is to be conservative in terms of what's allowed. -- Talin From mike.klaas at gmail.com Thu May 17 04:58:39 2007 From: mike.klaas at gmail.com (Mike Klaas) Date: Wed, 16 May 2007 19:58:39 -0700 Subject: [Python-3000] pep 3131 again In-Reply-To: <1d85506f0705161806w19914adfid76e36c4151c336@mail.gmail.com> References: <1d85506f0705161806w19914adfid76e36c4151c336@mail.gmail.com> Message-ID: <3BD9B8F2-64EE-4864-A229-8C0D7E86EE96@gmail.com> On 16-May-07, at 6:06 PM, tomer filiba wrote: > > === help people who can't type english === > since the keywords remain ASCII, along with stdlib and all other major > third party libs -- how does that help the english-illiterate > programmer? > > import random > ?? = range(100) > random.shuffle(?? ) > ? = ??.pop(7) > if len(?) > 58: > print "?????!!!" # ?? ?? ??????? > ???? ??? ????? > > apart from excessive visual noise, the amount of *latin* > identifiers and > keywords is not negligible. if all you're trying to save is coming > up with > english names for your functions, than that's okay, but saying > "japanese people have a hard time coding in the latin alphabet" > does not withstand practical usage. It will always be harder for non-english-speaking people to learn an english-derived programming language. It is somewhat specious to equate the difficulty of learning the keywords and (some of the) standard library with the difficulty of using latin completely. 
Consider that for many languages which aren't as pseudocodal as python, there is already a need to learn arbitrary symbols. $,%,@ have special meaning in perl, "car/cdr" in lisp, '!/&&/||/~' in c... these finite sets of symbols are necessary for english-speaking people to learn, and non-english-speaking people would (I imagine) apply similar rules for learning the keywords of python.

Imagine if python keywords were in english, but written using the phonetic symbols. It would take a while to get used to the different keywords, as would learning any new symbols. It might even be extremely difficult. However, the difficulty would not be the same as learning to quickly write and understand english words written phonetically (which would be required if the phonetic alphabet were the canonical characters of python symbols). I don't have experience learning to program in a foreign language, but it seems evident to me that the two levels of familiarity are substantially different.

-Mike

From foom at fuhm.net Thu May 17 05:14:21 2007
From: foom at fuhm.net (James Y Knight)
Date: Wed, 16 May 2007 23:14:21 -0400
Subject: [Python-3000] pep 3131 again
In-Reply-To: <1d85506f0705161806w19914adfid76e36c4151c336@mail.gmail.com>
References: <1d85506f0705161806w19914adfid76e36c4151c336@mail.gmail.com>
Message-ID: <82E08374-B97F-4884-9D26-2F5A4CCF9392@fuhm.net>

On May 16, 2007, at 9:06 PM, tomer filiba wrote:
> === RTL/LTR ===
> i pointed out already that no existing editor can handle LTR-RTL
> representation correctly, which essentially renders all RTL languages
> out of the scope of this PEP. that doesn't bother me personally so much,
> as i'm not going to use this feature anyway, but that still leaves us with
> the "european imposed colonialism" :)
>
> the only practical way to use RTL languages in code is to have an RTL
> programming language, where "if" is spelled "??", "for" as "????",
> "in" as "????", and so on, and the entire program is RTL.
having > code > like -- > for ??? in ????(1,2,3) > is only unreadable by all means (since the parenthesis are LTR, while > the name is RTL, etc.) It is interesting to contrast the rendering of that (ABC being substitutes for hebrew characters): for ABB in 1,2,3)ACAC) with the rendering of: for ??? in ????(a,b,c) as: for ABB in ACAC(a,b,c) This is I suppose due to numbers and punctuation having weak directionality in the bidi algorithm, which isn't really appropriate for tokens in a programming language. So yes, clearly, an editor that takes into account the special needs of programming languages is necessary to effectively write bidi code. But it's certainly not inconceivable, and I don't see that the non-existence of an effective bidi editor should influence the decision to allow unicode characters in python at all. For a majority of languages that are LTR, it is not an issue, and I have every confidence that the bidi programming editor problem will be solved at some point in the future. The only thing python can possibly do to help with this is to ignore any RLO/ LRO/LRE/RLE/PDF/RLM/LRM characters it sees during tokenization. (probably ought to ignore anything with the "Default_Ignorable_Code_Point" unicode property). This would allow a smart editor to save the text with such formatting characters in it, so that other "dumb" viewers would not be confused. For example, with explicit formatting added, rendering can be made correct: for ????? in ???????(1,2,3) http://imagic.weizmann.ac.il/~dov/Hebrew/logicUI24.htm#h1-25 shows someone has thought about this at least a little from the editor perspective... James From tjreedy at udel.edu Thu May 17 05:38:53 2007 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 16 May 2007 23:38:53 -0400 Subject: [Python-3000] PEP 3131 - the details References: <464BBE2B.1050201@acm.org> Message-ID: "Talin" wrote in message news:464BBE2B.1050201 at acm.org... 
| While there has been a lot of discussion as to whether to accept PEP | 3131 as a whole, there has been little discussion as to the specific | details of the PEP. In particular, is it generally agreed that the | Unicode character classes listed in the PEP are the ones we want to | include in identifiers? My preference is to be conservative in terms of | what's allowed. Some questions I have: is the defined UID set the same as in the referenced appendix? Is it the same as in Java (and hence Jython)? The same as in .NET (and hence IronPython)? tjr From greg.ewing at canterbury.ac.nz Thu May 17 05:53:26 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 17 May 2007 15:53:26 +1200 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: References: <1d85506f0705130833v1058b022re0597cf9f259320d@mail.gmail.com> <19dd68ba0705130925j1dd55f1boba9e1b6c036d0422@mail.gmail.com> <43aa6ff70705131009s7d5b177dmea7c790d670ac3c0@mail.gmail.com> <1d85506f0705131042q23270a91qa31ff2f3940019ed@mail.gmail.com> <19dd68ba0705131104r85531f3o12b7e1769d7b7140@mail.gmail.com> <1d85506f0705141212m65b9ec37q5f685f507e394f01@mail.gmail.com> Message-ID: <464BD1B6.9020202@canterbury.ac.nz> Jim Jewett wrote: > You can use any character in a symbol (~= identifier), including (if > your implementation supports such characters at all, even in comments) > Hebrew or Chinese characters. Lisp is a bit different, because it's always had only a very few chars that aren't identifier chars, so you're used to seeing identifiers with all sorts of junk in them. But in Python, you tend to see anything that you don't recognise as a letter or digit as "punctuation" and therefore non-identifier. -- Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | Carpe post meridiem! | Christchurch, New Zealand | (I'm not a morning person.) 
| greg.ewing at canterbury.ac.nz +--------------------------------------+ From g.brandl at gmx.net Thu May 17 07:45:17 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Thu, 17 May 2007 07:45:17 +0200 Subject: [Python-3000] Raw strings containing \u or \U In-Reply-To: <464B7235.20500@ronadam.com> References: <464B62FD.4070400@ronadam.com> <464B7235.20500@ronadam.com> Message-ID: Ron Adam schrieb: > Guido van Rossum wrote: >> That would be great! This will automatically turn \u1234 into 6 >> characters, right? > > I'm not exactly clear when the '\uxxxx' characters get converted. There > isn't any conversion done in tokanize.c that I can see. It's primarily > only concerned with finding the beginning and ending of the string at that > point. It looks like everything between the beginning and end is just > passed along "as is" and it's translated further later in the chain. Look at Python/ast.c, which has functions parsestr() and decode_unicode(). The latter calls PyUnicode_DecodeRawUnicodeEscape() which I think is the function you're looking for. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From foom at fuhm.net Thu May 17 07:50:17 2007 From: foom at fuhm.net (James Y Knight) Date: Thu, 17 May 2007 01:50:17 -0400 Subject: [Python-3000] PEP 3131 - the details In-Reply-To: <464BBE2B.1050201@acm.org> References: <464BBE2B.1050201@acm.org> Message-ID: <69B09BFE-3BF3-4532-98EA-8A7E44461D77@fuhm.net> On May 16, 2007, at 10:30 PM, Talin wrote: > While there has been a lot of discussion as to whether to accept PEP > 3131 as a whole, there has been little discussion as to the specific > details of the PEP. 
> In particular, is it generally agreed that the
> Unicode character classes listed in the PEP are the ones we want to
> include in identifiers?

One issue I see is that the PEP defines ID_Start and ID_Continue itself. It should not do that, but instead reference as authoritative the unicode properties ID_Start and ID_Continue defined in the unicode property database.

ID_Start is officially: Lu+Ll+Lt+Lm+Lo+Nl+Other_ID_Start
and ID_Continue is officially: ID_Start + Mn+Mc+Nd+Pc + Other_ID_Continue

The only differences between PEP 3131's definition and the official ones are the Other_* bits. Those are there to ensure the requirement that anything now in ID_Start/ID_Continue will always in the future be in said categories. That is an important feature, and should not be overlooked. Without the supplemental list, a future version of unicode which changes the general class of a character could make a previously valid identifier become invalid. The list currently includes the following entries:

2118       ; Other_ID_Start    # So  SCRIPT CAPITAL P
212E       ; Other_ID_Start    # So  ESTIMATED SYMBOL
309B..309C ; Other_ID_Start    # Sk  [2] KATAKANA-HIRAGANA VOICED SOUND MARK..KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK
1369..1371 ; Other_ID_Continue # No  [9] ETHIOPIC DIGIT ONE..ETHIOPIC DIGIT NINE

This list is available as part of the PropList.txt file in the unicode data, which ought to be included automatically in python's unicode database so as to get future changes.

> My preference is to be conservative in terms of what's allowed.

I do not believe it is a good idea for python to define its own identifier rules. The rules defined in UAX31 make sense and should be used directly, with only the minor amendment of _ as an allowable start character.
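A rough, non-authoritative sketch of the category-based rule being discussed, using the stdlib's Unicode database. It deliberately omits the Other_ID_Start/Other_ID_Continue supplement, so it is not a full UAX31 implementation:

```python
import unicodedata

# General categories only; the PropList.txt supplement is not included.
START_CATS = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
CONTINUE_CATS = START_CATS | {"Mn", "Mc", "Nd", "Pc"}

def is_id_start(ch):
    """May this character begin an identifier (with the _ amendment)?"""
    return ch == "_" or unicodedata.category(ch) in START_CATS

def is_id_continue(ch):
    """May this character appear after the first position?"""
    return ch == "_" or unicodedata.category(ch) in CONTINUE_CATS

assert is_id_start("a") and is_id_start("_")
assert not is_id_start("1") and is_id_continue("1")   # Nd: continue only
assert not is_id_start("$")   # Sc: allowed in Java, not proposed here
```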
James From talin at acm.org Thu May 17 09:40:19 2007 From: talin at acm.org (Talin) Date: Thu, 17 May 2007 00:40:19 -0700 Subject: [Python-3000] Updated and simplified PEP 3141: A Type Hierarchy for Numbers In-Reply-To: <5d44f72f0705161731j4700bdb3h4e36e97757bd6a32@mail.gmail.com> References: <5d44f72f0705161731j4700bdb3h4e36e97757bd6a32@mail.gmail.com> Message-ID: <464C06E3.2090104@acm.org> Jeffrey Yasskin wrote: > I've updated PEP3141 to remove the algebraic classes and bring the > numeric hierarchy much closer to scheme's design. Let me know what you > think. Feel free to send typographical and formatting problems just to > me. My schedule's a little shaky the next couple weeks, but I'll make > updates as quickly as I can. General comments: I need to give some background first, so be patient :) The original version of this PEP was written at a time when ABCs were at an earlier stage in their conceptual development. The notion of overriding 'isinstance' had not yet been introduced, and so the only way to inherit from an ABC was by the traditional inheritance mechanism. At that time, there were a number of proposals for adding ABC base classes to Python's built-in types. Those ABCs, being the foundation for the built-in types, would have had to be built-ins themselves, and would have been required to be initialized prior to the built-ins that depended on them. This in turn meant that those ABCs were "special", in the sense that they were officially sanctioned by the Python runtime itself. The ABCs in this PEP and in the other ABC PEPs would have been given a privileged status, elevated even above the classes in the standard library. My feeling at the time was that I was uncomfortable with a brand new type hierarchy, still in a relatively immature stage of development, being deeply rooted into the core of Python. 
Being embedded into the interpreter means that it would be hard to experiment with different variations and to test out different options for the number hierarchy. Not only would the embedded classes be hard to change, but there would be no way that alternative proposals could compete. My concern was that there would be little, if any, evolution of these concepts, and that we would be stuck with a set of decisions which had been made in haste. This is exactly contrary to the usual prescription for Python library modules, which are supposed to prove themselves in real-world apps before being enshrined in the standard library. Now, the situation has changed somewhat. The ABC PEP has radically shifted its focus, de-emphasizing traditional inheritance towards a new mechanism which I call 'dynamic inheritance' - the ability to declare new inheritance relations after a class has been created. Lets therefore assume that the numeric ABCs will use this new inheritance mechanism, avoiding the problem of taking an immature class hierarchy and setting it in stone. The PEPs in this class would then no longer need to have this privileged status; They could be replaced and changed at will. Assuming that this is true, the question then becomes whether these classes should be treated like any other standard library submission. In other words, shouldn't this PEP be implemented as a separate module, and have to prove itself 'in the wild' before being adopted into the stdlib? Does this PEP even need to be a PEP at all, or can it just be a 3rd-party library that is eventually adopted into Python? Now, I *could* see adopting an untried library embodying untested ideas into the stdlib if there was a crying need for the features of such a library, and those needs were clearly being unfulfilled. However, I am not certain that this is the case here. 
At the very least, I think it should be stated in the PEP whether or not the ABCs defined here are going to be using traditional or dynamic inheritance. If it is the latter, and we decide that this PEP is going to be part of the stdlib, then I propose the following library organization:

    import abc              # Imports the basic ABC mechanics
    import abc.collections  # MutableSequence and such
    import abc.math         # The number hierarchy

... and so on.

Now, there is another issue that needs to be discussed. The classes in the PEP appear to be written with lots of mixin methods, such as __rsub__ and __abs__ and such. Unfortunately, the current proposed method for dynamic inheritance does not allow for methods or properties to be inherited from the 'virtual' base class. Which means that all of the various methods defined in this PEP are utterly meaningless other than as documentation - except in the case of a new user-created class of numbers which inherits from these ABCs using traditional inheritance, which is not something that I expect to happen very often at all. For virtually all practical uses, the elaborate methods defined in this PEP will be unused and inaccessible.

This really highlights what I think is a problem with dynamic inheritance, and I think that this inconsistency between traditional and dynamic inheritance will eventually come back to haunt us. It has always been the case in the past that for every property of class B, if isinstance(A, B) == True, then A also has that property, either inherited from B, or overridden in A. The fact that this invariant will no longer hold true is a problem in my opinion.

I realize that there isn't currently a solution to efficiently allow inheritance of properties via dynamic inheritance. As a software engineer, however, I generally feel that if a feature is unreliable, then it shouldn't be used at all.
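To make that concrete, here is a minimal sketch of the problem, assuming the ABCMeta/register() machinery proposed in PEP 3119 (all class names here are illustrative, not from the PEP):

```python
from abc import ABCMeta, abstractmethod

class MyNumber(metaclass=ABCMeta):
    @abstractmethod
    def __abs__(self):
        ...

    def double(self):
        # A mixin method defined on the ABC itself.
        return abs(self) + abs(self)

class Traditional(MyNumber):     # traditional inheritance
    def __abs__(self):
        return 1

class Registered:                # dynamic ("virtual") inheritance
    def __abs__(self):
        return 1

MyNumber.register(Registered)

# Both kinds of subclass pass the isinstance() test...
assert isinstance(Traditional(), MyNumber)
assert isinstance(Registered(), MyNumber)

# ...but only the traditional subclass actually gets the mixin method.
assert Traditional().double() == 2
assert not hasattr(Registered(), "double")
```

Under dynamic inheritance the isinstance() check succeeds while the ABC's mixin methods are simply absent, which is exactly the broken invariant described above.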
So if I were designing a class hierarchy of ABCs, I would probably make a rule for myself not to define any properties or methods in the ABCs at all, and to *only* use ABCs for type testing via 'isinstance'. In other words, if I were writing this PEP, all of those special methods would be omitted, simply because as a writer of a subclass I couldn't rely on being able to use them.

The only alternative that I can see is to not use dynamic inheritance at all, and instead have the number classes inherit from these ABCs using the traditional mechanism. But that brings up all the problems of immaturity and requiring them to be built-in that I brought up earlier.

-- Talin

From martin at v.loewis.de Thu May 17 10:51:19 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 17 May 2007 10:51:19 +0200 Subject: [Python-3000] PEP 3131 - the details In-Reply-To: References: <464BBE2B.1050201@acm.org> Message-ID: <464C1787.7090209@v.loewis.de>

> Some questions I have: is the defined UID set the same as in the
> referenced appendix?

Yes; it was copied from there.

> Is it the same as in Java (and hence Jython)?

No. Not sure whether I can produce a complete list of differences, but some of them are:

- Java allows $ in identifiers, the PEP doesn't (as is Python tradition) (more generally: it allows currency symbols in identifiers)
- Java allows arbitrary connecting punctuators as the start; the PEP only allows the underscore
- Java allows "arbitrary" digits in an identifier. I'm not quite sure what that means: JLS refers to isJavaIdentifierPart, which specifies "a digit" and refers to isLetterOrDigit, which refers to JLS. isDigit gives true if the character NAME contains DIGIT, and the digit is not in the range U+2000..U+2FFF. The PEP specifies that digits need to have the Nd class. Comparing these two, it seems that Java allows several characters from the No class, which Python does not allow.
- Java allows "ignorable control characters" in identifiers, which Python doesn't allow. So, in short, it seems that Python's identifier syntax would be strictly more restrictive than Java's. > The same as in .NET (and hence IronPython)? This kind of research is time consuming; it cost me an hour to come up with above list. Please research it for yourself. Regards, Martin From martin at v.loewis.de Thu May 17 11:10:58 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 17 May 2007 11:10:58 +0200 Subject: [Python-3000] PEP 3131 - the details In-Reply-To: <69B09BFE-3BF3-4532-98EA-8A7E44461D77@fuhm.net> References: <464BBE2B.1050201@acm.org> <69B09BFE-3BF3-4532-98EA-8A7E44461D77@fuhm.net> Message-ID: <464C1C22.7030806@v.loewis.de> > One issue I see is that the PEP defines ID_Start and ID_Continue > itself. It should not do that, bue instead reference as authoritative > the unicode properties ID_Start and ID_Continue defined in the > unicode property database. ID_Start and ID_Continue are derived non-mandatory properties, and I believe UAX#31 is the one defining these properties. So I thought I could just copy the definition. Currently, the Python unicodedata module does not contain a definition for ID_Start and ID_Continue, so I could not use it in the PEP. > ID_Start is officially: Lu+Ll+Lt+Lm+Lo+Nl+Other_ID_Start > and ID_Continue is officially: ID_Start + Mn+Mc+Nd+Pc + > Other_ID_Continue I know see what 'stability extensions' are which are mentioned in the PEP (copied from UAX#31). Even though Python currently does not include Other_ID_Start and Other_ID_Continue, it could be made so in the parser. It would have been nice if UAX#31 had mentioned that the "stability extensions" are recorded in these properties. > The only differences between PEP 3131's definition and the official > ones is the Other_* bits. 
> Those are there to ensure the requirement that anything now in
> ID_Start/ID_Continue will always in the future be in said categories.
> That is an important feature, and should not be overlooked.

See the PEP: there was an XXX remark I still needed to resolve.

> This list is available as part of the PropList.txt file in the
> unicode data, which ought to be included automatically in python's
> unicode database so as to get future changes.

This I'm not so sure about. I changed the PEP to say that Other_ID_{Start|Continue} should be included. Whether the other properties should be added to the unidata module, I don't know - I would like to see use cases first before including them.

> I do not believe it is a good idea for python to define its own
> identifier rules. The rules defined in UAX31 make sense and should be
> used directly, with only the minor amendment of _ as an allowable
> start character.

That was my plan indeed.

Regards,
Martin

From martin at v.loewis.de Thu May 17 11:13:48 2007 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Thu, 17 May 2007 11:13:48 +0200 Subject: [Python-3000] pep 3131 again In-Reply-To: <1d85506f0705161806w19914adfid76e36c4151c336@mail.gmail.com> References: <1d85506f0705161806w19914adfid76e36c4151c336@mail.gmail.com> Message-ID: <464C1CCC.3070008@v.loewis.de>

> all in all, i'm still -1 on that. i would rather go halfway -- allow unicode
> comments. let people write docs in their native language, that's all
> fine with me (or is that already imposed by the UTF8 PEP?)

As others have pointed out: you have been able to use non-ASCII comments for a long time. In fact, you could use non-ASCII comments in *all* versions of Python, and people have been doing so for years.
Regards, Martin From martin at v.loewis.de Thu May 17 11:23:00 2007 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Thu, 17 May 2007 11:23:00 +0200 Subject: [Python-3000] pep 3131 again In-Reply-To: <1d85506f0705161806w19914adfid76e36c4151c336@mail.gmail.com> References: <1d85506f0705161806w19914adfid76e36c4151c336@mail.gmail.com> Message-ID: <464C1EF4.6040603@v.loewis.de> > === help people who can't type english === > since the keywords remain ASCII, along with stdlib and all other major > third party libs -- how does that help the english-illiterate programmer? english-illiterate and "can't type english" are very different things. By "can't type english", I assume you mean "can't type Latin characters". These users are not helped at all by this PEP, but I think they are really rare, since keyboards commonly support a mode to enter Latin characters (perhaps after pressing some modifier key, or switching to Latin mode). > > import random > ?? = range(100) > random.shuffle(?? ) > ? = ??.pop(7) > if len(?) > 58: > print "?????!!!" # ?? ?? ??????? ???? ??? ????? > > apart from excessive visual noise, the amount of *latin* identifiers and > keywords is not negligible. Right. However, you don't have to understand *English* to write or read this text. You don't need to know that "import" means "to bring from a foreign or external source", and that "shuffle" means "to mix in a mass confusedly". Instead, understanding them by their Python meaning is enough. > if all you're trying to save is coming up with > english names for your functions, than that's okay, but saying > "japanese people have a hard time coding in the latin alphabet" > does not withstand practical usage. Coming up with English names is not necessary today. Coming up with Latin spellings is. Whether or not Japanese or Chinese people with no knowledge of English still can master the Latin alphabet easily, I don't know, as all Chinese people I do know speak German or English well. 
I would say "they can speak for themselves", except that then neither of us would understand them. Regards, Martin From hfoffani at gmail.com Thu May 17 11:27:51 2007 From: hfoffani at gmail.com (Hernan M Foffani) Date: Thu, 17 May 2007 11:27:51 +0200 Subject: [Python-3000] PEP 3131 - the details In-Reply-To: <464C1787.7090209@v.loewis.de> References: <464BBE2B.1050201@acm.org> <464C1787.7090209@v.loewis.de> Message-ID: <11fab4bc0705170227s1925cee2j2181c45b7772d9d7@mail.gmail.com> > > The same as in .NET (and hence IronPython)? > > This kind of research is time consuming; it cost me an hour to come > up with above list. Please research it for yourself. FYI: ----------------- ECMA-334 C# Language Specification 9 Lexical structure 9.4 Tokens 9.4.2 Identifiers Paragraph 1 (Page 55, Line 11) 1 The rules for identifiers rules given in this section correspond exactly to those recommended by the Unicode Standard Annex 15 except that underscore is allowed as an initial character (as is traditional in the C programming language), Unicode escape sequences are permitted in identifiers, and the "@" character is allowed as a prefix to enable keywords to be used as identifiers. 
identifier :
    available-identifier
    @ identifier-or-keyword
available-identifier :
    An identifier-or-keyword that is not a keyword
identifier-or-keyword :
    identifier-start-character identifier-part-characters(opt)
identifier-start-character :
    letter-character
    _ (the underscore character U+005F)
identifier-part-characters :
    identifier-part-character
    identifier-part-characters identifier-part-character
identifier-part-character :
    letter-character
    decimal-digit-character
    connecting-character
    combining-character
    formatting-character
letter-character :
    A Unicode character of classes Lu, Ll, Lt, Lm, Lo, or Nl
    A unicode-escape-sequence representing a character of classes Lu, Ll, Lt, Lm, Lo, or Nl
combining-character :
    A Unicode character of classes Mn or Mc
    A unicode-escape-sequence representing a character of classes Mn or Mc
decimal-digit-character :
    A Unicode character of the class Nd
    A unicode-escape-sequence representing a character of the class Nd
connecting-character :
    A Unicode character of the class Pc
    A unicode-escape-sequence representing a character of the class Pc
formatting-character :
    A Unicode character of the class Cf
    A unicode-escape-sequence representing a character of the class Cf
-------
Disclaimer: don't know the specification date nor its authenticity.

From hernan at foffani.org Thu May 17 11:35:19 2007 From: hernan at foffani.org (=?ISO-8859-1?Q?Hernan_Mart=EDnez-Foffani?=) Date: Thu, 17 May 2007 11:35:19 +0200 Subject: [Python-3000] PEP 3131 - the details In-Reply-To: <11fab4bc0705170227s1925cee2j2181c45b7772d9d7@mail.gmail.com> References: <464BBE2B.1050201@acm.org> <464C1787.7090209@v.loewis.de> <11fab4bc0705170227s1925cee2j2181c45b7772d9d7@mail.gmail.com> Message-ID: <11fab4bc0705170235m3d9d50fg2aab33eb712f05b0@mail.gmail.com>

> > > The same as in .NET (and hence IronPython)?
> >
> > This kind of research is time consuming; it cost me an hour to come
> > up with above list.
C# identifiers (cont)
About normalization and @
---------
Paragraph 2 (Page 56, Line 5) 1 An identifier in a conforming program must be in the canonical format defined by Unicode Normalization Form C, as defined by Unicode Standard Annex 15. 2 The behavior when encountering an identifier not in Normalization Form C is implementation-defined; however, a diagnostic is not required.

Paragraph 3 (Page 56, Line 8) 1 The prefix "@" enables the use of keywords as identifiers, which is useful when interfacing with other programming languages. 2 The character @ is not actually part of the identifier, so the identifier might be seen in other languages as a normal identifier, without the prefix. 3 An identifier with an @ prefix is called a verbatim identifier. [Note: Use of the @ prefix for identifiers that are not keywords is permitted, but strongly discouraged as a matter of style. end note]

[Example: The example:

    class @class
    {
        public static void @static(bool @bool)
        {
            if (@bool)
                System.Console.WriteLine("true");
            else
                System.Console.WriteLine("false");
        }
    }
    class Class1
    {
        static void M()
        {
            cl\u0061ss.st\u0061tic(true);
        }
    }

defines a class named "class" with a static method named "static" that takes a parameter named "bool". Note that since Unicode escapes are not permitted in keywords, the token "cl\u0061ss" is an identifier, and is the same identifier as "@class". end example]

Paragraph 4 (Page 56, Line 32) 1 Two identifiers are considered the same if they are identical after the following transformations are applied, in order:
* 2 The prefix "@", if used, is removed.
* 3 Each unicode-escape-sequence is transformed into its corresponding Unicode character.
* 4 Any formatting-characters are removed.

Paragraph 5 (Page 56, Line 37) 1 Identifiers containing two consecutive underscore characters (U+005F) are reserved for use by the implementation; however, no diagnostic is required if such an identifier is defined.
[Note: For example, an implementation might provide extended keywords that begin with two underscores. end note]
-----------------
Same disclaimer as before applies.

Regards,
-Hernán.

From ncoghlan at gmail.com Thu May 17 12:56:04 2007 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 17 May 2007 20:56:04 +1000 Subject: [Python-3000] Updated and simplified PEP 3141: A Type Hierarchy for Numbers In-Reply-To: <464C06E3.2090104@acm.org> References: <5d44f72f0705161731j4700bdb3h4e36e97757bd6a32@mail.gmail.com> <464C06E3.2090104@acm.org> Message-ID: <464C34C4.2080702@gmail.com>

Talin wrote:
> This really highlights what I think is a problem with dynamic
> inheritance, and I think that this inconsistency between traditional and
> dynamic inheritance will eventually come back to haunt us. It has always
> been the case in the past that for every property of class B, if
> isinstance(A, B) == True, then A also has that property, either
> inherited from B, or overridden in A. The fact that this invariant will
> no longer hold true is a problem in my opinion.
>
> I realize that there isn't currently a solution to efficiently allow
> inheritance of properties via dynamic inheritance. As a software
> engineer, however, I generally feel that if a feature is unreliable,
> then it shouldn't be used at all. So if I were designing a class
> hierarchy of ABCs, I would probably make a rule for myself not to define
> any properties or methods in the ABCs at all, and to *only* use ABCs for
> type testing via 'isinstance'.

If a class doesn't implement the interface defined by an ABC, you should NOT be registering it with that ABC via dynamic inheritance. *That's* the bug - the program is claiming that "instances of class A can be treated as if they were an instance of B" when that statement is simply not true.
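Concretely, the false claim looks like this (a sketch using the register() call proposed in PEP 3119; the class names are mine, purely illustrative):

```python
from abc import ABCMeta, abstractmethod

class MyHashable(metaclass=ABCMeta):
    @abstractmethod
    def __hash__(self):
        ...

class Unhashy:
    __hash__ = None   # instances are deliberately not hashable

# register() performs no verification, so nothing stops this false claim:
MyHashable.register(Unhashy)

assert isinstance(Unhashy(), MyHashable)   # the program claims hashability...
try:
    hash(Unhashy())                        # ...which is simply not true
except TypeError:
    pass
```

The bug is in the register() call, not in the ABC machinery: the program asserted an interface the class does not actually provide.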
And without defining an interface, dispatching on the ABC is pointless - you don't know whether or not you support the operations implied by that ABC because there aren't any defined!

Now, with respect to the number hierarchy, I think building it as a vertical stack doesn't really match the way numbers have historically worked in Python - integers and floats, for example, don't implement the complex number API:

>>> (1).real
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'int' object has no attribute 'real'
>>> (1.0).real
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'float' object has no attribute 'real'

Given the migration of PEP 3119 to an approach which is friendlier to classification after the fact, it's probably fine to simply punt on the question of an ABC hierarchy for numbers (as Talin already pointed out).

Cheers,
Nick.

-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org

From ncoghlan at gmail.com Thu May 17 13:11:20 2007 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 17 May 2007 21:11:20 +1000 Subject: [Python-3000] r55359 - python/branches/py3k-struni/Lib/test/test_strop.py In-Reply-To: References: <20070515214221.787161E4012@bag.python.org> <17995.29840.132278.935792@montanaro.dyndns.org> Message-ID: <464C3858.3030307@gmail.com>

(relocating thread from python-3000-checkins)

Brett Cannon wrote:
> On 5/16/07, skip at pobox.com wrote:
>
>     Brett> Strop should go when the string module goes. I don't remember
>     Brett> where the last "let's kill string but what do we do about the few
>     Brett> useful things in there" conversation went.
>
>     Sorry, I don't read the py3k list (but see checkins). What about the few
>     bits of string that have no obvious other place to live (lowercase, digits,
>     etc)? Do they somehow become attributes of the str class?
>
> That's undecided at the moment.
Guido killed strop as there is a Python > implementation so it doesn't affect how to handle the string module. As > of this moment no decision has been made whether to keep 'string' or to > kill it. To be honest, I have never understood the repeated proposals to get rid of the string module. Get rid of the functions that are just duplicates of str methods, sure, but the module makes sense to me as a home for text related constants and other machinery (such as string.Template and the various building blocks for more advanced PEP 3101 based formatting). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From santagada at gmail.com Thu May 17 14:36:14 2007 From: santagada at gmail.com (Leonardo Santagada) Date: Thu, 17 May 2007 09:36:14 -0300 Subject: [Python-3000] PEP 3131 - the details In-Reply-To: <11fab4bc0705170235m3d9d50fg2aab33eb712f05b0@mail.gmail.com> References: <464BBE2B.1050201@acm.org> <464C1787.7090209@v.loewis.de> <11fab4bc0705170227s1925cee2j2181c45b7772d9d7@mail.gmail.com> <11fab4bc0705170235m3d9d50fg2aab33eb712f05b0@mail.gmail.com> Message-ID: <0EE9E992-F418-4E2A-872D-7B2CE012FAC3@gmail.com> Here are the rules for identifiers in javascript in case someone wants to know: http://interglacial.com/javascript_spec/a-7.html#a-7.6 -- Leonardo Santagada santagada at gmail.com From aahz at pythoncraft.com Thu May 17 15:06:39 2007 From: aahz at pythoncraft.com (Aahz) Date: Thu, 17 May 2007 06:06:39 -0700 Subject: [Python-3000] Whither string? 
(was Re: python/branches/py3k-struni/Lib/test/test_strop.py) In-Reply-To: <464C3858.3030307@gmail.com> References: <20070515214221.787161E4012@bag.python.org> <17995.29840.132278.935792@montanaro.dyndns.org> <464C3858.3030307@gmail.com> Message-ID: <20070517130639.GA20958@panix.com> On Thu, May 17, 2007, Nick Coghlan wrote: > > To be honest, I have never understood the repeated proposals to get > rid of the string module. Get rid of the functions that are just > duplicates of str methods, sure, but the module makes sense to me > as a home for text related constants and other machinery (such as > string.Template and the various building blocks for more advanced PEP > 3101 based formatting). The trend in support seems to be toward moving everything left that is useful from "string" to "text", which would be a package. Overall, I'm +1 on that idea. I can see arguments in favor of leaving string, but that name just has too much baggage. -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "Look, it's your affair if you want to play with five people, but don't go calling it doubles." --John Cleese anticipates Usenet From benji at benjiyork.com Thu May 17 15:15:28 2007 From: benji at benjiyork.com (Benji York) Date: Thu, 17 May 2007 09:15:28 -0400 Subject: [Python-3000] PEP 3133: Introducing Roles In-Reply-To: References: <43aa6ff70705132236w6d2dbfc4r8d1fadede753a6ca@mail.gmail.com> <4648D626.1030201@benjiyork.com> <464B4237.4090802@benjiyork.com> Message-ID: <464C5570.5050205@benjiyork.com> Guido van Rossum wrote: > On 5/16/07, Benji York wrote: >> Guido van Rossum wrote: >>> On 5/14/07, Benji York wrote: >>>> Collin Winter wrote: >>>>> PEP: 3133 >>>>> Title: Introducing Roles >>>> Everything included here is included in zope.interface. See in-line >>>> comments below for the analogs. >>> Could you look at PEP 3119 and do a similar analysis? >> Sure. 
And here it is:

> PEP: 3119
> Title: Introducing Abstract Base Classes

I've placed my comments in-line and snipped chunks of the original PEP where it seemed appropriate.

> Version: $Revision$
> Last-Modified: $Date$
> Author: Guido van Rossum , Talin
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 18-Apr-2007
> Post-History: 26-Apr-2007, 11-May-2007

[snip]

> Rationale
> =========
>
> In the domain of object-oriented programming, the usage patterns for
> interacting with an object can be divided into two basic categories,
> which are 'invocation' and 'inspection'.
>
> Invocation means interacting with an object by invoking its methods.
> Usually this is combined with polymorphism, so that invoking a given
> method may run different code depending on the type of an object.
>
> Inspection means the ability for external code (outside of the
> object's methods) to examine the type or properties of that object,
> and make decisions on how to treat that object based on that
> information.
>
> Both usage patterns serve the same general end, which is to be able to
> support the processing of diverse and potentially novel objects in a
> uniform way, but at the same time allowing processing decisions to be
> customized for each different type of object.
>
> In classical OOP theory, invocation is the preferred usage pattern,
> and inspection is actively discouraged, being considered a relic of an
> earlier, procedural programming style. However, in practice this view
> is simply too dogmatic and inflexible, and leads to a kind of design
> rigidity that is very much at odds with the dynamic nature of a
> language like Python.

I disagree with the last sentence in the above paragraph.
While zope.interface has been shown (in a separate message) to perform the same tasks as the "roles" PEP (3133) and below I show the similarities between this PEP (ABCs) and zope.interface, I want to point out that users of zope.interface don't actually use it in these ways.

So, what /do/ people use zope.interface for? There are two primary uses: making contracts explicit and adaptation. If more detail is desired about these uses, I'll be glad to share. My main point is that the time machine worked; people have had the moral equivalent of ABCs and Roles for years and have decided against using them the way the PEPs envision.

Of course if people still think ABCs are keen, then a stand-alone package can be created and we can see if there is uptake; if so, it can be added to the standard library later.

If I recall correctly, the original motivation for ABCs was that some times people want to "sniff" an object and see what it is, almost always to dispatch appropriately. That use case of "dispatch in the small" would seem to me to be much better addressed by generic functions. If those generic functions want something in addition to classes to dispatch on, then interfaces can be used too. If GF aren't desirable for that use case, then basefile, basesequence, and basemapping can be added to Python and cover 90% of what people need. I think the Java Collections system has shown that it's not necessary to provide all interfaces for all people. If you can only provide a subset of an interface, make unimplemented methods raise NotImplementedError.

[snip]

> Overloading ``isinstance()`` and ``issubclass()``
> -------------------------------------------------

Perhaps the PEP should just be reduced to include only this section.
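For readers who skipped the PEP: the overloading that section describes boils down to a hook like this (a sketch against the __instancecheck__ protocol from PEP 3119; the structural-typing policy and all names here are illustrative, not part of the PEP):

```python
class DuckCheckMeta(type):
    """Metaclass that overloads isinstance() for its classes."""
    required = ()

    def __instancecheck__(cls, obj):
        # Accept any object that provides the required method names.
        return all(hasattr(obj, name) for name in cls.required)

class SupportsQuack(metaclass=DuckCheckMeta):
    required = ("quack",)

class Mallard:
    def quack(self):
        return "quack"

assert isinstance(Mallard(), SupportsQuack)     # structural match
assert not isinstance(object(), SupportsQuack)  # no quack() method
```

The point is that isinstance() becomes an arbitrary predicate of the checking class, with no implementation inherited from it.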
[snip]

> The ``abc`` Module: an ABC Support Framework
> --------------------------------------------

[snip]

> These methods are intended to be called on classes whose metaclass
> is (derived from) ``ABCMeta``; for example::
>
>     from abc import ABCMeta

import zope.interface

>     class MyABC(metaclass=ABCMeta):
>         pass

class MyInterface(zope.interface.Interface):
    pass

>     MyABC.register(tuple)

zope.interface.classImplements(tuple, MyInterface)

>     assert issubclass(tuple, MyABC)

assert MyInterface.implementedBy(tuple)

>     assert isinstance((), MyABC)

assert MyInterface.providedBy(())

> The last two asserts are equivalent to the following two::
>
>     assert MyABC.__subclasscheck__(tuple)
>     assert MyABC.__instancecheck__(())
>
> Of course, you can also directly subclass MyABC::
>
>     class MyClass(MyABC):
>         pass

class MyClass:
    zope.interface.implements(MyInterface)

>     assert issubclass(MyClass, MyABC)

assert MyInterface.implementedBy(MyClass)

>     assert isinstance(MyClass(), MyABC)

assert MyInterface.providedBy(MyClass())

> Also, of course, a tuple is not a ``MyClass``::
>
>     assert not issubclass(tuple, MyClass)
>     assert not isinstance((), MyClass)
>
> You can register another class as a subclass of ``MyClass``::
>
>     MyClass.register(list)

There is an interface that MyClass implements that list implements as well.

class MyClassInterface(MyInterface):
    pass

zope.interface.classImplements(list, MyClassInterface)

Sidebar: this highlights one of the reasons zope.interface users employ the naming convention of prefixing their interface names with "I"; it helps keep interface names short while giving you an easy name for "interface that corresponds to things of class Foo", which would be IFoo.
>     assert issubclass(list, MyClass)

assert MyClassInterface.implementedBy(list)

>     assert issubclass(list, MyABC)

assert MyClassInterface.extends(MyInterface)

> You can also register another ABC::
>
>     class AnotherClass(metaclass=ABCMeta):
>         pass

class AnotherInterface(zope.interface.Interface):
    pass

>     AnotherClass.register(basestring)

zope.interface.classImplements(basestring, AnotherInterface)

>     MyClass.register(AnotherClass)

I don't quite understand the intent of the above line. It appears to be extending the contract that AnotherClass embodies to promise to fulfill any contract that MyClass embodies. That seems to be an unusual thing to want to express. Although unusual, you could still do it using zope.interface. One way would be to add MyClassInterface to the __bases__ of AnotherInterface. OTOH, I might be confused by the collapsing of the class and interface hierarchies. Do the classes in the above line of code represent the implementation or specification?

[snip]

> ABCs for Containers and Iterators
> ---------------------------------

zope.interface defines similar interfaces. Surprisingly they aren't used all that often. They can be viewed at http://svn.zope.org/zope.interface/trunk/src/zope/interface/common/. The files mapping.py, sequence.py, and idatetime.py are the most interesting.

[snip rest]

> I was just thinking of how to "sell" ABCs as an alternative to current
> happy users of zope.interface.

One of the things that makes zope.interface users happy is the separation of specification and implementation. The increasing separation of specification from implementation is what has driven Abstract Data Types in procedural languages, encapsulation in OOP, and now zope.interface. Mixing the two back together in ABCs doesn't seem attractive.

As for "selling" current users on an alternative, why bother? If people need interfaces, they know where to find them. I suspect I'm confused as to the intent of this discussion.
-- Benji York http://benjiyork.com From martin at v.loewis.de Thu May 17 15:49:31 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 17 May 2007 15:49:31 +0200 Subject: [Python-3000] Unicode strings, identifiers, and import In-Reply-To: References: <20070514113240.19381.1581980774.divmod.quotient.32995@ohm> Message-ID: <464C5D6B.6070302@v.loewis.de> > Does the tokenizer do this for all string literals, too? Otherwise you > could still get surprises with things like x.foo vs. getattr(x, > "foo"), if the name foo were normalized but the string "foo" were not. No. If you use a string literal, chances are very high that you put NFC into your source code file (if it's not UTF-8, most codecs will produce NFC naturally; if it is UTF-8, it depends on your editor). If you get the attribute name from elsewhere, it's a design choice of who should perform the normalization. One could specify that builtin getattr does that, or one could require that the application does it in cases where the strings aren't guaranteed to be in NFC. The only case where I know of a software that explicitly changes the normalization, and not to NFC, is OSX, which uses NFD on disk. Regards, Martin From fdrake at acm.org Thu May 17 16:26:37 2007 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Thu, 17 May 2007 10:26:37 -0400 Subject: [Python-3000] =?iso-8859-1?q?r55359_-=09python/branches/py3k-stru?= =?iso-8859-1?q?ni/Lib/test/test=5Fstrop=2Epy?= In-Reply-To: <464C3858.3030307@gmail.com> References: <20070515214221.787161E4012@bag.python.org> <464C3858.3030307@gmail.com> Message-ID: <200705171026.37851.fdrake@acm.org> On Thursday 17 May 2007, Nick Coghlan wrote: > To be honest, I have never understood the repeated proposals to get rid > of the string module. 
Get rid of the functions that are just duplicates
> of str methods, sure, but the module makes sense to me as a home for
> text related constants and other machinery (such as string.Template and
> the various building blocks for more advanced PEP 3101 based formatting).

Agreed. I see no need to add another name to the stdlib as a place to store those values, and placing them on str doesn't seem particularly attractive (especially for 2.x).

-Fred

-- Fred L. Drake, Jr.

From tomerfiliba at gmail.com Thu May 17 16:41:16 2007 From: tomerfiliba at gmail.com (tomer filiba) Date: Thu, 17 May 2007 16:41:16 +0200 Subject: [Python-3000] pep 3131 again In-Reply-To: <464C1EF4.6040603@v.loewis.de> References: <1d85506f0705161806w19914adfid76e36c4151c336@mail.gmail.com> <464C1EF4.6040603@v.loewis.de> Message-ID: <1d85506f0705170741t653cbc2fx74dadbfa2cf44303@mail.gmail.com>

well, i still don't see what problems having that would solve. it seems like just "a cool feature" people want to have. they will still need to use latin text/english docs most of the time. on the other hand i don't see a reason to limit them intentionally. if that would keep them content/make the transition easier/help them learn programming, i'd guess there's nothing wrong with that.

so i'm not enthused about it all, but i'll give that +0

-tomer

On 5/17/07, "Martin v. Löwis" wrote:
>
> > === help people who can't type english ===
> > since the keywords remain ASCII, along with stdlib and all other major
> > third party libs -- how does that help the english-illiterate programmer?
>
> english-illiterate and "can't type english" are very different things.
> By "can't type english", I assume you mean "can't type Latin
> characters". These users are not helped at all by this PEP, but I think
> they are really rare, since keyboards commonly support a mode to enter
> Latin characters (perhaps after pressing some modifier key, or switching
> to Latin mode).
>
> > import random
> > ??
= range(100) > > random.shuffle(?? ) > > ? = ??.pop(7) > > if len(?) > 58: > > print "?????!!!" # ?? ?? ??????? ???? ??? ????? > > > > apart from excessive visual noise, the amount of *latin* identifiers and > > keywords is not negligible. > > Right. However, you don't have to understand *English* to write or read > this text. You don't need to know that "import" means "to bring from a > foreign or external source", and that "shuffle" means "to mix in a mass > confusedly". Instead, understanding them by their Python meaning is > enough. > > > if all you're trying to save is coming up with > > english names for your functions, than that's okay, but saying > > "japanese people have a hard time coding in the latin alphabet" > > does not withstand practical usage. > > Coming up with English names is not necessary today. Coming up > with Latin spellings is. > > Whether or not Japanese or Chinese people with no knowledge of > English still can master the Latin alphabet easily, I don't know, > as all Chinese people I do know speak German or English well. > > I would say "they can speak for themselves", except that then > neither of us would understand them. > > Regards, > Martin > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-3000/attachments/20070517/8c41cb79/attachment.htm From martin at v.loewis.de Thu May 17 17:22:19 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 17 May 2007 17:22:19 +0200 Subject: [Python-3000] PEP 3131 - the details In-Reply-To: <464BBE2B.1050201@acm.org> References: <464BBE2B.1050201@acm.org> Message-ID: <464C732B.8050103@v.loewis.de> > While there has been a lot of discussion as to whether to accept PEP > 3131 as a whole, there has been little discussion as to the specific > details of the PEP. 
In particular, is it generally agreed that the > Unicode character classes listed in the PEP are the ones we want to > include in identifiers? My preference is to be conservative in terms of > what's allowed. John Nagle suggested to consider UTR#39 (http://unicode.org/reports/tr39/). I encourage anybody to help me understand what it says. The easiest part is 3.1: this seems to say we should restrict characters listed as "restrict" in [idmod]. My suggestion would be to warn about them. I'm not sure about the purpose of the additional characters: surely, they don't think we should support HYPHEN-MINUS in identifiers? 4. Confusable Detection: Without considering details, it seems you need two strings to decide whether they are confusable. So it's not clear to me how this could apply to banning certain identifiers. 5. Mixed Script Detection: That might apply, but I can't map the algorithm to terminology I'm familiar with. What is UScript.COMMON and UScript.INHERITED? I'm skeptical about mixed-script detection, because you surely want to allow ASCII digits (0..9) in Cyrillic identifiers - not sure whether the detection would claim that the digits are Latin (which they aren't - they are Arabic numbers). So a precise algorithm in Python (using unicodedata) would be helpful. I still would like to make that produce a warning only; users more concerned about phishing could turn the warning into an error. 
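Martin's request for "a precise algorithm in Python (using unicodedata)" can be sketched roughly as follows. Note this is purely illustrative and not from the thread: `unicodedata` exposes no Script property, so this sketch approximates a character's script from the first word of its Unicode name; `KNOWN_SCRIPTS`, `rough_script`, and `is_mixed_script` are invented names. ASCII digits and underscore are treated as script-neutral, mirroring UScript.COMMON in the TR#39 pseudo-code, so Cyrillic identifiers with digits pass.

```python
import unicodedata

# Invented for this sketch: scripts we bother to distinguish. Everything
# else (digits, underscore, unnamed characters) counts as "common".
KNOWN_SCRIPTS = {"LATIN", "CYRILLIC", "GREEK", "ARABIC", "HEBREW"}

def rough_script(ch):
    """Approximate a character's script from its Unicode name."""
    if ch == "_" or ch.isdigit():
        return "common"
    name = unicodedata.name(ch, "")
    first = name.split()[0] if name else ""
    return first.lower() if first in KNOWN_SCRIPTS else "common"

def is_mixed_script(identifier):
    """True if the identifier mixes two or more distinguished scripts."""
    scripts = {rough_script(ch) for ch in identifier} - {"common"}
    return len(scripts) > 1

# Cyrillic plus ASCII digits is fine; a stray Latin 'a' is flagged.
assert not is_mixed_script("пример42")
assert is_mixed_script("aпример")
```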
Regards, Martin From martin at v.loewis.de Thu May 17 17:28:14 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 17 May 2007 17:28:14 +0200 Subject: [Python-3000] PEP 3131 - the details In-Reply-To: <0EE9E992-F418-4E2A-872D-7B2CE012FAC3@gmail.com> References: <464BBE2B.1050201@acm.org> <464C1787.7090209@v.loewis.de> <11fab4bc0705170227s1925cee2j2181c45b7772d9d7@mail.gmail.com> <11fab4bc0705170235m3d9d50fg2aab33eb712f05b0@mail.gmail.com> <0EE9E992-F418-4E2A-872D-7B2CE012FAC3@gmail.com> Message-ID: <464C748E.7020209@v.loewis.de> Leonardo Santagada schrieb: > Here are the rules for identifiers in javascript in case someone > wants to know: > http://interglacial.com/javascript_spec/a-7.html#a-7.6 In all these reports, part of the analysis is also to determine how those specifications deviate (or not) from PEP 3131. In this case, it seems that: - JS additionally allows $ - PEP 3131 additionally considers the stability extensions (Other_ID_{Start|Continue}). That may be because the JS specification was based on an earlier version of UAX#31, which perhaps didn't had the need for this stability feature. - JS uses Unicode 3.0, whereas Python uses whatever Unicode version lives in unicodedata. Regards, Martin From janssen at parc.com Thu May 17 17:48:18 2007 From: janssen at parc.com (Bill Janssen) Date: Thu, 17 May 2007 08:48:18 PDT Subject: [Python-3000] Whither string? (was Re: python/branches/py3k-struni/Lib/test/test_strop.py) In-Reply-To: <20070517130639.GA20958@panix.com> References: <20070515214221.787161E4012@bag.python.org> <17995.29840.132278.935792@montanaro.dyndns.org> <464C3858.3030307@gmail.com> <20070517130639.GA20958@panix.com> Message-ID: <07May17.084825pdt."57996"@synergy1.parc.xerox.com> > The trend in support seems to be toward moving everything left that is > useful from "string" to "text", which would be a package. Sigh. 
After 15 years of carefully writing python code using "text" instead of "string" as a variable name, I'm sure this will work out just fine. :-) Bill From g.brandl at gmx.net Thu May 17 17:54:55 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Thu, 17 May 2007 17:54:55 +0200 Subject: [Python-3000] Whither string? (was Re: python/branches/py3k-struni/Lib/test/test_strop.py) In-Reply-To: <07May17.084825pdt."57996"@synergy1.parc.xerox.com> References: <20070515214221.787161E4012@bag.python.org> <17995.29840.132278.935792@montanaro.dyndns.org> <464C3858.3030307@gmail.com> <20070517130639.GA20958@panix.com> <07May17.084825pdt."57996"@synergy1.parc.xerox.com> Message-ID: Bill Janssen schrieb: >> The trend in support seems to be toward moving everything left that is >> useful from "string" to "text", which would be a package. > > Sigh. After 15 years of carefully writing python code using "text" > instead of "string" as a variable name, I'm sure this will work out > just fine. :-) As long as we don't find other functions/classes that warrant the name "text", I propose to keep the string module as it is. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. 
From collinw at gmail.com Thu May 17 18:37:39 2007 From: collinw at gmail.com (Collin Winter) Date: Thu, 17 May 2007 09:37:39 -0700 Subject: [Python-3000] Updated and simplified PEP 3141: A Type Hierarchy for Numbers In-Reply-To: <464C34C4.2080702@gmail.com> References: <5d44f72f0705161731j4700bdb3h4e36e97757bd6a32@mail.gmail.com> <464C06E3.2090104@acm.org> <464C34C4.2080702@gmail.com> Message-ID: <43aa6ff70705170937u19113f3et9f23971448049c0e@mail.gmail.com> On 5/17/07, Nick Coghlan wrote: > Talin wrote: > > This really highlights what I think is a problem with dynamic > > inheritance, and I think that this inconsistency between traditional and > > dynamic inheritance will eventually come back to haunt us. It has always > > been the case in the past that for every property of class B, if > > isinstance(A, B) == True, then A also has that property, either > > inherited from B, or overridden in A. The fact that this invariant will > > no longer hold true is a problem in my opinion. > > > > I realize that there isn't currently a solution to efficiently allow > > inheritance of properties via dynamic inheritance. As a software > > engineer, however, I generally feel that if a feature is unreliable, > > then it shouldn't be used at all. So if I were designing a class > > hierarchy of ABCs, I would probably make a rule for myself not to define > > any properties or methods in the ABCs at all, and to *only* use ABCs for > > type testing via 'isinstance'. > > If a class doesn't implement the interface defined by an ABC, you should > NOT be registering it with that ABC via dynamic inheritance. *That's* > the bug - the program is claiming that "instances of class A can be > treated as if they were an instance of B" when that statement is simply > not true. And without defining an interface, dispatching on the ABC is > pointless - you don't know whether or not you support the operations > implied by that ABC because there aren't any defined! ABCs can define concrete methods. 
These concrete methods provide functionality that the child classes do not themselves provide. Let's imagine that Python didn't have the readlines() method, and that I wanted to define one. I could create an ABC that provides a default concrete implementation of readlines() in terms of readline().

class ReadlinesABC(metaclass=ABCMeta):
    def readlines(self):
        # some concrete implementation

    @abstractmethod
    def readline(self):
        pass

If I register a Python-language class as implementing this ABC, "isinstance(x, ReadlinesABC) == True" means that I can now call the readlines() method. However, if I register a C-language extension class as implementing this ABC, "isinstance(x, ReadlinesABC) == True" may or may not indicate that I can call readlines(), making the test of questionable value. You can say that I shouldn't have registered a C extension class with this ABC in the first place, but that's not the point. The point is that for consumer code "isinstance(x, ReadlinesABC) == True" is an unreliable test that may or may not accurately reflect the object's true capabilities. Maybe attempting to use partially-concrete ABCs in tandem with C classes should raise an exception. That would make this whole issue go away. Collin Winter From guido at python.org Thu May 17 18:48:27 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 17 May 2007 09:48:27 -0700 Subject: [Python-3000] PEP 3131 accepted Message-ID: I have accepted PEP 3131. Note that it now contains the following policy: """ As an addition to the Python Coding style, the following policy is prescribed: All identifiers in the Python standard library MUST use ASCII-only identifiers, and SHOULD use English words wherever feasible (in many cases, abbreviations and technical terms are used which aren't English). In addition, string literals and comments must also be in ASCII. The only exceptions are (a) test cases testing the non-ASCII features, and (b) names of authors.

Authors whose names are not based on the latin alphabet MUST provide a latin transliteration of their names. """ I recommend that open source projects with a global audience adopt a similar policy. I'll also add it to PEP 8. I expect that small details of the PEP will still change as discussion about these takes place and as implementation is undertaken. This does not affect my acceptance of the PEP. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Thu May 17 19:08:11 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 17 May 2007 10:08:11 -0700 Subject: [Python-3000] Updated and simplified PEP 3141: A Type Hierarchy for Numbers In-Reply-To: <43aa6ff70705170937u19113f3et9f23971448049c0e@mail.gmail.com> References: <5d44f72f0705161731j4700bdb3h4e36e97757bd6a32@mail.gmail.com> <464C06E3.2090104@acm.org> <464C34C4.2080702@gmail.com> <43aa6ff70705170937u19113f3et9f23971448049c0e@mail.gmail.com> Message-ID: On 5/17/07, Collin Winter wrote: > ABCs can define concrete methods. These concrete methods provide > functionality that the child classes do not themselves provide. You seem to be misreading my intention here. ABCs serve two purposes: they are interface specifications, and they provide "default" or "mix-in" implementations of some of the methods they specify. The pseudo-inheritance enabled by the register() call uses only the specification part, and requires that the registered class implement all the specified methods itself. In order to benefit from the "mix-in" side of the ABC, you must subclass it directly. > Let's > imagine that Python didn't have the readlines() method, and that I > wanted to define one. I could create an ABC that provides a default > concrete implementation of readlines() in terms of readline(). 
> > class ReadlinesABC(metaclass=ABCMeta): > def readlines(self): > # some concrete implementation > > @abstractmethod > def readline(self): > pass > > If I register a Python-language class as implementing this ABC, > "isinstance(x, ReadlinesABC) == True" means that I can now call the > readlines() method. However, if I register a C-language extension > class as implementing this ABC, "isinstance(x, ReadlinesABC) == True" > may or may not indicate that I can call readlines(), making the test > of questionable value. > > You can say that I shouldn't have registered a C extension class with > this ABC in the first place, but that's not the point. No, it is *exactly* the point. If you want to have functionality that is *not* provided by some class, you should use an adaptor. > The point is > that for consumer code "isinstance(x, ReadlinesABC) == True" is an > unreliable test that may or may not accurately reflect the object's > true capabilities. > > Maybe attempting to use partially-concrete ABCs in tandem with C > classes should raise an exception. That would make this whole issue go > away. The register() call could easily verify that the registered class implements the specified set of methods -- this would include methods that are concrete for the benefit of direct subclassing. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From foom at fuhm.net Thu May 17 19:13:57 2007 From: foom at fuhm.net (James Y Knight) Date: Thu, 17 May 2007 13:13:57 -0400 Subject: [Python-3000] PEP 3131 - the details In-Reply-To: <464C1C22.7030806@v.loewis.de> References: <464BBE2B.1050201@acm.org> <69B09BFE-3BF3-4532-98EA-8A7E44461D77@fuhm.net> <464C1C22.7030806@v.loewis.de> Message-ID: <1192E673-212B-4C5C-AB1F-31EBA657DE4E@fuhm.net> On May 17, 2007, at 5:10 AM, Martin v. L?wis wrote: >> This list is available as part of the PropList.txt file in the >> unicode data, which ought to be included automatically in python's >> unicode database so as to get future changes. 
> > This I'm not so sure about. I changed the PEP to say that > Other_ID_{Start|Continue} should be included. Whether the other > properties should be added to the unidata module, I don't know - > I would like to see use cases first before including them. I only meant that the python's idea of Other_ID_* should be automatically generated from the unicode data file, so that when someone upgrades python's database to Unicode 5.1 (or whatever), they don't forget to update a manually copied Other_ID_* list as well. >> I do not believe it is a good idea for python to define its own >> identifier rules. The rules defined in UAX31 make sense and should be >> used directly, with only the minor amendment of _ as an allowable >> start character. > > That was my plan indeed. I was voicing my support for your plan, in contrast to Talin's comment that perhaps a more conservative subset would be good. James From foom at fuhm.net Thu May 17 19:28:40 2007 From: foom at fuhm.net (James Y Knight) Date: Thu, 17 May 2007 13:28:40 -0400 Subject: [Python-3000] PEP 3131 - the details In-Reply-To: <464C732B.8050103@v.loewis.de> References: <464BBE2B.1050201@acm.org> <464C732B.8050103@v.loewis.de> Message-ID: <12AAEFAC-9E5A-4954-89DB-7A195A8E64A4@fuhm.net> On May 17, 2007, at 11:22 AM, Martin v. L?wis wrote: >> While there has been a lot of discussion as to whether to accept PEP >> 3131 as a whole, there has been little discussion as to the specific >> details of the PEP. In particular, is it generally agreed that the >> Unicode character classes listed in the PEP are the ones we want to >> include in identifiers? My preference is to be conservative in >> terms of >> what's allowed. > > John Nagle suggested to consider UTR#39 > (http://unicode.org/reports/tr39/). I encourage anybody to help me > understand what it says. I think this is not something that is appropriate for Python. It looks fairly specific to implementing a centralized name registry (say: DNS). 
Specifically, the backwards compatibility is not appropriate, as it doesn't guarantee that a name valid now will be valid in the future. They point out that that is okay for DNS, where the rules can be applied at name-registration time, and previously- registered names can continue to be used. James From martin at v.loewis.de Thu May 17 19:43:50 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 17 May 2007 19:43:50 +0200 Subject: [Python-3000] PEP 3131 - the details In-Reply-To: <12AAEFAC-9E5A-4954-89DB-7A195A8E64A4@fuhm.net> References: <464BBE2B.1050201@acm.org> <464C732B.8050103@v.loewis.de> <12AAEFAC-9E5A-4954-89DB-7A195A8E64A4@fuhm.net> Message-ID: <464C9456.2000805@v.loewis.de> > I think this is not something that is appropriate for Python. It looks > fairly specific to implementing a centralized name registry (say: DNS). > Specifically, the backwards compatibility is not appropriate, as it > doesn't guarantee that a name valid now will be valid in the future. > They point out that that is okay for DNS, where the rules can be applied > at name-registration time, and previously-registered names can continue > to be used. Right - that would be a reason to not ban identifiers that are considered questionable. Issuing a warning might be possible, though: if an identifier is warned about that wasn't warned about before, the program would still run. It turns out that John Nagle had a different spec in mind, though: Level 2 (Highly Restrictive) from http://unicode.org/reports/tr36/#Security_Levels_and_Alerts I think that is way too restrictive for programming languages, as it would ban combining cyrillic letters with ASCII digits, 2.10.2.B.1 of TR#36 recommends to use the general profile from UTS-39; 2.10.2.B.2 recommends to use NFKC and case-folding for identifier comparison - that, again, can't apply to Python as the language is case-sensitive. 
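The NFKC and case-folding behavior Martin refers to can be seen with today's stdlib (a small illustration, not from the thread — str.casefold postdates this discussion): NFKC folds compatibility characters together, and case folding on top of that would equate identifiers Python must keep distinct.

```python
import unicodedata

# NFKC folds compatibility characters: the "fi" ligature (U+FB01) becomes
# the two letters "fi", and fullwidth "A" (U+FF21) becomes plain "A".
assert unicodedata.normalize("NFKC", "\ufb01le") == "file"
assert unicodedata.normalize("NFKC", "\uff21") == "A"

# Case folding would then conflate, e.g., a class name and a variable name,
# which a case-sensitive language cannot tolerate.
assert "Point".casefold() == "point".casefold()
```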
Regards, Martin From guido at python.org Thu May 17 19:53:42 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 17 May 2007 10:53:42 -0700 Subject: [Python-3000] Updated and simplified PEP 3141: A Type Hierarchy for Numbers In-Reply-To: <464C06E3.2090104@acm.org> References: <5d44f72f0705161731j4700bdb3h4e36e97757bd6a32@mail.gmail.com> <464C06E3.2090104@acm.org> Message-ID: On 5/17/07, Talin wrote: > Lets therefore assume that the numeric ABCs will use this new > inheritance mechanism, avoiding the problem of taking an immature class > hierarchy and setting it in stone. The PEPs in this class would then no > longer need to have this privileged status; They could be replaced and > changed at will. > > Assuming that this is true, the question then becomes whether these > classes should be treated like any other standard library submission. In > other words, shouldn't this PEP be implemented as a separate module, and > have to prove itself 'in the wild' before being adopted into the stdlib? > Does this PEP even need to be a PEP at all, or can it just be a > 3rd-party library that is eventually adopted into Python? No; I think there's a lot of synergy to be had by making it a standard library module. For example, the Complex, Real and Integer types provide a common ground for the built-in types and the types implemented in numpy. Assuming (some form of) PEP 3124 is accepted, it would be a shame if we had to specialize GFs on concrete types like int or float. If we could encourage the habit right from the start to use the abstract classes in such positions, then numpy integration would be much easier. > Now, I *could* see adopting an untried library embodying untested ideas > into the stdlib if there was a crying need for the features of such a > library, and those needs were clearly being unfulfilled. However, I am > not certain that this is the case here. 
The ideas here are hardly untested; the proposed hierarchy is taught in high school math if not before, and many other languages use it (e.g. Scheme's numeric tower, referenced in the PEP). Some implementation details are untested, but I doubt that the general idea sparks much controversy (it hasn't so far). > At the very least, I think it should be stated in the PEP whether or not > the ABCs defined here are going to be using traditional or dynamic > inheritance. Dynamic inheritance for sure. > If it is the latter, and we decide that this PEP is going to be part of > the stdlib, then I propose the following library organization: > > import abc # Imports the basic ABC mechanics > import abc.collections # MutableSequence and such > import abc.math # The number hierarchy > ... and so on I don't like the idea of creating a ghetto for ABCs. Just like the ABCs for use with I/O are defined in the io module (read PEP 3161 and notice that it already uses a thinly disguised form of ABCs), the ABCs for collections should be in the existing collections module. I'm not sure where to place the numeric ABCs, but I'd rather have a top-level numbers module. > Now, there is another issue that needs to be dicussed. > The classes in the PEP appear to be written with lots of mixin methods, > such as __rsub__ and __abs__ and such. Unfortunately, the current > proposed method for dynamic inheritance does not allow for methods or > properties to be inherited from the 'virtual' base class. Which means > that all of the various methods defined in this PEP are utterly > meaningless other than as documentation - except in the case of a new > user-created class of numbers which inherit from these ABCs using > traditional inheritance, which is not something that I expect to happen > very often at all. For virtually all practical uses, the elaborate > methods defined in this PEP will be unused and inaccessible. 
And yet they are designed for the benefit of new numeric type implementations so that mixed-mode arithmetic involving two different 3rd party implementations can be defined soundly. If this is deemed too controversial or unwieldy and unnecessary I'd be okay with dropping it, though I'm not sure that it hurts. There are some problems with specifying it so that all the cases work right, however. In any case we could have separate mix-in classes for this purpose -- while ABCs *can* be dual-purpose (both specification and mix-in), there's no rule that says the *must* be. The problem Jeffrey and I were trying to solve with this part of the spec is what should happen if you have two independently developed 3rd party types, e.g. MyReal and YourReal, both implementing the Real ABC. Obviously we want MyReal("3.5") + YourReal("4.5") to return an object for which isinstance(x, Real) holds and whose value is close to float("8.0"). But this is not so easy. Typically, classes like this have methods like these:: class MyReal: def __add__(self, other): if not isinstance(other, MyReal): return NotImplemented return MyReal(...) def __radd__(self, other): if not isinstance(other, MyReal): return NotImplemented return MyReal(...) but this doesn't support mixed-mode arithmetic at all. (Reminder of the underlying mechanism: for a+b, first a.__add__(b) is tried; if that returns NotImplemented, b.__radd__(a) is tried; if that also returns NotImplemented, the Python VM raises TypeError. Exceptions raised at any stage cause the remaining steps to be abandoned, so if e.g. __add__ raises TypeError, __radd__ is never tried. This is the crux of the problem we're trying to solve.) Supporting mixed-mode arithmetic with *known* other types like float is easy: just add either if isinstance(other, float): return self + MyReal(other) or if isinstance(other, float): return float(self) + other to the top of each method. But that still doesn't support MyReal() + YourReal(). 
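The float special case described above, written out runnably (MyReal and YourReal here are toy stand-ins for independently developed Real implementations, not PEP 3141 code):

```python
class MyReal:
    def __init__(self, value):
        self.value = float(value)

    def __add__(self, other):
        if isinstance(other, MyReal):
            return MyReal(self.value + other.value)
        if isinstance(other, float):
            # Known built-in type: convert and add.
            return MyReal(self.value + other)
        return NotImplemented      # lets Python try other.__radd__

    __radd__ = __add__             # addition is commutative here

class YourReal:
    """An independent Real type that knows nothing about MyReal."""
    def __init__(self, value):
        self.value = float(value)

# Mixed-mode with the known type float works in both orders...
assert (MyReal(3.5) + 4.5).value == 8.0
assert (4.5 + MyReal(3.5)).value == 8.0

# ...but MyReal + YourReal still fails: both sides return NotImplemented
# (or lack the hook entirely), so the VM raises TypeError.
try:
    MyReal(3.5) + YourReal(4.5)
except TypeError:
    pass
```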
For that to work, at least one of the two classes has to blindly attempt to cast the other argument to a known class. For example, instead of returning NotImplemented, __radd__ (which knows it is being called as a last resort) could return self+float(other), or float(self)+float(other), under the assumption that all 3rd party Real types can be converted to the built-in float type with only moderate loss. Unfortunately, there are a lot of cases to consider here. E.g. we could be adding MyComplex() to YourReal(), and then the float cast in __radd__ would be a disaster (since MyComplex() can't be cast to float, only to complex). We are trying to make things easier for 3rd parties that *do* want to use the ABC as a mix-in, by asking them to call super.__[r]add__(self, other) whenever the other argument is not something they specifically recognize. I'm pretty sure that we haven't gotten the logic for this quite right yet. I'm only moderately sure that we *can* get it right. But in any case, please don't let this distract you from the specification part of the numeric ABCs. > This really highlights what I think is a problem with dynamic > inheritance, and I think that this inconsistency between traditional and > dynamic inheritance will eventually come back to haunt us. It has always > been the case in the past that for every property of class B, if > isinstance(A, B) == True, then A also has that property, either > inherited from B, or overridden in A. The fact that this invariant will > no longer hold true is a problem in my opinion. Actually there is a school of thought (which used to prevail amongst Zopistas, I don't know if they've been cured yet) that class inheritance was purely for implementation inheritance, and that a subclass was allowed to reverse policies set by the base class. While I don't endorse this as a general rule, I don't see how (with a sufficiently broad definition of "property") your invariant can be assumed even for traditional inheritance. E.g. 
a base class may be hashable or immutable but the subclass may not be (it's trivial to create mutable subclasses of int, str or tuple). > I realize that there isn't currently a solution to efficiently allow > inheritance of properties via dynamic inheritance. As a software > engineer, however, I generally feel that if a feature is unreliable, > then it shouldn't be used at all. That sounds like a verdict against all dynamic properties of the language. > So if I were designing a class > hierarchy of ABCs, I would probably make a rule for myself not to define > any properties or methods in the ABCs at all, and to *only* use ABCs for > type testing via 'isinstance'. And that is a fine policy to hold yourself to. > In other words, if I were writing this PEP, Hey, you are! Or do you want your name taken off? At the very least I think the rationale (all your words) needs some revision in the light of Benji York's comments in another thread. > all of those special methods > would be omitted, simply because as a writer of a subclass I couldn't > rely on being able to use them. As the author of a subclass, you control the choice between real inheritance via inclusion in __bases__ or pseudo-inheritance via register(). So I don't see your point here. > The only alternative that I can see is to not use dynamic inheritance at > all, and instead have the number classes inherit from these ABCs using > the traditional mechanism. But that brings up all the problems of > immaturity and requiring them to be built-in that I brought up earlier. That's off the table already.
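The __bases__-versus-register() distinction drawn above can be made concrete (Sized and is_empty are names invented for this sketch; only ABCMeta, abstractmethod, and register are the real machinery):

```python
from abc import ABCMeta, abstractmethod

class Sized(metaclass=ABCMeta):
    @abstractmethod
    def __len__(self):
        ...

    def is_empty(self):            # mix-in method with a concrete body
        return len(self) == 0

class Subclassed(Sized):           # real inheritance via __bases__
    def __len__(self):
        return 0

class Registered:                  # pseudo-inheritance via register()
    def __len__(self):
        return 0

Sized.register(Registered)

# Both pass the specification test...
assert isinstance(Subclassed(), Sized)
assert isinstance(Registered(), Sized)

# ...but only the real subclass inherits the mix-in.
assert Subclassed().is_empty()
assert not hasattr(Registered(), "is_empty")
```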
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From jason.orendorff at gmail.com Thu May 17 19:55:57 2007 From: jason.orendorff at gmail.com (Jason Orendorff) Date: Thu, 17 May 2007 13:55:57 -0400 Subject: [Python-3000] pep 3131 again In-Reply-To: <1d85506f0705161806w19914adfid76e36c4151c336@mail.gmail.com> References: <1d85506f0705161806w19914adfid76e36c4151c336@mail.gmail.com> Message-ID: Martin, this message suggests an addition to PEP 3131. On 5/16/07, tomer filiba wrote: > === RTL/LTR === > the only practical way to use RTL languages in code is to have an RTL > programming language, where "if" is spelled "??", "for" as "????", > "in" as "????", and so on, and the entire program is RTL. having code > like -- > > for ??? in ????(1,2,3) > > is only unreadable by all means (since the parenthesis are LTR, while > the name is RTL, etc.) In theory, the Right Thing to do for this is support Unicode bidi format control characters. Check this out: for ??? in ?????(1,2,3): blort(???) I just added U+200E, "LEFT-TO-RIGHT MARK", after each misbehaving RTL identifier, as recommended here: http://unicode.org/reports/tr9/#Usage Note: some mail/news agents strip out format characters. (?.gnikrow era sretcarahc lortnoc idib ,siht daer nac uoy fI??) (?If you can read this, control characters were stripped/ignored.??) Now... it's clearly absurd to be pasting invisible magic characters into source code, but that part is automatable. Just hack your editor to add U+200E after each run of strong-RTL characters, except in strings and comments. The real problems are: 1. Many editors don't have bidi support. This might improve with time. Or not. 2. Python forbids these characters. Martin, JavaScript treats these specially, and I think Python probably should, too: The ECMAScript 3 standard for JavaScript requires the tokenizer to throw away all Unicode format-control characters (general category Cf). 
ECMAScript 4 will likely tweak this (an incompatible change) to retain those characters only in strings and regexps. I like that better. Cheers, -j From foom at fuhm.net Thu May 17 20:03:54 2007 From: foom at fuhm.net (James Y Knight) Date: Thu, 17 May 2007 14:03:54 -0400 Subject: [Python-3000] PEP 3131 - the details In-Reply-To: <464BBE2B.1050201@acm.org> References: <464BBE2B.1050201@acm.org> Message-ID: <2A4F5FE3-9F8A-4B74-B46D-B63F1260B7FD@fuhm.net> I mentioned this in another thread as an aside in the middle of the email, but I thought I'd put it out here at the top: It should be considered whether formatting characters should be ignored. And if so, which list of properties should be used for that. I notice that the excerpt from the C# standard says: > * 4 Any formatting-characters are removed. I don't know what they mean by that, but I'm going to guess characters in the Cf class. However, UAX #31 says: > 2.2 Layout and Format Control Characters > > Certain Unicode characters are used to control joining behavior, > bidirectional ordering control, and alternative formats for > display. These have the General_Category value of Cf. Unlike space > characters or other delimiters, they do not indicate word, line, or > other unit boundaries. > > While it is possible to ignore these characters in determining > identifiers, the recommendation is to not ignore them and to not > permit them in identifiers except in special cases. This is because > of the possibility for confusion between two visually identical > strings; see [UTR36]. Some possible exceptions are the ZWJ and ZWNJ > in certain contexts, such as between certain characters in Indic > words. It doesn't seem to me that an attack vector here is particularly relevant, so perhaps going along with C# and ignoring Cf characters in the source code might be a good idea. 
But I do notice that Unicode 4.0.1 and earlier used to recommend ignoring formatting characters in identifiers (Ch 5 of the book), so that might be where C# got it from. So, maybe it's better to keep the status quo, and not allow Cf characters, unless someone comes up with a particular need for doing so. Hm, I think I've convinced myself of that now. :) James From martin at v.loewis.de Thu May 17 20:22:13 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 17 May 2007 20:22:13 +0200 Subject: [Python-3000] PEP 3131 - the details In-Reply-To: <2A4F5FE3-9F8A-4B74-B46D-B63F1260B7FD@fuhm.net> References: <464BBE2B.1050201@acm.org> <2A4F5FE3-9F8A-4B74-B46D-B63F1260B7FD@fuhm.net> Message-ID: <464C9D55.9080501@v.loewis.de> > So, maybe it's better to keep the status quo, and not allow Cf > characters, unless someone comes up with a particular need for doing so. > Hm, I think I've convinced myself of that now. :) That is my reasoning, too. People seem to want to be conservative, so it's safer to reject formatting characters for the moment. If people come up with a need, they still can be added. (there might be a need for it in RTL languages, supporting 200E..200F and 202A..202E, but it seems that speakers of RTL languages are skeptical about the entire PEP, so it's unclear whether allowing these would help anything) Regards, Martin From collinw at gmail.com Thu May 17 20:24:14 2007 From: collinw at gmail.com (Collin Winter) Date: Thu, 17 May 2007 11:24:14 -0700 Subject: [Python-3000] Updated and simplified PEP 3141: A Type Hierarchy for Numbers In-Reply-To: References: <5d44f72f0705161731j4700bdb3h4e36e97757bd6a32@mail.gmail.com> <464C06E3.2090104@acm.org> <464C34C4.2080702@gmail.com> <43aa6ff70705170937u19113f3et9f23971448049c0e@mail.gmail.com> Message-ID: <43aa6ff70705171124i63c3edc7j1d4e133bdce1ce4f@mail.gmail.com> On 5/17/07, Guido van Rossum wrote: > On 5/17/07, Collin Winter wrote: > > ABCs can define concrete methods. 
These concrete methods provide > > functionality that the child classes do not themselves provide. > > You seem to be misreading my intention here. ABCs serve two purposes: > they are interface specifications, and they provide "default" or > "mix-in" implementations of some of the methods they specify. The > pseudo-inheritance enabled by the register() call uses only the > specification part, and requires that the registered class implement > all the specified methods itself. In order to benefit from the > "mix-in" side of the ABC, you must subclass it directly. I think I'm getting confused between the PEP and what you've said at one of the various whiteboard sessions. From guido at python.org Thu May 17 20:36:27 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 17 May 2007 11:36:27 -0700 Subject: [Python-3000] PEP 3133: Introducing Roles In-Reply-To: <464C5570.5050205@benjiyork.com> References: <43aa6ff70705132236w6d2dbfc4r8d1fadede753a6ca@mail.gmail.com> <4648D626.1030201@benjiyork.com> <464B4237.4090802@benjiyork.com> <464C5570.5050205@benjiyork.com> Message-ID: On 5/17/07, Benji York wrote: > > [PEP 3119] > > In classical OOP theory, invocation is the preferred usage pattern, > > and inspection is actively discouraged, being considered a relic of an > > earlier, procedural programming style. However, in practice this view > > is simply too dogmatic and inflexible, and leads to a kind of design > > rigidity that is very much at odds with the dynamic nature of a > > language like Python. > > I disagree with the last sentence in the above paragraph. While > zope.interface has been shown (in a separate message) to perform the > same tasks as the "roles" PEP (3133) and below I show the similarities > between this PEP (ABCs) and zope.interface, I want to point out that > users of zope.interface don't actually use it in these ways. I'm not wedded to this sentence; the rationale didn't get a facelift like the rest of the PEP.
I do want to point out that other mechanisms like GFs need to have access to isinstance/isclass or equivalent in order to do their magic. > So, what /do/ people use zope.interface for? There are two primary > uses: making contracts explicit and adaptation. If more detail is > desired about these uses; I'll be glad to share. I know what they are. Adaptation is another area where isinstance or equivalent is needed by the underlying machinery. I do note that a fairly common place where *some* kind of type checking (whether isinstance- or hasattr-based) is the implementation of binary operators; a typical __add__ or __radd__ method usually starts by testing whether the other argument is an object it understands, and if not, it returns NotImplemented. Since this is essentially *implementing* a (limited) GF strategy, I don't see how adaptation or GFs will help to eliminate such type checks. > My main point is that the time machine worked; people have had the moral > equivalent of ABCs and Roles for years and have decided against using > them the way the PEPs envision. It's not the only way the PEP envisions they are used, and no longer the major way. I expect that the uses you mention are actually more important. And given that people want these I think it would be useful to have them in the standard library. > Of course if people still think ABCs > are keen, then a stand-alone package can be created and we can see if > there is uptake, if so; it can be added to the standard library later. I want to add something to the standard library now, because it's been relegated to 3rd party status for too long. However, I think I can do better than zope.interface; in some cases I just disagree with its design choices, on other cases I can change the language to improve upon the contortions that zope had to go through to make things look right. (E.g. overloading isinstance, keyword arguments to class declarations, class decorators.) 
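The __add__/__radd__ pattern Guido mentions above — test whether the other operand is understood, and return NotImplemented if not — looks like this in practice (Money is a made-up class, purely for illustration):

```python
class Money:
    """Toy value type illustrating the binary-operator type check."""
    def __init__(self, cents):
        self.cents = cents

    def __add__(self, other):
        # Check whether we understand the other operand; if not,
        # return NotImplemented so Python tries other.__radd__
        # instead of raising immediately.
        if not isinstance(other, Money):
            return NotImplemented
        return Money(self.cents + other.cents)

print((Money(100) + Money(50)).cents)  # 150
print(Money(100).__add__('x'))         # NotImplemented
```

Whether the check is spelled with isinstance or hasattr, *some* kind of type inspection happens here, which is the point being made.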
I am also leaving some of the more esoteric parts of zope.interface out (e.g. assertions about instances), but the mechanism I am proposing (isinstance overloading) supports this just fine. > If I recall correctly, the original motivation for ABCs was that some > times people want to "sniff" an object and see what it is, almost always > to dispatch appropriately. That use case of "dispatch in the small", > would seem to me to be much better addressed by generic functions. If > those generic functions want something in addition to classes to > dispatch on, then interfaces can be used too. That's just one motivation. Another, more important motivation is to have something that can fulfill the role that zope.interfaces fulfills, but with fewer arbitrary differences from what we already know, a class hierarchy. > If GF aren't desirable for that use case, then basefile, basesequence, > and basemapping can be added to Python and cover 90% of what people > need. I think the Java Collections system has shown that it's not > necessary to provide all interfaces for all people. If you can only > provide a subset of an interface, make unimplemented methods raise > NotImplementedError. The ABC PEP currently defines fewer ABCs than the Java Collections system, so I'm not sure what you're worried about. > > Overloading ``isinstance()`` and ``issubclass()`` > > ------------------------------------------------- > > Perhaps the PEP should just be reduced to include only this section. And require every 3rd party library to invent its own notions of sequence, mapping etc.? No way! > Sidebar: this highlights one of the reasons zope.interface users employ > the naming convention of prefixing their interface names with "I", it > helps keep interface names short while giving you an easy name for > "interface that corresponds to things of class Foo", which would be > IFoo. Yeah, that is out of necessity because zope forces you to have a separate interface and implementation class.
The ABC proposal does away with this silly duplicate hierarchy. > > assert issubclass(list, MyClass) > > assert MyClassInterface.implementedBy(list) > > > assert issubclass(list, MyABC) > > assert MyClassInterface.extends(MyInterface) > > > You can also register another ABC:: > > > > class AnotherClass(metaclass=ABCMeta): > > pass > > class AnotherInterface(zope.interface.Interface): > pass > > > AnotherClass.register(basestring) > > zope.interface.classImplements(basestring, AnotherInterface) > > > MyClass.register(AnotherClass) > > I don't quite understand the intent of the above line. It appears to be > extending the contract that AnotherClass embodies to promise to fulfill > any contract that MyClass embodies. That seems to be an unusual thing > to want to express. MyClass is meant to be an ABC here whose contract is weaker than that of AnotherClass. Suppose there was only a built-in MutableSequence, but a 3rd party needed a Sequence interface that didn't need to be mutable. It could use this example, with AnotherClass being the built-in MutableSequence, and MyClass being the 3rd party's Sequence class. > Although unusual, you could still do it using > zope.interface. One way would be to add MyClassInterface to the > __bases__ of AnotherInterface. > > OTOH, I might be confused by the collapsing of the class and interface > hierarchies. Do the classes in the above line of code represent the > implementation or specification? Specification. > [snip] > > > ABCs for Containers and Iterators > > --------------------------------- > > zope.interface defines similar interfaces. Surprisingly they aren't > used all that often. They can be viewed at > http://svn.zope.org/zope.interface/trunk/src/zope/interface/common/. > The files mapping.py, sequence.py, and idatetime.py are the most > interesting. I believe I worked for Zope Corp around the time these were designed. I remember it was pretty painful to agree on which methods to include or exclude.
Having this decided by the standard library would solve the problem by fiat for a larger audience. > [snip rest] > > > I was just > > thinking of how to "sell" ABCs as an alternative to current happy > > users of zope.interfaces. > > One of the things that makes zope.interface users happy is the > separation of specification and implementation. The increasing > separation of specification from implementation is what has > driven Abstract Data Types in procedural languages, encapsulation > in OOP, and now zope.interface. Mixing the two back together in > ABCs doesn't seem attractive. This is a self-selecting audience -- people who see it differently (like me) aren't likely to use zope.interface, or might be hoping for something better. > As for "selling" current users on an alternative, why bother? If people > need interfaces, they know where to find them. I suspect I'm confused > as to the intent of this discussion. Many people have a strong preference for standard features over 3rd party features, everything else being equal -- or even if everything else isn't quite equal (but mostly so). A feature-by-feature comparison between zope.interface and ABCs is helpful for people who aren't yet current zope.interface users. It seems that in the balance, we have: - ABCs have support in the standard library - ABCs don't force you to separate specification from implementation. This counts as a pro for some, a con for others - ABCs (out of the box) don't let you make assertions about instances. This is a con for those who need that feature, but I haven't met many of those. - With ABCs, the spelling of "does this object conform to this interface" and "does this object inherit implementation from this class" is the same (isinstance()). This counts as a pro for some, a con for others. IMO it's mostly emotional and there is no rational reason to worry about the spelling being the same.
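For readers following the thread later: the register() mechanism under discussion eventually shipped as the abc module in Python 3. A minimal sketch of the "specification-only" pseudo-inheritance Guido describes, using that later spelling:

```python
from abc import ABCMeta

class MyABC(metaclass=ABCMeta):
    pass

class Plain:
    """Satisfies the (empty) contract without subclassing MyABC."""
    pass

# Registration grants only the isinstance/issubclass relationship;
# Plain inherits no mix-in methods, and MyABC never appears in its MRO.
MyABC.register(Plain)

print(issubclass(Plain, MyABC))    # True
print(isinstance(Plain(), MyABC))  # True
```

This is the "specification part" without the "mix-in part": to inherit default method implementations, a class must subclass the ABC directly.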
I won't be offended if the authors and users of zope.interface decide to ignore the ABCs in the standard library; the rest of us will still benefit from them (and from GFs). However, I fully expect that zope.interface will embrace at least some of the mechanisms added to Py3k in support of ABCs, as they make implementing zope.interface easier too. Also, if we add standard GFs, zope.interface will want to work closely with those. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From martin at v.loewis.de Thu May 17 20:42:56 2007 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Thu, 17 May 2007 20:42:56 +0200 Subject: [Python-3000] pep 3131 again In-Reply-To: References: <1d85506f0705161806w19914adfid76e36c4151c336@mail.gmail.com> Message-ID: <464CA230.3000004@v.loewis.de> > 2. Python forbids these characters. Martin, JavaScript > treats these specially, and I think Python probably > should, too: > > The ECMAScript 3 standard for JavaScript requires the > tokenizer to throw away all Unicode format-control characters > (general category Cf). > > ECMAScript 4 will likely tweak this (an incompatible change) > to retain those characters only in strings and regexps. > I like that better. I've added this as an open issue. It would be easy to add, but I would like to get some confirmation first that it actually helps writers of the RTL languages (preferably from some native speakers). The proposed change would be that Cf characters would be allowed *only* in and immediately around identifiers, and in string literals and comments, i.e. the scanner would work this way: - perform token classification only based on individual ASCII letters; classify all non-ASCII letters as potential identifiers. - for potential identifiers (i.e. runs of non-ASCII characters and ASCII letters, digits, and underscore), drop Cf characters, then verify identifier syntax.
IOW, you couldn't put the formatting characters around whitespace, keywords, or punctuation. An alternative implementation would be to drop formatting characters everywhere except in string literals. I'll repeat that UTR#39 explicitly discourages support for formatting characters in identifiers. Regards, Martin From rasky at develer.com Fri May 18 00:50:36 2007 From: rasky at develer.com (Giovanni Bajo) Date: Fri, 18 May 2007 00:50:36 +0200 Subject: [Python-3000] PEP 3131 accepted In-Reply-To: References: Message-ID: On 17/05/2007 18.48, Guido van Rossum wrote: > I have accepted PEP 3131. Do you have a rationale to share with us? Especially given that your previous public mails about the PEP looked mostly against it. This way, the rationale can be embedded in the PEP for future reference. Thanks! -- Giovanni Bajo From guido at python.org Fri May 18 00:56:19 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 17 May 2007 15:56:19 -0700 Subject: [Python-3000] PEP 3131 accepted In-Reply-To: References: Message-ID: You've missed a few of my mails. I liked the reports from the Java world. On 5/17/07, Giovanni Bajo wrote: > On 17/05/2007 18.48, Guido van Rossum wrote: > > > I have accepted PEP 3131. > > Do you have a rationale to share with us? Especially given that your previous > public mails about the PEP looked mostly against it. This way, the rationale > can be embedded in the PEP for future reference. > > Thanks! 
> -- > Giovanni Bajo > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From rasky at develer.com Fri May 18 01:04:35 2007 From: rasky at develer.com (Giovanni Bajo) Date: Fri, 18 May 2007 01:04:35 +0200 Subject: [Python-3000] Unicode strings, identifiers, and import In-Reply-To: References: Message-ID: On 13/05/2007 21.31, Guido van Rossum wrote: > The answer to all of this is the filesystem encoding, which is already > supported. Doesn't appear particularly difficult to me. sys.getfilesystemencoding() is None on most Linux computers I have access to. How is the problem solved there? In fact, I have a question about this. Can anybody show me a valid multi-platform Python code snippet that, given a filename as *unicode* string, create a file with that name, possibly adjusting the name so to ignore an encoding problem (so that the function *always* succeed)? def dump_to_file(unicode_filename): ... I attempted this a couple of times without being satisfied at all by the solutions. -- Giovanni Bajo From guido at python.org Fri May 18 01:10:25 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 17 May 2007 16:10:25 -0700 Subject: [Python-3000] Unicode strings, identifiers, and import In-Reply-To: References: Message-ID: On 5/17/07, Giovanni Bajo wrote: > On 13/05/2007 21.31, Guido van Rossum wrote: > > > The answer to all of this is the filesystem encoding, which is already > > supported. Doesn't appear particularly difficult to me. > > sys.getfilesystemencoding() is None on most Linux computers I have access to. > How is the problem solved there? I suppose on such systems filenames are binary strings (except for '/' and '\0') and defaulting to utf8 would work just fine.
> In fact, I have a question about this. Can anybody show me a valid > multi-platform Python code snippet that, given a filename as *unicode* string, > create a file with that name, possibly adjusting the name so to ignore an > encoding problem (so that the function *always* succeed)? > > def dump_to_file(unicode_filename): > ... > > > I attempted this a couple of times without being satisfied at all by the > solutions. Why does it have to be cross-platform? The mapping from module names to the filesystem is considered platform specific. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From rasky at develer.com Fri May 18 01:14:00 2007 From: rasky at develer.com (Giovanni Bajo) Date: Fri, 18 May 2007 01:14:00 +0200 Subject: [Python-3000] Unicode strings, identifiers, and import In-Reply-To: References: Message-ID: On 18/05/2007 1.10, Guido van Rossum wrote: >> In fact, I have a question about this. Can anybody show me a valid >> multi-platform Python code snippet that, given a filename as *unicode* >> string, >> create a file with that name, possibly adjusting the name so to ignore an >> encoding problem (so that the function *always* succeed)? >> >> def dump_to_file(unicode_filename): >> ... >> >> >> I attempted this a couple of times without being satisfied at all by the >> solutions. > > Why does it have to be cross-platform? The mapping from module names > to the filesystem is considered platform specific. With cross-platform, I meant a snippet of code which worked on all platforms. The canonicalization of the filename that is produced could of course be different on each platform.
-- Giovanni Bajo From guido at python.org Fri May 18 01:22:11 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 17 May 2007 16:22:11 -0700 Subject: [Python-3000] Unicode strings, identifiers, and import In-Reply-To: References: Message-ID: On 5/17/07, Giovanni Bajo wrote: > On 18/05/2007 1.10, Guido van Rossum wrote: > > >> In fact, I have a question about this. Can anybody show me a valid > >> multi-platform Python code snippet that, given a filename as *unicode* > >> string, > >> create a file with that name, possibly adjusting the name so to ignore an > >> encoding problem (so that the function *always* succeed)? > >> > >> def dump_to_file(unicode_filename): > >> ... > >> > >> > >> I attempted this a couple of times without being satisfied at all by the > >> solutions. > > > > Why does it have to be cross-platform? The mapping from module names > > to the filesystem is considered platform specific. > > With cross-platform, I meant a snippet of code which worked on all platforms. > The canonicalization of the filename that is produced could of course be > different on each platform. And I meant what I said. The algorithm is up to the Python implementation on a specific platform. This means that we will have to decide what it will be. Feel free to contribute a suggestion to the PEP author. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From rasky at develer.com Fri May 18 01:22:07 2007 From: rasky at develer.com (Giovanni Bajo) Date: Fri, 18 May 2007 01:22:07 +0200 Subject: [Python-3000] pep 3131 again In-Reply-To: <464C1EF4.6040603@v.loewis.de> References: <1d85506f0705161806w19914adfid76e36c4151c336@mail.gmail.com> <464C1EF4.6040603@v.loewis.de> Message-ID: On 17/05/2007 11.23, Martin v. Löwis wrote: > Whether or not Japanese or Chinese people with no knowledge of > English still can master the Latin alphabet easily, I don't know, > as all Chinese people I do know speak German or English well.
All Chinese people are taught the Latin-character transliteration of Mandarin in school. It's called "pin-yin": http://en.wikipedia.org/wiki/Pin_yin. In fact, they use this Latin transliteration as the main *means* to teach children how to pronounce each Chinese character. This transliteration is so common that it is supported as an input method on devices like cellphones or keyboards (even though it is usually not the default: they have more specific and tuned input methods for computers and SMS). So yes, Chinese people do master the Latin alphabet. And funnily enough for this thread, Pin-yin cannot be fully expressed in ASCII because it requires accented vowels (?, ?, and many others I don't have handy). -- Giovanni Bajo From foom at fuhm.net Fri May 18 01:24:21 2007 From: foom at fuhm.net (James Y Knight) Date: Thu, 17 May 2007 19:24:21 -0400 Subject: [Python-3000] Unicode strings, identifiers, and import In-Reply-To: References: Message-ID: <8CC6390F-107E-43A2-AD3A-8051BF039B70@fuhm.net> On May 17, 2007, at 7:04 PM, Giovanni Bajo wrote: > On 13/05/2007 21.31, Guido van Rossum wrote: > >> The answer to all of this is the filesystem encoding, which is >> already >> supported. Doesn't appear particularly difficult to me. > > sys.getfilesystemencoding() is None on most Linux computers I have > access to. > How is the problem solved there? > > In fact, I have a question about this. Can anybody show me a valid > multi-platform Python code snippet that, given a filename as > *unicode* string, > create a file with that name, possibly adjusting the name so to > ignore an > encoding problem (so that the function *always* succeed)? > > def dump_to_file(unicode_filename): > ... unicode_filename.encode(sys.getfilesystemencoding() or 'ascii', 'xmlcharrefreplace') would work. Although I don't think I've seen a platform where sys.getfilesystemencoding() is None. If I unset LANG/LANGUAGE/LC_*, python reports 'ANSI_X3.4-1968'.
But normally on my system it reports 'UTF-8', since I have LANG=en_US.UTF-8. The *really* tricky thing is that on unix systems, if you want to be able to access all the files on the disk, you have to use the byte- string API, as not all filenames are convertible to unicode. But on windows, if you want to be able to access all the files on the disk, you *CANNOT* use the byte-string api, because not all filenames (which are unicode on disk) are convertible to bytestrings via the "mbcs" encoding (which is what getfilesystemencoding() reports). It's quite a pain in the ass really. James From rasky at develer.com Fri May 18 01:31:03 2007 From: rasky at develer.com (Giovanni Bajo) Date: Fri, 18 May 2007 01:31:03 +0200 Subject: [Python-3000] Unicode strings, identifiers, and import In-Reply-To: <8CC6390F-107E-43A2-AD3A-8051BF039B70@fuhm.net> References: <8CC6390F-107E-43A2-AD3A-8051BF039B70@fuhm.net> Message-ID: On 18/05/2007 1.24, James Y Knight wrote: > unicode_filename.encode(sys.getfilesystemencoding() or 'ascii', > 'xmlcharrefreplace') would work. Thanks - using "xmlcharrefreplace" hadn't occurred to me! > The *really* tricky thing is that on unix systems, if you want to be > able to access all the files on the disk, you have to use the byte- > string API, as not all filenames are convertible to unicode. But on > windows, if you want to be able to access all the files on the disk, > you *CANNOT* use the byte-string api, because not all filenames > (which are unicode on disk) are convertible to bytestrings via the > "mbcs" encoding (which is what getfilesystemencoding() reports). It's > quite a pain in the ass really. Yes. I hope that Py3k will solve this somehow. 
-- Giovanni Bajo From guido at python.org Fri May 18 01:48:57 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 17 May 2007 16:48:57 -0700 Subject: [Python-3000] Radical idea: remove built-in open (require import io) Message-ID: Do people think it would be too radical if the built-in open() function was removed altogether, requiring all code that opens files to import the io module first? This would make it easier to identify modules that engage in I/O. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From aleaxit at gmail.com Fri May 18 02:14:49 2007 From: aleaxit at gmail.com (Alex Martelli) Date: Thu, 17 May 2007 17:14:49 -0700 Subject: [Python-3000] Radical idea: remove built-in open (require import io) In-Reply-To: References: Message-ID: On 5/17/07, Guido van Rossum wrote: > Do people think it would be too radical if the built-in open() > function was removed altogether, requiring all code that opens files > to import the io module first? This would make it easier to identify > modules that engage in I/O. I think it would be an excellent idea. Among other advantages, it makes it easier/cleaner to "mock things up" for testing purposes. Right now, if I want to make very small and lightweight unit-tests for a module that uses `open', I have to do that by poking a fake 'open' in the builtins (or in the module under test, but that may be hard to achieve if the module imports other modules which import other modules which...). I do it, but not happily. If all I/O occurred through the io module, I could mock things up in an easier and cleaner way by sticking a "mock io module" in sys.modules['io'] before I import from my unittest the module I'm testing -- very similar to what I do in order to have small lightweight tests of modules that interact with the filesystem with functions such as os.listdir, and the like; I am far more comfortable with this approach than I am with poking into builtins. 
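The sys.modules trick described above can be sketched like this (the io module and its open() signature are hypothetical here, since the module didn't exist yet when this was written):

```python
import sys
import types

# Build a fake 'io' module whose open() records calls instead of
# touching the filesystem.
mock_io = types.ModuleType('io')
mock_io.calls = []

def fake_open(path, mode='r'):
    mock_io.calls.append((path, mode))
    return None  # a real mock would return a file-like object

mock_io.open = fake_open

# Install the mock *before* the module under test runs "import io",
# so the import machinery hands it the fake instead of the real thing.
sys.modules['io'] = mock_io

import io
io.open('data.txt', 'w')
print(mock_io.calls)  # [('data.txt', 'w')]
```

A test fixture would also restore the original sys.modules entry afterwards; that bookkeeping is omitted here.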
Alex From shiblon at gmail.com Fri May 18 02:20:12 2007 From: shiblon at gmail.com (Chris Monson) Date: Thu, 17 May 2007 20:20:12 -0400 Subject: [Python-3000] Radical idea: remove built-in open (require import io) In-Reply-To: References: Message-ID: Would other IO builtins also move, like (formerly raw_) input and print? What about the file type? It seems to me that if the rationale is to make use of IO identifiable, then all IO functions would have to move into the io module. What am I missing? - C On 5/17/07, Guido van Rossum wrote: > Do people think it would be too radical if the built-in open() > function was removed altogether, requiring all code that opens files > to import the io module first? This would make it easier to identify > modules that engage in I/O. > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri May 18 02:42:54 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 17 May 2007 17:42:54 -0700 Subject: [Python-3000] Radical idea: remove built-in open (require import io) In-Reply-To: References: Message-ID: On 5/17/07, Chris Monson wrote: > Would other IO builtins also move, like (formerly raw_) input and > print? What about the file type? The file type is already gone in py3k. > It seems to me that if the rationale is to make use of IO > identifiable, then all IO functions would have to move into the io > module. What am I missing? I guess a refinement of the point is that you need the io module to create new I/O streams, while input() and print() act on existing streams. Code that makes read() and write() calls doesn't need to import the io module either, so we're not really making all I/O identifiable, just the open() calls.
--Guido > - C > > On 5/17/07, Guido van Rossum wrote: > > Do people think it would be too radical if the built-in open() > > function was removed altogether, requiring all code that opens files > > to import the io module first? This would make it easier to identify > > modules that engage in I/O. > > > > -- > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > _______________________________________________ > > Python-3000 mailing list > > Python-3000 at python.org > > http://mail.python.org/mailman/listinfo/python-3000 > > Unsubscribe: > > http://mail.python.org/mailman/options/python-3000/shiblon%40gmail.com > > > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From shiblon at gmail.com Fri May 18 02:58:49 2007 From: shiblon at gmail.com (Chris Monson) Date: Thu, 17 May 2007 20:58:49 -0400 Subject: [Python-3000] Radical idea: remove built-in open (require import io) In-Reply-To: References: Message-ID: On 5/17/07, Guido van Rossum wrote: > > On 5/17/07, Chris Monson wrote: > > Would other IO builtins also move, like (formerly raw_) input and > > print? What about the file type? > > The file type is already gone in py3k. > > > it seems to me that if the rationale is to make use of IO > > identifiable, then all IO functions would have to move into the io > > module. What am I missing? > > I guess a refinement of the point is that you need the io module to > create new I/O streams, while input() and print() act on existing > streams. Code that makes read() and write() calls doesn't need to > import the io module either, so we're not really making all I/O > identifiable, just the open() calls. Aha. Of course, now that you say all of that, it seems obvious. :-) - C --Guido > > > - C > > > > On 5/17/07, Guido van Rossum wrote: > > > Do people think it would be too radical if the built-in open() > > > function was removed altogether, requiring all code that opens files > > > to import the io module first? 
This would make it easier to identify > > > modules that engage in I/O. > > > > > > -- > > > --Guido van Rossum (home page: http://www.python.org/~guido/) > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From greg.ewing at canterbury.ac.nz Fri May 18 03:21:17 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 18 May 2007 13:21:17 +1200 Subject: [Python-3000] PEP 3131 - the details In-Reply-To: <464C9D55.9080501@v.loewis.de> References: <464BBE2B.1050201@acm.org> <2A4F5FE3-9F8A-4B74-B46D-B63F1260B7FD@fuhm.net> <464C9D55.9080501@v.loewis.de> Message-ID: <464CFF8D.7040504@canterbury.ac.nz> Martin v. Löwis wrote: > (there might be a need for it in RTL languages, supporting > 200E..200F and 202A..202E, but it seems that speakers of RTL > languages are skeptical about the entire PEP, so it's unclear > whether allowing these would help anything) The ideal kind of programming language for use by both LTR and RTL people would be some kind of RPN. Then the whole program could be read either way as either prefix or postfix.
:-) -- Greg From greg.ewing at canterbury.ac.nz Fri May 18 03:41:29 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 18 May 2007 13:41:29 +1200 Subject: [Python-3000] Unicode strings, identifiers, and import In-Reply-To: <8CC6390F-107E-43A2-AD3A-8051BF039B70@fuhm.net> References: <8CC6390F-107E-43A2-AD3A-8051BF039B70@fuhm.net> Message-ID: <464D0449.9020406@canterbury.ac.nz> James Y Knight wrote: > The *really* tricky thing is that on unix systems, if you want to be > able to access all the files on the disk, you have to use the byte- > string API ... But on windows ... you *CANNOT* use the byte-string api How are we going to cope with this in Py3k with unicode-only strings? -- Greg From shiblon at gmail.com Fri May 18 04:09:52 2007 From: shiblon at gmail.com (Chris Monson) Date: Thu, 17 May 2007 22:09:52 -0400 Subject: [Python-3000] PEP 3131 - the details In-Reply-To: <464CFF8D.7040504@canterbury.ac.nz> References: <464BBE2B.1050201@acm.org> <2A4F5FE3-9F8A-4B74-B46D-B63F1260B7FD@fuhm.net> <464C9D55.9080501@v.loewis.de> <464CFF8D.7040504@canterbury.ac.nz> Message-ID: Ignoring for a moment that prefix != reverse(postfix), that is.... :-) - C On 5/17/07, Greg Ewing wrote: > Martin v. Löwis wrote: > > > (there might be a need for it in RTL languages, supporting > > 200E..200F and 202A..202E, but it seems that speakers of RTL > > languages are skeptical about the entire PEP, so it's unclear > > whether allowing these would help anything) > > The ideal kind of programming language for use by both > LTR and RTL people would be some kind of RPN. Then the > whole program could be read either way as either prefix > or postfix.
:-) > > -- > Greg From guido at python.org Fri May 18 04:35:23 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 17 May 2007 19:35:23 -0700 Subject: [Python-3000] Unicode strings, identifiers, and import In-Reply-To: <464D0449.9020406@canterbury.ac.nz> References: <8CC6390F-107E-43A2-AD3A-8051BF039B70@fuhm.net> <464D0449.9020406@canterbury.ac.nz> Message-ID: On 5/17/07, Greg Ewing wrote: > James Y Knight wrote: > > The *really* tricky thing is that on unix systems, if you want to be > > able to access all the files on the disk, you have to use the byte- > > string API ... But on windows ... you *CANNOT* use the byte-string api > > How are we going to cope with this in Py3k with > unicode-only strings? Not any different than we do now -- you can already pass both types of strings to a Windows API and we convert it to the kind of string the API needs. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From aahz at pythoncraft.com Fri May 18 04:56:52 2007 From: aahz at pythoncraft.com (Aahz) Date: Thu, 17 May 2007 19:56:52 -0700 Subject: [Python-3000] Radical idea: remove built-in open (require import io) In-Reply-To: References: Message-ID: <20070518025651.GA7643@panix.com> On Thu, May 17, 2007, Guido van Rossum wrote: > > Do people think it would be too radical if the built-in open() > function was removed altogether, requiring all code that opens files > to import the io module first? This would make it easier to identify > modules that engage in I/O. My initial take was -1, but now that I see that the existing tutorial introduces modules before it discusses files, I'm only -0.
-- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "Look, it's your affair if you want to play with five people, but don't go calling it doubles." --John Cleese anticipates Usenet From brett at python.org Fri May 18 05:53:20 2007 From: brett at python.org (Brett Cannon) Date: Thu, 17 May 2007 20:53:20 -0700 Subject: [Python-3000] Radical idea: remove built-in open (require import io) In-Reply-To: References: Message-ID: On 5/17/07, Guido van Rossum wrote: > > Do people think it would be too radical if the built-in open() > function was removed altogether, requiring all code that opens files > to import the io module first? This would make it easier to identify > modules that engage in I/O. I support it. My security work wanted open and execfile yanked out of the built-in namespace anyway. =) -Brett -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-3000/attachments/20070517/975a52f6/attachment.htm From martin at v.loewis.de Fri May 18 07:26:09 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 18 May 2007 07:26:09 +0200 Subject: [Python-3000] Unicode strings, identifiers, and import In-Reply-To: References: Message-ID: <464D38F1.5030808@v.loewis.de> >> The answer to all of this is the filesystem encoding, which is already >> supported. Doesn't appear particularly difficult to me. > > sys.getfilesystemencoding() is None on most Linux computers I have access to. That's strange. Is LANG not set? > How is the problem solved there? A default needs to be applied. In 2.x, the default is the system encoding. Not sure whether the notion of a Python system encoding will be preserved for 3.x, but it should be safe, on Unix, to default to UTF-8 for the file system encoding unless LANG specifies something different. > In fact, I have a question about this. 
Can anybody show me a valid > multi-platform Python code snippet that, given a filename as *unicode* string, > create a file with that name, possibly adjusting the name so to ignore an > encoding problem (so that the function *always* succeed)? That's not really a python-dev or py3k question. If you want to support *arbitrary* Unicode strings, you clearly cannot map them to file names directly: what if the Unicode string contains the directory separator, or other characters not allowed in file names (such as : or * on Windows). If you need to guarantee that any Unicode string can map to a file name, I suggest f = open(filename.encode("utf-8").encode("hex"), "w") > I attempted this a couple of times without being satisfied at all by the > solutions. That's probably because you failed to specify all requirements that you need for satisfaction. If you would explicitly specify them, you would likely find that they conflict, and that no solution can possibly exist satisfying all your requirements, and that this has nothing to do with Unicode. Notice that my above solution meets the *specified* needs: it supports all unicode strings, succeeds always, and possibly adjusts the file name to ignore an encoding problem. Of course, interpreting the file name in a file explorer is somewhat tedious... Regards, Martin From jimjjewett at gmail.com Fri May 18 17:24:19 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Fri, 18 May 2007 11:24:19 -0400 Subject: [Python-3000] PEP 3131 - the details In-Reply-To: <464C732B.8050103@v.loewis.de> References: <464BBE2B.1050201@acm.org> <464C732B.8050103@v.loewis.de> Message-ID: On 5/17/07, "Martin v. Löwis" wrote: > > is it generally agreed that the > > Unicode character classes listed in the PEP are the ones we want to > > include in identifiers? My preference is to be conservative in terms of > > what's allowed. > John Nagle suggested to consider UTR#39 > (http://unicode.org/reports/tr39/).
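Martin's hex-encoding suggestion above relies on Python 2's `str.encode("hex")`, which no longer exists in Python 3; a rough sketch of the same reversible mapping in modern Python uses `binascii` (the helper names here are illustrative, not from the thread):

```python
import binascii

def safe_filename(name):
    # UTF-8-encode the Unicode name, then hex-encode the bytes: the
    # result contains only [0-9a-f], so it is a legal file name on any
    # platform, and the mapping is reversible.
    return binascii.hexlify(name.encode("utf-8")).decode("ascii")

def original_name(fname):
    # Invert the mapping to recover the Unicode name.
    return binascii.unhexlify(fname).decode("utf-8")

print(safe_filename("a/b:*"))  # '612f623a2a' -- separators and wildcards vanish
```

As Martin notes, the file names this produces are opaque in a file explorer; the point is only that the mapping always succeeds and can be undone.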
I encourage anybody to help me > understand what it says. > The easiest part is 3.1: this seems to say we should restrict characters > listed as "restrict" in [idmod]. My suggestion would be to warn about > them. I'm not sure about the purpose of the additional characters: > surely, they don't think we should support HYPHEN-MINUS in identifiers? Rather, they mean that it is commonly used (Lisp and DNS names, at least), and is (deemed by them as) safe (given that you have applied their exclusions, such as the dashes). Python should still use a tailoring and exclude it. > 4. Confusable Detection: Without considering details, it seems you need > two strings to decide whether they are confusable. So it's not clear > to me how this could apply to banning certain identifiers. In most cases, the strings are confusable because individual characters are. TR 39 makes it sound more complicated than it needs to be, because they want to permit all sorts of strangeness, so long as it is at least unambiguous strangeness. My take: Single-script confusables are things like "1" vs "l", and it is probably too late to fight them. Whole-script confusables are cases where two scripts look alike; you can get something looking like "scope" in either Latin or Cyrillic. If we're going to allow non-Latin identifiers, then we'll probably have to live with this. Mixed-script confusables are spoofing that wouldn't work if you insisted that any single identifier stick to a "single" script. ('pаypаl', with Cyrillic 'а's). Their algorithm talks about entire strings because they want to allow 'toys-я-us'. Technically, Latin doesn't have a character that looks like a backwards-R, and Cyrillic doesn't have matches for *all of* "toys us". Personally, I don't see a strong need to support toys_я_us just because it would be possible. On the other hand, I'm not sure how often users of non-latin languages will want to mix in latin letters.
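The mixed-script confusables Jim describes can be detected mechanically. The stdlib does not expose the Unicode Script property, but for a rough sketch the first word of `unicodedata.name()` works as a stand-in (this is a heuristic for illustration, not the TR 39 algorithm):

```python
import unicodedata

def scripts(identifier):
    # Collect the first word of each character's Unicode name
    # (LATIN, CYRILLIC, ...). Underscore and ASCII digits are skipped,
    # mirroring their COMMON classification in Scripts.txt.
    found = set()
    for ch in identifier:
        if ch == "_" or ch.isdigit():
            continue
        found.add(unicodedata.name(ch).split()[0])
    return found

print(scripts("paypal"))            # {'LATIN'}
print(scripts("p\u0430yp\u0430l"))  # {'LATIN', 'CYRILLIC'} -- spoofed with Cyrillic 'a's
```

An identifier whose `scripts()` set has more than one element is a candidate mixed-script confusable, which is exactly the class of spoofing a per-identifier single-script rule would rule out.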
The tech report suggested that it is fairly common to use all of (Hiragana | Katakana | Han | Latin) in Japanese text, but I'm not sure whether it would be normal to mix them within a single identifier. > 5. Mixed Script Detection: That might apply, but I can't map the > algorithm to terminology I'm familiar with. What is UScript.COMMON > and UScript.INHERITED? Those are characters used in many different languages. From TR 24 (http://www.unicode.org/reports/tr24/): Inherited: for characters that may be used with multiple scripts, and inherit their script from the preceding characters. Includes nonspacing marks, enclosing marks, and the zero width joiner/non-joiner characters. Common: for other characters that may be used with multiple scripts. > I'm skeptical about mixed-script detection, > because you surely want to allow ASCII digits (0..9) in Cyrillic According to http://www.unicode.org/Public/UNIDATA/Scripts.txt, the 52 letters [A-Za-z] are latin, but the rest of ASCII (including digits) is COMMON, and should be allowed with any script. -jJ From ark-mlist at att.net Fri May 18 17:51:58 2007 From: ark-mlist at att.net (Andrew Koenig) Date: Fri, 18 May 2007 11:51:58 -0400 Subject: [Python-3000] Radical idea: remove built-in open (require import io) In-Reply-To: References: Message-ID: <002c01c79964$77a6fc00$66f4f400$@net> > Do people think it would be too radical if the built-in open() > function was removed altogether, requiring all code that opens files > to import the io module first? This would make it easier to identify > modules that engage in I/O. +1. Presumably you can still write to the standard input, output, error, and log files without importing io. (I'm feeling slightly pedantic today, so I want to say that the proposal doesn't make it any easier to identify modules that engage in I/O -- it makes it easier to identify modules that assuredly do not engage in I/O. +1 anyway.)
From rrr at ronadam.com Fri May 18 18:17:53 2007 From: rrr at ronadam.com (Ron Adam) Date: Fri, 18 May 2007 11:17:53 -0500 Subject: [Python-3000] Raw strings containing \u or \U In-Reply-To: References: <464B62FD.4070400@ronadam.com> <464B7235.20500@ronadam.com> Message-ID: <464DD1B1.1080009@ronadam.com> Georg Brandl wrote: > Ron Adam schrieb: >> Guido van Rossum wrote: >>> That would be great! This will automatically turn \u1234 into 6 >>> characters, right? >> I'm not exactly clear when the '\uxxxx' characters get converted. There >> isn't any conversion done in tokenizer.c that I can see. It's primarily >> only concerned with finding the beginning and ending of the string at that >> point. It looks like everything between the beginning and end is just >> passed along "as is" and it's translated further later in the chain. > > Look at Python/ast.c, which has functions parsestr() and decode_unicode(). > The latter calls PyUnicode_DecodeRawUnicodeEscape() which I think is the > function you're looking for. > > Georg Thanks, I'll look there. That should be where I need to look to fix a glitch where the last quote of a raw string is both the end of the string and part of a string. >>> r'\' "\\'" Interestingly it works just fine for raw byte strings. (I wish the letter were reversed, saying bytes-raw-string is awkward.) >>> br'\' b'\\' Anyway, I've made the corresponding modifications to tokenize.py and tokenize_tests.txt. The tests for tokenize.py need to be updated. They do a round trip test, but I've found that doesn't mean it's the correct round trip!
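The glitch Ron describes is easy to demonstrate from pure Python: a raw string literal cannot end in an odd number of backslashes, because the tokenizer still uses the backslash to decide that the following quote does not terminate the string, even though the backslash stays in the string literally:

```python
# r'\' is a syntax error: the backslash "protects" the closing quote
# at tokenization time, so the literal never terminates.
try:
    eval(r"r'\'")
except SyntaxError:
    print("r'\\' does not tokenize")

# An even number of backslashes is fine: r'\\' is two characters.
print(eval(r"r'\\'") == "\\" * 2)  # True
```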
Cheers, Ron From fumanchu at amor.org Fri May 18 18:35:30 2007 From: fumanchu at amor.org (Robert Brewer) Date: Fri, 18 May 2007 09:35:30 -0700 Subject: [Python-3000] Radical idea: remove built-in open (requireimport io) In-Reply-To: <002c01c79964$77a6fc00$66f4f400$@net> Message-ID: <435DF58A933BA74397B42CDEB8145A860C23930D@ex9.hostedexchange.local> Guido van Rossum wrote: > Do people think it would be too radical if the built-in open() > function was removed altogether, requiring all code that opens files > to import the io module first? This would make it easier to identify > modules that engage in I/O. I must be dense, because I don't see how the proposal "makes it easier to identify modules that engage in I/O". Who's supposed to be doing the identification and when? And how will it not be fooled by __import__ and plain 'ol cross-module references? Robert Brewer System Architect Amor Ministries fumanchu at amor.org From guido at python.org Fri May 18 18:44:54 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 18 May 2007 09:44:54 -0700 Subject: [Python-3000] Radical idea: remove built-in open (requireimport io) In-Reply-To: <435DF58A933BA74397B42CDEB8145A860C23930D@ex9.hostedexchange.local> References: <002c01c79964$77a6fc00$66f4f400$@net> <435DF58A933BA74397B42CDEB8145A860C23930D@ex9.hostedexchange.local> Message-ID: On 5/18/07, Robert Brewer wrote: > Guido van Rossum wrote: > > Do people think it would be too radical if the built-in open() > > function was removed altogether, requiring all code that opens files > > to import the io module first? This would make it easier to identify > > modules that engage in I/O. > > I must be dense, because I don't see how the proposal "makes it easier > to identify modules that engage in I/O". Who's supposed to be doing the > identification and when? And how will it not be fooled by __import__ and > plain 'ol cross-module references? 
I wasn't thinking of this from a security POV -- more from the perspective of trying to understand roughly what a module does. Looking at the imports is often a good place to start. If you see it importing socket, that's kind of a hint that it might need the network. If you see it importing io or os, that would be a similar hint that it might access the filesystem. Of course, if you see it import some other module you will have to understand what that module does (or put it on your stack for later), and so on. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From steven.bethard at gmail.com Fri May 18 18:54:05 2007 From: steven.bethard at gmail.com (Steven Bethard) Date: Fri, 18 May 2007 10:54:05 -0600 Subject: [Python-3000] Radical idea: remove built-in open (require import io) In-Reply-To: References: Message-ID: Guido van Rossum wrote: > Do people think it would be too radical if the built-in open() > function was removed altogether, requiring all code that opens files > to import the io module first? This would make it easier to identify > modules that engage in I/O. [and later] > I guess a refinement of the point is that you need the io module to > create new I/O streams, while input() and print() act on existing > streams. Code that makes read() and write() calls doesn't need to > import the io module either, so we're not really making all I/O > identifiable, just the open() calls. +0.5. I'm all for keeping the builtins as simple as possible. And if you're already used to importing io for file, when you discover you need to do something more complicated involving other layers of the io stack, you'll already be looking in the right place. STeVe -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. 
--- Bucky Katt, Get Fuzzy From collinw at gmail.com Fri May 18 19:02:35 2007 From: collinw at gmail.com (Collin Winter) Date: Fri, 18 May 2007 10:02:35 -0700 Subject: [Python-3000] Radical idea: remove built-in open (require import io) In-Reply-To: References: Message-ID: <43aa6ff70705181002u779a6ae3w9a4ea050601cde70@mail.gmail.com> On 5/17/07, Guido van Rossum wrote: > Do people think it would be too radical if the built-in open() > function was removed altogether, requiring all code that opens files > to import the io module first? This would make it easier to identify > modules that engage in I/O. +1 Thinking out loud: I wonder if the io module should also become the canonical source for stdin, stdout, stderr instead of sys. Collin Winter From guido at python.org Fri May 18 20:02:56 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 18 May 2007 11:02:56 -0700 Subject: [Python-3000] Wither PEP 335 (Overloadable Boolean Operators)? Message-ID: While reviewing PEPs, I stumbled over PEP 335 ( Overloadable Boolean Operators) by Greg Ewing. I am of two minds of this -- on the one hand, it's been a long time without any working code or anything. OTOH it might be quite useful to e.g. numpy folks. It is time to reject it due to lack of interest, or revive it! -- --Guido van Rossum (home page: http://www.python.org/~guido/) From baptiste13 at altern.org Fri May 18 20:05:52 2007 From: baptiste13 at altern.org (Baptiste Carvello) Date: Fri, 18 May 2007 20:05:52 +0200 Subject: [Python-3000] Radical idea: remove built-in open (require import io) In-Reply-To: References: Message-ID: Guido van Rossum a ?crit : > Do people think it would be too radical if the built-in open() > function was removed altogether, requiring all code that opens files > to import the io module first? This would make it easier to identify > modules that engage in I/O. > -1 Will someone think of the interactive users ? 
From guido at python.org Fri May 18 20:10:13 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 18 May 2007 11:10:13 -0700 Subject: [Python-3000] Radical idea: remove built-in open (require import io) In-Reply-To: References: Message-ID: On 5/18/07, Baptiste Carvello wrote: > Guido van Rossum a écrit : > > Do people think it would be too radical if the built-in open() > > function was removed altogether, requiring all code that opens files > > to import the io module first? This would make it easier to identify > > modules that engage in I/O. > > -1 > > Will someone think of the interactive users ? What kind of interactive use are you making of open()? -- --Guido van Rossum (home page: http://www.python.org/~guido/) From g.brandl at gmx.net Fri May 18 20:21:43 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Fri, 18 May 2007 20:21:43 +0200 Subject: [Python-3000] Radical idea: remove built-in open (require import io) In-Reply-To: References: Message-ID: Baptiste Carvello schrieb: > Guido van Rossum a écrit : >> Do people think it would be too radical if the built-in open() >> function was removed altogether, requiring all code that opens files >> to import the io module first? This would make it easier to identify >> modules that engage in I/O. >> > > -1 > > Will someone think of the interactive users ? They can still put "import sys, os, io" in their PYTHONSTARTUP file. Or use IPython. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out.
From python at rcn.com Fri May 18 20:55:44 2007 From: python at rcn.com (Raymond Hettinger) Date: Fri, 18 May 2007 14:55:44 -0400 (EDT) Subject: [Python-3000] Radical idea: remove built-in open (requireimport io) Message-ID: <20070518145544.BJT29928@ms09.lnh.mail.rcn.net> > I wasn't thinking of this from a security POV -- more from the > perspective of trying to understand roughly what a module does. > Looking at the imports is often a good place to start. In the case of open(), this may be a false benefit. Too many other calls (logging, shelve, etc) can open files, so the presence or absence of an IO import is not a reliable indicator of anything. Also, the character of a script doesn't change when it decides to switch from stdin/stdout to actual files. I don't think we gain anything here and are instead adding a small irritant. The open() function is so basic, it should remain a builtin. In theory, all builtins could be moved to other modules, but in practice it would be a PITA for day-to-day script writing. I enjoy being able to dash off quick, expressive lines like this: for i, line in enumerate(open('data.txt')): ... Needing an import for that frequently used function would detract from the enjoyment. Taking a more global viewpoint, I'm experiencing a little FUD about Py3k. There were good reasons for introducing the print() function, but then we've made "hello world" a little less lightweight. Larger applications have some legitimate needs which make abstract base classes attractive, but they are going to add significantly to the learning curve for beginners when using APIs that require them. Packages were introduced to address the needs of large applications and complicated namespace issues, but now we're about to split the trivially simple string module into a package. One of the design goals should be to keep the core language as trivially simple/straightforward as possible for day-to-day use. 
Requiring an import for file opening runs contrary to that goal. Raymond From jdahlin at async.com.br Fri May 18 21:49:11 2007 From: jdahlin at async.com.br (Johan Dahlin) Date: Fri, 18 May 2007 16:49:11 -0300 Subject: [Python-3000] Wither PEP 335 (Overloadable Boolean Operators)? In-Reply-To: References: Message-ID: <464E0337.3000905@async.com.br> Guido van Rossum wrote: > While reviewing PEPs, I stumbled over PEP 335 ( Overloadable Boolean > Operators) by Greg Ewing. I am of two minds of this -- on the one > hand, it's been a long time without any working code or anything. OTOH > it might be quite useful to e.g. numpy folks. This kind of feature would also be useful for ORMs, to be able to map boolean operators to SQL. Johan From greg.ewing at canterbury.ac.nz Sat May 19 01:57:36 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 19 May 2007 11:57:36 +1200 Subject: [Python-3000] PEP 3131 - the details In-Reply-To: References: <464BBE2B.1050201@acm.org> <2A4F5FE3-9F8A-4B74-B46D-B63F1260B7FD@fuhm.net> <464C9D55.9080501@v.loewis.de> <464CFF8D.7040504@canterbury.ac.nz> Message-ID: <464E3D70.8050408@canterbury.ac.nz> Chris Monson wrote: > Ignoring for a moment that prefix != reverse(postfix), that is.... It is if you don't insist on putting silly parentheses all over the place. (IOW, "prefix" is not synonymous with "Lisp".) -- Greg From shiblon at gmail.com Sat May 19 02:29:13 2007 From: shiblon at gmail.com (Chris Monson) Date: Fri, 18 May 2007 20:29:13 -0400 Subject: [Python-3000] PEP 3131 - the details In-Reply-To: <464E3D70.8050408@canterbury.ac.nz> References: <464BBE2B.1050201@acm.org> <2A4F5FE3-9F8A-4B74-B46D-B63F1260B7FD@fuhm.net> <464C9D55.9080501@v.loewis.de> <464CFF8D.7040504@canterbury.ac.nz> <464E3D70.8050408@canterbury.ac.nz> Message-ID: So / 4 2 = 2 4 / ? I beg to differ :-). At any rate, - C On 5/18/07, Greg Ewing wrote: > Chris Monson wrote: > > Ignoring for a moment that prefix != reverse(postfix), that is.... 
> > It is if you don't insist on putting silly > parentheses all over the place. (IOW, "prefix" > is not synonymous with "Lisp".) > > -- > Greg > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: > http://mail.python.org/mailman/options/python-3000/shiblon%40gmail.com > From greg.ewing at canterbury.ac.nz Sat May 19 02:47:29 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 19 May 2007 12:47:29 +1200 Subject: [Python-3000] Wither PEP 335 (Overloadable Boolean Operators)? In-Reply-To: References: Message-ID: <464E4921.9040101@canterbury.ac.nz> Guido van Rossum wrote: > While reviewing PEPs, I stumbled over PEP 335 ( Overloadable Boolean > Operators) by Greg Ewing. > > It is time to reject it due to lack of interest, or revive it! Didn't you post something about this a short time ago, suggesting you were in favour of it? If you need an up-to-date implementation before it can be accepted, let me know and I'll see what I can do. I wouldn't want it to be rejected just because of that. -- Greg From greg.ewing at canterbury.ac.nz Sat May 19 03:12:19 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 19 May 2007 13:12:19 +1200 Subject: [Python-3000] PEP 3131 - the details In-Reply-To: References: <464BBE2B.1050201@acm.org> <2A4F5FE3-9F8A-4B74-B46D-B63F1260B7FD@fuhm.net> <464C9D55.9080501@v.loewis.de> <464CFF8D.7040504@canterbury.ac.nz> <464E3D70.8050408@canterbury.ac.nz> Message-ID: <464E4EF3.8090107@canterbury.ac.nz> Chris Monson wrote: > So / 4 2 = 2 4 / ? It would be unusual, but there's nothing to prevent / from being defined that way in the postfix version of the language. -- Greg From guido at python.org Sat May 19 03:21:45 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 18 May 2007 18:21:45 -0700 Subject: [Python-3000] Wither PEP 335 (Overloadable Boolean Operators)? 
In-Reply-To: <464E4921.9040101@canterbury.ac.nz> References: <464E4921.9040101@canterbury.ac.nz> Message-ID: On 5/18/07, Greg Ewing wrote: > Guido van Rossum wrote: > > While reviewing PEPs, I stumbled over PEP 335 ( Overloadable Boolean > > Operators) by Greg Ewing. > > > > It is time to reject it due to lack of interest, or revive it! > > Didn't you post something about this a short time ago, > suggesting you were in favour of it? I think I did, but I hope I'm not the only one in favor. > If you need an up-to-date implementation before it can > be accepted, let me know and I'll see what I can do. > I wouldn't want it to be rejected just because of that. Working implementations are good for all sorts of reasons. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From rasky at develer.com Sat May 19 13:24:55 2007 From: rasky at develer.com (Giovanni Bajo) Date: Sat, 19 May 2007 13:24:55 +0200 Subject: [Python-3000] Wither PEP 335 (Overloadable Boolean Operators)? In-Reply-To: References: <464E4921.9040101@canterbury.ac.nz> Message-ID: On 19/05/2007 3.21, Guido van Rossum wrote: >>> While reviewing PEPs, I stumbled over PEP 335 ( Overloadable Boolean >>> Operators) by Greg Ewing. >>> >>> It is time to reject it due to lack of interest, or revive it! >> Didn't you post something about this a short time ago, >> suggesting you were in favour of it? > > I think I did, but I hope I'm not the only one in favor. I'm -0 on the idea, they're very rarely overloaded in C++ as well, since there are only a few really valid use cases. In fact, the only example I saw till now were those of constructing meta-languages using Python's syntax, which is something that Python has never really encouraged (see the metaprogramming syntax which is now officially vetoed). But I'm not -1 because I assume that (just like unicode identifiers) they will not be abused by the community, and they probably do help some very rare and uncommon use cases where they are really required.
-- Giovanni Bajo From ncoghlan at gmail.com Sat May 19 13:54:23 2007 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 19 May 2007 21:54:23 +1000 Subject: [Python-3000] PEP 3131 - the details In-Reply-To: References: <464BBE2B.1050201@acm.org> <464C732B.8050103@v.loewis.de> Message-ID: <464EE56F.5040300@gmail.com> Jim Jewett wrote: > On the other hand, I'm not sure how often users of non-latin languages > will want to mix in latin letters. The tech report suggested that it > is fairly common to use all of (Hiragana | Katakana | Han | Latin) in > Japanese text, but I'm not sure whether it would be normal to mix them > within a single identifier. Mixing Kanji (Han script) & Hiragana in a single Japanese word is certainly quite common (main part of the word in kanji, the ending in hiragana). I can't think of any cases where the other two would be mixed (with each other or with either of the first two scripts) within a single word, but my Japanese is pretty poor - there could easily be cases I'm not aware of. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From jason.orendorff at gmail.com Sat May 19 14:41:32 2007 From: jason.orendorff at gmail.com (Jason Orendorff) Date: Sat, 19 May 2007 08:41:32 -0400 Subject: [Python-3000] [Python-Dev] Wither PEP 335 (Overloadable Boolean Operators)? In-Reply-To: References: Message-ID: On 5/18/07, Guido van Rossum wrote: > While reviewing PEPs, I stumbled over PEP 335 ( Overloadable Boolean > Operators) by Greg Ewing. -1. "and" and "or" affect the flow of control. It's a matter of taste, but I feel the benefit is too small here to add another flow-control quirk. I like that part of the language to be simple. Anyway, if this *is* done, logically it should cover "(... if ... else ...)" as well. Same use cases. 
-j From ntoronto at cs.byu.edu Sat May 19 19:12:28 2007 From: ntoronto at cs.byu.edu (Neil Toronto) Date: Sat, 19 May 2007 11:12:28 -0600 Subject: [Python-3000] Wither PEP 335 (Overloadable Boolean Operators)? In-Reply-To: References: <464E4921.9040101@canterbury.ac.nz> Message-ID: <464F2FFC.3060902@cs.byu.edu> Giovanni Bajo wrote: > On 19/05/2007 3.21, Guido van Rossum wrote: > >>>> While reviewing PEPs, I stumbled over PEP 335 ( Overloadable Boolean >>>> Operators) by Greg Ewing. >>>> >>>> It is time to reject it due to lack of interest, or revive it! >>>> >>> Didn't you post something about this a short time ago, >>> suggesting you were in favour of it? >>> >> I think I did, but I hope I'm not the only one in favor. >> > > I'm -0 on the idea, they're very rarely overloaded in C++ as well, since there > are only few really valid use cases. > > In fact, the only example I saw till now where those of constructing > meta-languages using Python's syntax, which is something that Python has never > really encouraged (see the metaprogramming syntax which is now officially vetoed). > > But I'm not -1 because I assume that (just like unicode identifiers) they will > not be abused by the community, and they probably do help some very rare and > uncommon use cases where they are really required. > There's a fairly common one, actually, that comes up quite a lot in Numpy. Currently, best practice is a wart. Here's some code of mine for evaluating log probabilities from the Multinomial family: class Multinomial(DistFamily): @classmethod def logProb(cls, x, n, p): x = scipy.asarray(x) n = scipy.asarray(n) p = scipy.asarray(p) result = special.gammaln(n + 1) - special.gammaln(x + 1).sum(-1) + (x * scipy.log(p)).sum(-1) xsum = x.sum(-1) psum = p.sum(-1) return scipy.where((xsum != n) | (psum < 0.99999) | (psum > 1.00001) | ~scipy.isfinite(result), -scipy.inf, result) That last bit is really confusing to new Numpy users, especially figuring out how to do it in the first place. 
(Once you get it, it's not *so* bad.) The parentheses are required, by the way. With overloadable booleans, it would become much more readable and newbie-friendly: return scipy.where(xsum != n or psum < 0.99999 or psum > 1.00001 or not scipy.isfinite(result), -scipy.inf, result) This isn't just an issue with "where" though - boolean arrays come up quite a bit elsewhere, especially in indexing (you can index an array with an array of booleans) and counting. Given that we're supposed to see tighter integration with Numpy, I'd say this family of use cases is fairly significant. Neil From rasky at develer.com Sat May 19 21:21:58 2007 From: rasky at develer.com (Giovanni Bajo) Date: Sat, 19 May 2007 21:21:58 +0200 Subject: [Python-3000] Wither PEP 335 (Overloadable Boolean Operators)? In-Reply-To: <464F2FFC.3060902@cs.byu.edu> References: <464E4921.9040101@canterbury.ac.nz> <464F2FFC.3060902@cs.byu.edu> Message-ID: On 19/05/2007 19.12, Neil Toronto wrote: > There's a fairly common one, actually, that comes up quite a lot in > Numpy. Currently, best practice is a wart. Here's some code of mine for > evaluating log probabilities from the Multinomial family: > > class Multinomial(DistFamily): > @classmethod > def logProb(cls, x, n, p): > x = scipy.asarray(x) > n = scipy.asarray(n) > p = scipy.asarray(p) > result = special.gammaln(n + 1) - special.gammaln(x + > 1).sum(-1) + (x * scipy.log(p)).sum(-1) > xsum = x.sum(-1) > psum = p.sum(-1) > return scipy.where((xsum != n) | (psum < 0.99999) | (psum > > 1.00001) | ~scipy.isfinite(result), -scipy.inf, result) > > > That last bit is really confusing to new Numpy users, especially > figuring out how to do it in the first place. (Once you get it, it's not > *so* bad.) The parentheses are required, by the way.
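The `|`/`~` wart Neil describes exists because `and`, `or`, and `not` funnel through `__bool__`, which must return a single True/False, while `|` and `~` dispatch to the overloadable `__or__` and `__invert__`. A toy stand-in (not NumPy itself) makes the asymmetry concrete:

```python
class BoolVec:
    # Toy element-wise boolean vector. NumPy arrays behave analogously:
    # '|' and '~' are overloadable, but 'and'/'or'/'not' call __bool__,
    # which must return one bool for the whole vector -- the gap that
    # PEP 335 proposed to close.
    def __init__(self, vals):
        self.vals = list(vals)

    def __or__(self, other):
        return BoolVec(a or b for a, b in zip(self.vals, other.vals))

    def __invert__(self):
        return BoolVec(not v for v in self.vals)

    def __bool__(self):
        raise ValueError("truth value of a vector is ambiguous")

a = BoolVec([True, False])
b = BoolVec([False, False])
print((a | b).vals)  # element-wise: [True, False]
print((~b).vals)     # [True, True]
```

Writing `a or b` with these objects raises the ValueError from `__bool__`, which is exactly why the scipy code above must spell the condition with `|` and `~` and wrap each comparison in parentheses (the bitwise operators bind tighter than `!=` and `<`).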
With overloadable > booleans, it would become much more readable and newbie-friendly: > > return scipy.where(xsum != n or psum < 0.99999 or psum > > 1.00001 or not scipy.isfinite(result), -scipy.inf, result) Probably it's better in the numpy context, but surely it's a little confusing at first sight for a non-numpy-savvy reader. In fact, as you said, I don't think the current best-practice is *that* bad after all. I'll keep my -0. ======================== Now for the fun side :) Another workaround could be: return scipy.where( "xsum != n or psum < 0.99999 or " "psum > 1.000001 or not scipy.isfinite(result)", -scipy.inf, result) with the necessary magic to pull out variables from the stack frame. Parsing could be done only once of course. But I'm sure the numpy guys have already thought and discarded this solution as it's more complicated. [[ In fact, numpy is actually trying to create a DSL with Python itself. I assume things like "x.sum(-1)" would have been probably spelled sum(x, -1), if you could freely decide what to do without worrying about the implementation. ]] Or, another workaround is something like this: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/384122, which could probably be extended to more "operators" that numpy can't simulate using the plain Python syntax. -- Giovanni Bajo From robert.kern at gmail.com Sat May 19 21:45:58 2007 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 19 May 2007 14:45:58 -0500 Subject: [Python-3000] Wither PEP 335 (Overloadable Boolean Operators)? In-Reply-To: References: <464E4921.9040101@canterbury.ac.nz> <464F2FFC.3060902@cs.byu.edu> Message-ID: Giovanni Bajo wrote: > Another workaround could be: > > return scipy.where( > "xsum != n or psum < 0.99999 or " > "psum > 1.000001 or not scipy.isfinite(result)", > -scipy.inf, result) > > with the necessary magic to pull out variables from the stack frame. Parsing > could be done only once of course.
But I'm sure the numpy guys have already > thought and discarded this solution as it's more complicated. Well, it doesn't actually solve the problem. Yes, we could write functions that parse some language that looks like Python but executes as something else, but that doesn't advance us towards the goal of making the code easier to understand. > [[ In fact, numpy is actually trying to create a DSL with Python itself. It isn't. At least, not any more than any other custom type is. > I > assume things like "x.sum(-1)" would have been probably spelled sum(x, -1), if > you could freely decide what to do without worrying about the implementation. ]] In fact, it can be spelled so and once could only be spelled so in Numeric. > Or, another workaround is something like this: > http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/384122, which could > probably be extended to more "operators" that numpy can't simulate using the > plain Python syntax. Much as we'd like it to be, it's just not practical. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From python at rcn.com Sat May 19 23:37:23 2007 From: python at rcn.com (Raymond Hettinger) Date: Sat, 19 May 2007 14:37:23 -0700 Subject: [Python-3000] Wither PEP 335 (Overloadable Boolean Operators)? References: <464E4921.9040101@canterbury.ac.nz> <464F2FFC.3060902@cs.byu.edu> Message-ID: <025401c79a5d$e3b4a010$f101a8c0@RaymondLaptop1> > Giovanni Bajo wrote: >> Another workaround could be: Before focusing mental talents on workarounds and implementations, it would be worthwhile to consider whether the idea would help or hurt the language. The and/or keywords already have some complexity due to their returning non-boolean values. IMO, it would be a disservice to the language to further complexify their meanings. 
Right now, at least, we can make a static reading of the code and have a good idea of what the and/or keywords mean. Someone once proposed overloadable behavior for the "is" operator. IMO, the reasons for rejecting that idea also apply to this proposal. FWIW, the peephole optimizer takes advantage of the current meaning of and/or to generate faster code. It would be a shame to lose this optimization and have all applications pay a price in slower code. Raymond From bob at redivi.com Sun May 20 00:01:45 2007 From: bob at redivi.com (Bob Ippolito) Date: Sat, 19 May 2007 15:01:45 -0700 Subject: [Python-3000] Wither PEP 335 (Overloadable Boolean Operators)? In-Reply-To: <025401c79a5d$e3b4a010$f101a8c0@RaymondLaptop1> References: <464E4921.9040101@canterbury.ac.nz> <464F2FFC.3060902@cs.byu.edu> <025401c79a5d$e3b4a010$f101a8c0@RaymondLaptop1> Message-ID: <6a36e7290705191501m3ea09731u2329c39473301e58@mail.gmail.com> On 5/19/07, Raymond Hettinger wrote: > > Giovanni Bajo wrote: > >> Another workaround could be: > > Before focusing mental talents on workarounds and implementations, > it would be worthwhile to consider whether the idea would help or > hurt the language. The and/or keywords already have some complexity > due to their returning non-boolean values. IMO, it would be a disservice > to the language to further complexify their meanings. Right now, at least, > we can make a static reading of the code and have a good idea of what > the and/or keywords mean. Would "and" and "or" still be able to properly short-circuit given this proposal? -bob From robert.kern at gmail.com Sun May 20 02:19:24 2007 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 19 May 2007 19:19:24 -0500 Subject: [Python-3000] Wither PEP 335 (Overloadable Boolean Operators)?
In-Reply-To: <025401c79a5d$e3b4a010$f101a8c0@RaymondLaptop1> References: <464E4921.9040101@canterbury.ac.nz> <464F2FFC.3060902@cs.byu.edu> <025401c79a5d$e3b4a010$f101a8c0@RaymondLaptop1> Message-ID: Raymond Hettinger wrote: >> Giovanni Bajo wrote: >>> Another workaround could be: > > Before focusing mental talents on workarounds and implementations, > it would be worthwhile to consider whether the idea would help or > hurt the language. The and/or keywords already have some complexity > due to their returning non-boolean values. IMO, it would be a disservice > to the language to further complexify their meanings. Right now, at least, > we can make a static reading of the code and have a good idea of what > the and/or keywords mean. It would probably hurt the language, and for the record, I'm against it. We already have problems with rich comparisons not reliably returning booleans. It's a fairly common occurrence to do equality testing against generic data types. For example, finding if an object is in a list with list.index(). However, this does not reliably work when == can return something that is not interpretable as a boolean value like numpy arrays do. I don't think rich comparisons are a mistake (I use them much more frequently than I use list.index(), for example), but propagating the uncertainty further is probably a mistake. For numpy, the bitwise operators |&~ work fine on boolean arrays, and that's all such operators really need to work on. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From greg.ewing at canterbury.ac.nz Sun May 20 02:28:04 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 20 May 2007 12:28:04 +1200 Subject: [Python-3000] Wither PEP 335 (Overloadable Boolean Operators)? 
In-Reply-To: <025401c79a5d$e3b4a010$f101a8c0@RaymondLaptop1> References: <464E4921.9040101@canterbury.ac.nz> <464F2FFC.3060902@cs.byu.edu> <025401c79a5d$e3b4a010$f101a8c0@RaymondLaptop1> Message-ID: <464F9614.1010208@canterbury.ac.nz> Raymond Hettinger wrote: > Someone once proposed overloadable behavior for the "is" operator. > IMO, the reasons for rejecting that idea also apply to this proposal. The reason for rejecting that is that it would leave us with no way of reliably testing whether two references point to the same object. That objection doesn't apply here, because there would still be a way of ensuring that you get boolean semantics if it matters for some reason: bool(a) and bool(b), etc. -- Greg From greg.ewing at canterbury.ac.nz Sun May 20 02:32:58 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 20 May 2007 12:32:58 +1200 Subject: [Python-3000] Wither PEP 335 (Overloadable Boolean Operators)? In-Reply-To: <025401c79a5d$e3b4a010$f101a8c0@RaymondLaptop1> References: <464E4921.9040101@canterbury.ac.nz> <464F2FFC.3060902@cs.byu.edu> <025401c79a5d$e3b4a010$f101a8c0@RaymondLaptop1> Message-ID: <464F973A.8070205@canterbury.ac.nz> Raymond Hettinger wrote: > FWIW, the peephole optimizer takes advantage of the current meaning > of and/or to generate faster code. Can you give some examples of the sort of optimisations that are done? It may still be possible to do them -- the AND1 and OR1 bytecodes in my proposal are conditional branch instructions, much like the existing boolean operator bytecodes. -- Greg From greg.ewing at canterbury.ac.nz Sun May 20 02:34:51 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 20 May 2007 12:34:51 +1200 Subject: [Python-3000] Wither PEP 335 (Overloadable Boolean Operators)? 
In-Reply-To: <6a36e7290705191501m3ea09731u2329c39473301e58@mail.gmail.com> References: <464E4921.9040101@canterbury.ac.nz> <464F2FFC.3060902@cs.byu.edu> <025401c79a5d$e3b4a010$f101a8c0@RaymondLaptop1> <6a36e7290705191501m3ea09731u2329c39473301e58@mail.gmail.com> Message-ID: <464F97AB.2040808@canterbury.ac.nz> Bob Ippolito wrote: > Would "and" and "or" still be able to properly short-circuit given > this proposal? Yes. I was very careful to ensure that all the existing semantics are preserved in the case of no overloads, and also that overloads can mimic all of the existing semantics if they need to. -- Greg From python at rcn.com Sun May 20 00:13:10 2007 From: python at rcn.com (Raymond Hettinger) Date: Sat, 19 May 2007 15:13:10 -0700 Subject: [Python-3000] Radical idea: remove built-in open (requireimport io) References: <002c01c79964$77a6fc00$66f4f400$@net> Message-ID: <003701c79a87$a7db5460$f101a8c0@RaymondLaptop1> From: "Andrew Koenig" > (I'm feeling slightly pedantic today, so I want to say that the proposal > doesn't make it any easier to identify modules that engage in I/O -- it > makes it easier to identify modules that assuredly do not engage in I/O. u = urllib.urlopen('http://www.python.org') s = shelve.open('persistantmap.shl') logging.basicConfig('events.log') Raymond From jason.orendorff at gmail.com Sun May 20 05:46:22 2007 From: jason.orendorff at gmail.com (Jason Orendorff) Date: Sat, 19 May 2007 23:46:22 -0400 Subject: [Python-3000] Wither PEP 335 (Overloadable Boolean Operators)? In-Reply-To: <464F973A.8070205@canterbury.ac.nz> References: <464E4921.9040101@canterbury.ac.nz> <464F2FFC.3060902@cs.byu.edu> <025401c79a5d$e3b4a010$f101a8c0@RaymondLaptop1> <464F973A.8070205@canterbury.ac.nz> Message-ID: On 5/19/07, Greg Ewing wrote: > Raymond Hettinger wrote: > > FWIW, the peephole optimizer takes advantage of the current meaning > > of and/or to generate faster code. > > Can you give some examples of the sort of optimisations > that are done? 
Look in Python/peephole.c, function PyCode_Optimize(). Search for "case JUMP_IF_FALSE". There's a nice comment immediately preceding that line. -j From greg.ewing at canterbury.ac.nz Sun May 20 06:57:55 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 20 May 2007 16:57:55 +1200 Subject: [Python-3000] Wither PEP 335 (Overloadable Boolean Operators)? In-Reply-To: References: <464E4921.9040101@canterbury.ac.nz> <464F2FFC.3060902@cs.byu.edu> <025401c79a5d$e3b4a010$f101a8c0@RaymondLaptop1> <464F973A.8070205@canterbury.ac.nz> Message-ID: <464FD553.1090000@canterbury.ac.nz> Jason Orendorff wrote: > Look in Python/peephole.c, Which version of Python is this in? I can't find a file by that name anywhere in my 2.3, 2.4.3 or 2.5 sources. -- Greg From steven.bethard at gmail.com Sun May 20 07:05:01 2007 From: steven.bethard at gmail.com (Steven Bethard) Date: Sat, 19 May 2007 23:05:01 -0600 Subject: [Python-3000] Wither PEP 335 (Overloadable Boolean Operators)? In-Reply-To: <464FD553.1090000@canterbury.ac.nz> References: <464F2FFC.3060902@cs.byu.edu> <025401c79a5d$e3b4a010$f101a8c0@RaymondLaptop1> <464F973A.8070205@canterbury.ac.nz> <464FD553.1090000@canterbury.ac.nz> Message-ID: On 5/19/07, Greg Ewing wrote: > Jason Orendorff wrote: > > Look in Python/peephole.c, > > Which version of Python is this in? I can't find a file by > that name anywhere in my 2.3, 2.4.3 or 2.5 sources. http://svn.python.org/view/python/trunk/Python/peephole.c?rev=54086&view=markup STeVe -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. 
--- Bucky Katt, Get Fuzzy From tcdelaney at optusnet.com.au Sun May 20 08:25:52 2007 From: tcdelaney at optusnet.com.au (Tim Delaney) Date: Sun, 20 May 2007 16:25:52 +1000 Subject: [Python-3000] PEP 367: New Super References: <003001c795f8$d5275060$0201a8c0@mshome.net> <20070514165704.4F8D23A4036@sparrow.telecommunity.com> Message-ID: <000b01c79aa7$ba716cc0$0201a8c0@mshome.net> Phillip J. Eby wrote: > At 05:23 PM 5/14/2007 +1000, Tim Delaney wrote: >> Determining the class object to use >> ''''''''''''''''''''''''''''''''''' >> >> The exact mechanism for associating the method with the defining >> class is not >> specified in this PEP, and should be chosen for maximum performance. >> For CPython, it is suggested that the class instance be held in a >> C-level variable >> on the function object which is bound to one of ``NULL`` (not part >> of a class), >> ``Py_None`` (static method) or a class object (instance or class >> method). > > Another open issue here: is the decorated class used, or the > undecorated class? Sorry I've taken so long to get back to you about this - had email problems. I'm not sure what you're getting at here - are you referring to the decorators for classes PEP? In that case, the decorator is applied after the class is constructed, so it would be the undecorated class. Are class decorators going to update the MRO? I see nothing about that in PEP 3129, so using the undecorated class would match the current super(cls, self) behaviour. Tim Delaney From tcdelaney at optusnet.com.au Sun May 20 08:44:03 2007 From: tcdelaney at optusnet.com.au (Tim Delaney) Date: Sun, 20 May 2007 16:44:03 +1000 Subject: [Python-3000] PEP 367: New Super Message-ID: <009c01c79aaa$441b0dd0$0201a8c0@mshome.net> Tim Delaney wrote: > Phillip J. 
Eby wrote: >> At 05:23 PM 5/14/2007 +1000, Tim Delaney wrote: >>> Determining the class object to use >>> ''''''''''''''''''''''''''''''''''' >>> >>> The exact mechanism for associating the method with the defining >>> class is not >>> specified in this PEP, and should be chosen for maximum performance. >>> For CPython, it is suggested that the class instance be held in a >>> C-level variable >>> on the function object which is bound to one of ``NULL`` (not part >>> of a class), >>> ``Py_None`` (static method) or a class object (instance or class >>> method). >> >> Another open issue here: is the decorated class used, or the >> undecorated class? > > Sorry I've taken so long to get back to you about this - had email > problems. > I'm not sure what you're getting at here - are you referring to the > decorators for classes PEP? In that case, the decorator is applied > after the class is constructed, so it would be the undecorated class. > > Are class decorators going to update the MRO? I see nothing about > that in PEP 3129, so using the undecorated class would match the > current super(cls, self) behaviour. Duh - I'm an idiot. Of course, the current behaviour uses name lookup, so it would use the decorated class. So the question is, should the method store the class, or the name? Looking up by name could pick up a totally unrelated class, but storing the undecorated class could miss something important in the decoration. Tim Delaney From ncoghlan at gmail.com Sun May 20 09:42:22 2007 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 20 May 2007 17:42:22 +1000 Subject: [Python-3000] PEP 367: New Super In-Reply-To: <009c01c79aaa$441b0dd0$0201a8c0@mshome.net> References: <009c01c79aaa$441b0dd0$0201a8c0@mshome.net> Message-ID: <464FFBDE.4000109@gmail.com> Tim Delaney wrote: > So the question is, should the method store the class, or the name? 
Looking > up by name could pick up a totally unrelated class, but storing the > undecorated class could miss something important in the decoration. Couldn't we provide a mechanism whereby the cell can be adjusted to point to the decorated class? (heck, the interpreter has access to both classes after execution of the class statement - it could probably arrange for this to happen automatically whenever the decorated and undecorated classes are different). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From martin at v.loewis.de Sun May 20 09:47:16 2007 From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 20 May 2007 09:47:16 +0200 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <19dd68ba0705120827s5415c4dcx12e5862f32cc3e06@mail.gmail.com><4646A3CA.40705@acm.org> <4646FCAE.7090804@v.loewis.de> Message-ID: <464FFD04.90602@v.loewis.de> > That is how I felt when you dismissed my effort to make your proposal more > useful and more acceptable to some (by addressing transliteration) with the > little molehill problem that Norwegians and Germans disagree about o: > (rotated 90 degrees). So let me phrase this differently: I'm not aware of an algorithm that can do transliteration for all Unicode characters. Therefore, I cannot add transliteration into the PEP. Do you know of any? Regards, Martin From tcdelaney at optusnet.com.au Sun May 20 10:20:37 2007 From: tcdelaney at optusnet.com.au (Tim Delaney) Date: Sun, 20 May 2007 18:20:37 +1000 Subject: [Python-3000] PEP 367: New Super References: <009c01c79aaa$441b0dd0$0201a8c0@mshome.net> <464FFBDE.4000109@gmail.com> Message-ID: <00ae01c79ab7$c1ea5100$0201a8c0@mshome.net> Nick Coghlan wrote: > Tim Delaney wrote: >> So the question is, should the method store the class, or the name? 
>> Looking up by name could pick up a totally unrelated class, but >> storing the undecorated class could miss something important in the >> decoration. > > Couldn't we provide a mechanism whereby the cell can be adjusted to > point to the decorated class? (heck, the interpreter has access to > both classes after execution of the class statement - it could > probably arrange for this to happen automatically whenever the > decorated and undecorated classes are different). Yep - I thought of that. I think that's probably the right way to go. Tim Delaney From jdahlin at async.com.br Fri May 18 21:49:11 2007 From: jdahlin at async.com.br (Johan Dahlin) Date: Fri, 18 May 2007 16:49:11 -0300 Subject: [Python-3000] Wither PEP 335 (Overloadable Boolean Operators)? In-Reply-To: References: Message-ID: <464E0337.3000905@async.com.br> Guido van Rossum wrote: > While reviewing PEPs, I stumbled over PEP 335 ( Overloadable Boolean > Operators) by Greg Ewing. I am of two minds of this -- on the one > hand, it's been a long time without any working code or anything. OTOH > it might be quite useful to e.g. numpy folks. This kind of feature would also be useful for ORMs, to be able to map boolean operators to SQL. Johan From baptiste13 at altern.org Sun May 20 16:10:15 2007 From: baptiste13 at altern.org (Baptiste Carvello) Date: Sun, 20 May 2007 16:10:15 +0200 Subject: [Python-3000] Radical idea: remove built-in open (require import io) In-Reply-To: References: Message-ID: Guido van Rossum a écrit : > On 5/18/07, Baptiste Carvello wrote: >> Guido van Rossum a écrit : >>> Do people think it would be too radical if the built-in open() >>> function was removed altogether, requiring all code that opens files >>> to import the io module first? This would make it easier to identify >>> modules that engage in I/O. >> -1 >> >> Will someone think of the interactive users ? > > What kind of interactive use are you making of open()?
> Well, mostly two things: for one, quick inspection of data files (I'm working in physics). Sure, I can also use pylab.load with most reasonable data file formats. But sometimes, you have a really weird format and/or you just want to quickly read a few values. The other main use case is common sysadmin-type jobs, as in >>> for line in open('records.txt'): ... print line.split(':')[0] Now, I was jokingly making it sound more dramatic than it really is. Of course, I can do import io (especially with a 2-letter module name, it's not that bad), just like I now do import shutil (or is that shutils, I never remember) when I need to modify the filesystem. No big deal. I just wanted to point out that any cleaning of the builtin namespace is a benefit for programmers, but also a disadvantage for interactive users. How the trade-off is made is yours to decide. Thanks for caring, Baptiste From baptiste13 at altern.org Sun May 20 16:19:14 2007 From: baptiste13 at altern.org (Baptiste Carvello) Date: Sun, 20 May 2007 16:19:14 +0200 Subject: [Python-3000] Radical idea: remove built-in open (require import io) In-Reply-To: References: Message-ID: Georg Brandl a écrit : > Baptiste Carvello schrieb: >> Guido van Rossum a écrit : >>> Do people think it would be too radical if the built-in open() >>> function was removed altogether, requiring all code that opens files >>> to import the io module first? This would make it easier to identify >>> modules that engage in I/O. >>> >> -1 >> >> Will someone think of the interactive users ? > > They can still put "import sys, os, io" in their PYTHONSTARTUP file. > Thanks, I had forgotten that possibility. > Or use IPython. > Well, I have to say that I'm a bit worried by a current trend on python-dev, to answer any question about interactive use by pointing to IPython. I *love* IPython. I'm using it a lot.
But sometimes, because of the longer startup time, or because you want to stay close to "normal" python, you prefer to use the standard interpreter. And I believe this should really stay a *supported* use. Of course, in this specific case, I understand a trade-off has to be made. Baptiste From alexandre at peadrop.com Sun May 20 23:28:14 2007 From: alexandre at peadrop.com (Alexandre Vassalotti) Date: Sun, 20 May 2007 17:28:14 -0400 Subject: [Python-3000] Introduction and request for commit access to the sandbox. Message-ID: Hello, As some of you may already know, I will be working on Python for this year's Google Summer of Code. My project is to merge the modules with a dual C and Python implementation, i.e. cPickle/pickle, cStringIO/StringIO and cProfile/profile [1]. This project is part of the standard library reorganization for Python 3000 [2]. And my mentor for this project is Brett Cannon. So first, let me introduce myself. I am currently a student from Quebec, Canada. I plan to make a career as a (hopefully good) programmer. Therefore, I dedicate a lot of my free time to contributing to open source projects, like Ubuntu. I recently became interested in how compilers and interpreters work. So, I started reading Python's source code, which is one of the most well organized and comprehensive code bases I have seen. This motivated me to start contributing to Python. However, since school kept me fairly busy, I haven't had the chance to do anything other than providing support to Python's users in the #python FreeNode IRC channel. This year's Summer of Code will give me the chance to make a significant contribution to Python, and to get started with Python code development as well. With that said, I would like to request svn access to the sandbox for my work. I will use this access only for modifying stuff in the directory I will be assigned to. I would like to use the username "avassalotti" and the attached SSH2 public key for this access.
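[For context: the merge strategy this project eventually produced in Python 3's stdlib is a pure-Python module that transparently shadows its definitions with the C accelerator's when one is available (compare pickle/_pickle). A minimal sketch of that pattern -- the accelerator module name here is made up for illustration, and this is not Alexandre's actual plan:]

```python
# Accelerated-module pattern: define the portable pure-Python
# implementation first, then shadow it with the C version if built.

class Reader:
    """Pure-Python reference implementation (slow but portable)."""
    def read(self):
        return "pure python"

try:
    # Hypothetical C accelerator module (it does not exist here,
    # so the import fails and the pure-Python class stays in place).
    from _fastreader import Reader
except ImportError:
    pass

print(Reader().read())  # -> pure python
```

Callers just import the high-level module and get the fastest implementation available, without changing any of their own code.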
One last thing, if you know semantic differences (other than the obvious ones) between the C and Python versions of the modules I need to merge, please let me know. This will greatly simplify the merge and reduce the chances of later breakage. Cheers, -- Alexandre .. [1] Abstract of my application, Merge the C and Python implementations of the same interface (http://code.google.com/soc/psf/appinfo.html?csaid=C6768E09BEF7CCE2) .. [2] PEP 3108, Standard Library Reorganization, Cannon (http://www.python.org/dev/peps/pep-3108) -------------- next part -------------- A non-text attachment was scrubbed... Name: id_dsa.pub Type: application/octet-stream Size: 610 bytes Desc: not available Url : http://mail.python.org/pipermail/python-3000/attachments/20070520/1242e699/attachment.obj From pje at telecommunity.com Mon May 21 02:07:42 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Sun, 20 May 2007 20:07:42 -0400 Subject: [Python-3000] PEP 367: New Super In-Reply-To: <000b01c79aa7$ba716cc0$0201a8c0@mshome.net> References: <003001c795f8$d5275060$0201a8c0@mshome.net> <20070514165704.4F8D23A4036@sparrow.telecommunity.com> <000b01c79aa7$ba716cc0$0201a8c0@mshome.net> Message-ID: <20070521000552.B64C93A4061@sparrow.telecommunity.com> At 04:25 PM 5/20/2007 +1000, Tim Delaney wrote: >I'm not sure what you're getting at here - are you referring to the >decorators for classes PEP? In that case, the decorator is applied >after the class is constructed, so it would be the undecorated class. > >Are class decorators going to update the MRO? I see nothing about >that in PEP 3129, so using the undecorated class would match the >current super(cls, self) behaviour. Class decorators can (and sometimes *do*, in PEAK) return an object that's not the original class object. So that would break super, which is why my inclination is to go with using the decorated result. From pje at telecommunity.com Mon May 21 02:11:24 2007 From: pje at telecommunity.com (Phillip J.
Eby) Date: Sun, 20 May 2007 20:11:24 -0400 Subject: [Python-3000] [Python-Dev] PEP 367: New Super In-Reply-To: <00ae01c79ab7$c1ea5100$0201a8c0@mshome.net> References: <009c01c79aaa$441b0dd0$0201a8c0@mshome.net> <464FFBDE.4000109@gmail.com> <00ae01c79ab7$c1ea5100$0201a8c0@mshome.net> Message-ID: <20070521000933.202083A4061@sparrow.telecommunity.com> At 06:20 PM 5/20/2007 +1000, Tim Delaney wrote: >Nick Coghlan wrote: > > Tim Delaney wrote: > >> So the question is, should the method store the class, or the name? > >> Looking up by name could pick up a totally unrelated class, but > >> storing the undecorated class could miss something important in the > >> decoration. > > > > Couldn't we provide a mechanism whereby the cell can be adjusted to > > point to the decorated class? (heck, the interpreter has access to > > both classes after execution of the class statement - it could > > probably arrange for this to happen automatically whenever the > > decorated and undecorated classes are different). > >Yep - I thought of that. I think that's probably the right way to go. Btw, PEP 3124 needs a way to receive the same class object at more or less the same moment, although in the form of a callback rather than a cell assignment. Guido suggested I co-ordinate with you to design a mechanism for this. From martin at v.loewis.de Mon May 21 06:44:25 2007 From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 21 May 2007 06:44:25 +0200 Subject: [Python-3000] [Python-Dev] Introduction and request for commit access to the sandbox. In-Reply-To: References: Message-ID: <465123A9.8090500@v.loewis.de> > With that said, I would to request svn access to the sandbox for my > work. I will use this access only for modifying stuff in the directory > I will be assigned to. I would like to use the username "avassalotti" > and the attached SSH2 public key for this access. I have added your key. 
As we have a strict first.last account policy, I named it alexandre.vassalotti; please correct me if I misspelled it. > One last thing, if you know semantic differences (other than the > obvious ones) between the C and Python versions of the modules I need > to merge, please let me know. This will greatly simplify the merge and > reduce the chances of later breakage. Somebody noticed on c.l.p that, for cPickle, a) cPickle will start memo keys at 1; pickle at 0 b) cPickle will not put things into the memo if their refcount is 1, whereas pickle puts everything into the memo. Not sure what you'd consider obvious, but I'll mention that cStringIO "obviously" is constrained in what data types you can write (namely, byte strings only), whereas StringIO allows Unicode strings as well. Less obviously, StringIO also allows py> s = StringIO(0) py> s.write(10) py> s.write(20) py> s.getvalue() '1020' Regards, Martin From tomerfiliba at gmail.com Mon May 21 12:03:50 2007 From: tomerfiliba at gmail.com (tomer filiba) Date: Mon, 21 May 2007 12:03:50 +0200 Subject: [Python-3000] PEP 3131 - the details Message-ID: <1d85506f0705210303g7e18a769vd95480799be08dd2@mail.gmail.com> [Martin v. Löwis] > > So, maybe it's better to keep the status quo, and not allow Cf > > characters, unless someone comes up with a particular need for doing so. > > Hm, I think I've convinced myself of that now. :) > > That is my reasoning, too. People seem to want to be conservative, > so it's safer to reject formatting characters for the moment. > If people come up with a need, they still can be added. > > (there might be a need for it in RTL languages, supporting > 200E..200F and 202A..202E, but it seems that speakers of RTL > languages are skeptical about the entire PEP, so it's unclear > whether allowing these would help anything) i thought of simply treating Cf chars as whitespace -- i.e., they are allowed BETWEEN identifiers, but not INSIDE of them.
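[For context: the rule PEP 3131 ultimately adopted excludes Cf characters from identifiers altogether, because they are not in XID_Start/XID_Continue. This can be checked from any Python 3 stdlib -- a small illustration, not part of the original message:]

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK, one of the formatting characters discussed

# Formatting characters carry general category Cf...
print(unicodedata.category(LRM))          # -> Cf

# ...and they are not valid anywhere inside a Python 3 identifier:
print("abc".isidentifier())               # -> True
print(("ab" + LRM + "c").isidentifier())  # -> False
```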
but then again, what if i wanted identifiers in more than one language or direction? that may seem pointless, but i can give concrete examples of usage -- the cardinal numbers (aleph one and friends): ℵ1 1ℵ without the LTR marker, it would read one-aleph, which also *looks* like an invalid identifier, because it begins with a number (although it doesn't). the point is -- you must allow such markers to appear inside tokens. allowing me to use greek symbols in equations, but NOT allowing me to use hebrew ones, is just wrong. either you allow latin-only, or you allow every character supported by unicode. there's no justification for compromises, as the motivation of the PEP is localization, and you can't discriminate one locale from another. it's getting complicated. that's why i was against it from the very start. i mean, i wouldn't mind having it, but being familiar with RTL languages, i know how complex it is. -tomer From jimjjewett at gmail.com Mon May 21 17:56:21 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Mon, 21 May 2007 11:56:21 -0400 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <464FFD04.90602@v.loewis.de> References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <19dd68ba0705120827s5415c4dcx12e5862f32cc3e06@mail.gmail.com> <4646A3CA.40705@acm.org> <4646FCAE.7090804@v.loewis.de> <464FFD04.90602@v.loewis.de> Message-ID: On 5/20/07, "Martin v. Löwis" wrote: > > That is how I felt when you dismissed my effort to make your proposal more > > useful and more acceptable to some (by addressing transliteration) with the > > little molehill problem that Norwegians and Germans disagree about o: > > (rotated 90 degrees). > So let me phrase this differently: I'm not aware of an algorithm that > can do transliteration for all Unicode characters. Therefore, I cannot > add transliteration into the PEP. > Do you know of any?
There is no single transliteration that will both (1) Work for all languages, and (2) Be readable on its own. But are those real requirements? (1) Would it be acceptable to create an encoding such that you could read and write Löwis in your editor, but upon import, python treated it as though you had written LU_246wis. Other modules would see LU_246wis, unless they also used that encoding -- in which case the user should also see Löwis while editing. (I'm not suggesting character-at-a-time replacements as the *right* answer, but the mechanics of recoding are less important than whether or not to accept the use of mangled internal identifiers.) (2) If the above is not acceptable, and even the internal representation has to be readable, then would it be acceptable to make the transliteration strategy something the user could set, similar to today's coding: directive? -jJ From jimjjewett at gmail.com Mon May 21 18:30:35 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Mon, 21 May 2007 12:30:35 -0400 Subject: [Python-3000] PEP 3131 - the details In-Reply-To: <1d85506f0705210303g7e18a769vd95480799be08dd2@mail.gmail.com> References: <1d85506f0705210303g7e18a769vd95480799be08dd2@mail.gmail.com> Message-ID: On 5/21/07, tomer filiba wrote: > i thought of simply treating Cf chars as whitespace -- i.e., they > are allowed BETWEEN identifiers, but not INSIDE of them. I think the suggestion from other languages was to strip them out during canonicalization. This allows abc and cba to refer to the same identifier, if someone is being sneaky. Whether that is a problem or not, ... I think so, but it is a judgment call. > but then again, what if i wanted identifiers in more than one language > or direction? that may seem pointless, but i can give concrete > examples of usage -- the cardinal numbers (aleph one and friends): > ℵ1 > 1ℵ
In my English math classes, this was simply written with the aleph before the one; since the aleph was only a single character, it didn't really matter which order we would have used for additional characters. I think this could be generalized so that RTL is assumed to switch when switching back out of an RTL script, even if the next character is "inherited" (like parens) or "common" (like numbers). ℵ 123 -jJ From jason.orendorff at gmail.com Mon May 21 19:18:59 2007 From: jason.orendorff at gmail.com (Jason Orendorff) Date: Mon, 21 May 2007 13:18:59 -0400 Subject: [Python-3000] PEP 3131 - the details In-Reply-To: <464C9D55.9080501@v.loewis.de> References: <464BBE2B.1050201@acm.org> <2A4F5FE3-9F8A-4B74-B46D-B63F1260B7FD@fuhm.net> <464C9D55.9080501@v.loewis.de> Message-ID: On 5/17/07, "Martin v. Löwis" wrote: > That is my reasoning, too. People seem to want to be conservative, > so it's safer to reject formatting characters for the moment. > If people come up with a need, they still can be added. How about this: *require* the LEFT-TO-RIGHT MARK after every sequence of RTL characters outside a string or comment; and *forbid* all other Cf characters. This is just as conservative, but supports RTL-language identifiers better. It prevents all the "stupid bidi tricks" I know of (abc = cba and so forth). It pins the cost of maintaining bidi sanity on writers rather than readers of code. For all existing code, this is no cost at all, of course. For RTL languages this is a nontrivial burden, but Python can't fix that--it's a fact of bidi life. -j From tjreedy at udel.edu Mon May 21 23:30:28 2007 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 21 May 2007 17:30:28 -0400 Subject: [Python-3000] Support for PEP 3131 References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <19dd68ba0705120827s5415c4dcx12e5862f32cc3e06@mail.gmail.com><4646A3CA.40705@acm.org> <4646FCAE.7090804@v.loewis.de> <464FFD04.90602@v.loewis.de> Message-ID: ""Martin v.
Löwis"" wrote in message news:464FFD04.90602 at v.loewis.de... | I'm not aware of an algorithm that | can do transliteration for all Unicode characters. Were you proposing to allow all Unicode characters in Python names?-) | Therefore, I cannot add transliteration into the PEP. Non sequitur. How I read this is "Because I do not know how to do something that does not need to be done, I cannot do something that could be done." So it strikes me as another red-herring dismissal that seems to ignore the actual content of what I proposed, which was to do something that I believe can be done and which would be useful to do. My proposal was that the Unicode characters allowed in Python identifiers be limited to those with a transliteration, either current or to be developed by those who want to use a particular character set. So if, for instance, one or more people wanted to program in Klingon in its 'native' characters, they would need to provide the mapping (which I suspect already exists). More or less official transliterations do exist, I believe, for the major languages that we are seriously concerned with. And for just readability purposes, I would leave the accented latin chars alone, and even let them be available as part of an extended target set. So while I might be wrong, I *think* that we could get 99% use-case coverage. While the PEP's acceptance as-is (for which I congratulate you for your persistence) makes transliteration moot as an acceptability enhancement, it does not change its desirability for use purposes. To repeat: without it, national character identifiers will tend to ghettoize code. While this might be a minor issue for Chinese, it will be a bigger issue for people writing in Thai or Ibo or other languages with small pioneering groups of Python programmers. 
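Terry's scheme -- admit a non-ASCII identifier character only if someone has registered a transliteration for it -- can be sketched in a few lines of Python. The mapping and function name below are purely illustrative (the thread never settles on an official table):

```python
import unicodedata

# Hypothetical per-script transliteration table; the entries are
# illustrative examples, not an official mapping.
TRANSLIT = {
    "\u00f6": "oe",   # LATIN SMALL LETTER O WITH DIAERESIS
    "\u00e5": "aa",   # LATIN SMALL LETTER A WITH RING ABOVE
}

def transliterate(name):
    """Return the ASCII form of *name*, or raise if some character
    has no registered transliteration (Terry's proposed rule)."""
    out = []
    for ch in name:
        if ord(ch) < 128:
            out.append(ch)
        elif ch in TRANSLIT:
            out.append(TRANSLIT[ch])
        else:
            raise ValueError("no transliteration registered for %r (%s)"
                             % (ch, unicodedata.name(ch, "unknown")))
    return "".join(out)

print(transliterate("L\u00f6wis"))   # Loewis
```

Under such a rule an identifier drawn from a script with no registered table would be rejected outright rather than silently admitted.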
Terry Jan Reedy From ncoghlan at gmail.com Tue May 22 00:01:29 2007 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 22 May 2007 08:01:29 +1000 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <19dd68ba0705120827s5415c4dcx12e5862f32cc3e06@mail.gmail.com><4646A3CA.40705@acm.org> <4646FCAE.7090804@v.loewis.de> <464FFD04.90602@v.loewis.de> Message-ID: <465216B9.8040802@gmail.com> Terry Reedy wrote: > > My proposal was that the Unicode characters allowed in Python identifiers > be limited to those with a transliteration, either current or to be > developed by those who want to use a particular character set. Japanese has a transliteration to Roman script, but it suffers from ambiguity that typically isn't present in the native written forms of the words (i.e. there are different characters in Kanji which are pronounced the same way, and spelt the same way in hiragana - and it is only the hiragana syllabary which can be mapped to roman characters). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From martin at v.loewis.de Tue May 22 00:19:33 2007 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Tue, 22 May 2007 00:19:33 +0200 Subject: [Python-3000] PEP 3131 - the details In-Reply-To: <1d85506f0705210303g7e18a769vd95480799be08dd2@mail.gmail.com> References: <1d85506f0705210303g7e18a769vd95480799be08dd2@mail.gmail.com> Message-ID: <46521AF5.3030500@v.loewis.de> > i thought of simply treating Cf chars as whitespace -- i.e., they > are allowed BETWEEN identifiers, but not INSIDE of them. Ok - that would also work. Are you proposing that the PEP is changed in that way, or are you merely stating that it would "work"? (ie. 
would you prefer to see it changed that way) > without the LTR marker, it would read one-aleph, which also *looks* like > an invalid identifier, because it begins with a number (although it > doesn't). > the point is -- you must allow such markers to appear inside tokens. That seems to be a different specification now - you are now saying that they should *not* be treated like whitespace. So I'm still at a loss what the PEP should say about Cf characters. > allowing me to use greek symbols in equations, but NOT allowing me > to use hebrew ones, is just wrong. either you allow latin-only, or you > allow every character supported by unicode. there's no justification > for compromises, as the motivation of the PEP is localization, and > you can't discriminate one locale from another. But the PEP does not do that! It allows the use of both Hebrew and Greek letters in identifiers. > it's getting complicated. that's why i was against it from the very start. > i mean, i wouldn't mind having it, but being familiar with RTL languages, > i know how complex it is. Sure. If there isn't a clearly "correct" specification, the conservative approach requested by several people here would require rejecting Cf characters - they are not letters, so they are *not* similar to Greek letters (not sure whether you suggested that they are). Then, if later there is a demonstrated need for formatting characters, they still could be added. 
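Whatever the PEP ends up saying, the Cf (format) characters under discussion are easy to detect with the stdlib unicodedata module. A minimal sketch of the kind of check a tokenizer might apply to an identifier (the function name is mine, not anything from the PEP):

```python
import unicodedata

def cf_characters(ident):
    """Return (position, codepoint) pairs for the Cf characters in *ident*."""
    return [(i, "U+%04X" % ord(ch))
            for i, ch in enumerate(ident)
            if unicodedata.category(ch) == "Cf"]

# A LEFT-TO-RIGHT MARK hidden inside an otherwise plain identifier:
print(cf_characters("ab\u200ec"))   # [(2, 'U+200E')]
```

Treating Cf characters as whitespace, stripping them during canonicalization, or rejecting them outright are then all one-line policies layered on top of such a check.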
Regards, Martin From martin at v.loewis.de Tue May 22 00:27:35 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 22 May 2007 00:27:35 +0200 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <19dd68ba0705120827s5415c4dcx12e5862f32cc3e06@mail.gmail.com> <4646A3CA.40705@acm.org> <4646FCAE.7090804@v.loewis.de> <464FFD04.90602@v.loewis.de> Message-ID: <46521CD7.9030004@v.loewis.de> > Would it be acceptable to create an encoding such that you could read > and write > > Löwis > > in your editor, but upon import, python treated it as though you had > written > > LU_246wis > > Other modules would see LU_246wis, unless they also used that encoding > -- in which case the user should also see Löwis while editing. What problem would that solve? You could type the identifier that way - but you would need to know already that this is the identifier you want to type; how do you know? > (I'm not suggesting character-at-a-time replacements as the *right* > answer, but the mechanics of recoding are less important than whether > or not to accept the use of mangled internal identifiers.) Again, I'm uncertain what the use case here would be. For "proper" transliteration, users can memorize easily what the transliterated name would be, and visually identify the two representations. With a "numeric transliteration", users would *not* normally be able to tell what a transliterated character means, or how to transliterate a given character. > If the above is not acceptable, and even the internal representation > has to be readable, then would it be acceptable to make the > transliteration strategy something the user could set, similar to > today's coding: directive? > Then I don't understand your above proposal. I thought you were proposing to replace all non-ASCII characters with some ASCII form on import of the module. 
What do you mean by "readable internal representation"? Regards, Martin From martin at v.loewis.de Tue May 22 00:31:36 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 22 May 2007 00:31:36 +0200 Subject: [Python-3000] PEP 3131 - the details In-Reply-To: References: <464BBE2B.1050201@acm.org> <2A4F5FE3-9F8A-4B74-B46D-B63F1260B7FD@fuhm.net> <464C9D55.9080501@v.loewis.de> Message-ID: <46521DC8.3080704@v.loewis.de> > How about this: *require* the LEFT-TO-RIGHT MARK after > every sequence of RTL characters outside a string or > comment; and *forbid* all other Cf characters. > > This is just as conservative, but supports RTL-language > identifiers better. It prevents all the "stupid bidi tricks" > I know of (abc = cba and so forth). This is indeed more conservative, and I could happily put it in the PEP, but again I prefer not to do so without an explicit confirmation from a user of such a language that this actually helps anything. tomer's comment (that you need the mark even inside an identifier) has puzzled me. Regards, Martin From jimjjewett at gmail.com Tue May 22 01:29:36 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Mon, 21 May 2007 19:29:36 -0400 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <46521CD7.9030004@v.loewis.de> References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <19dd68ba0705120827s5415c4dcx12e5862f32cc3e06@mail.gmail.com> <4646A3CA.40705@acm.org> <4646FCAE.7090804@v.loewis.de> <464FFD04.90602@v.loewis.de> <46521CD7.9030004@v.loewis.de> Message-ID: On 5/21/07, "Martin v. Löwis" wrote: > > Would it be acceptable to create an encoding such that you could read > > and write > > Löwis > > in your editor, but upon import, python treated it as though you had > > written > > LU_246wis > > Other modules would see LU_246wis, unless they also used that encoding > > -- in which case the user should also see Löwis while editing. > What problem would that solve? 
You could type the identifier that > way - but you would need to know already that this is the identifier > you want to type; how do you know? (1) If I am using the module based on its documentation, or based on opening it up and reading it, then I can use the same encoding, and I can write Löwis. (2) If I do arbitrary introspection, such as import sys for k, v in sys.modules.items(): if v: print dir(v) then I will get something usable, though perhaps not easily readable. (3) The mapping is reversible, so I can work interactively with the arbitrary characters by setting my console/idle preferences to the special encoding. > Again, I'm uncertain what the use case here would be. For "proper" > transliteration, users can memorize easily what the transliterated > name would be, and visually identify the two representations. For two latin-based alphabets, yes. I'm not so sure for non-western scripts. As you pointed out, the correct transliteration may depend on the natural language (instead of just the character code point), which means we probably can't do it automatically. It also has to be a one-way transliteration; if ö -> o (or oe) then an o (or oe) in the result can't always be transliterated back. > With a "numeric transliteration", users would *not* normally be > able to tell what a transliterated character means, or how to > transliterate a given character. (1) They shouldn't ever need to see the numeric version unless they're intentionally peeking under the covers, or their site doesn't have the appropriate encoding installed. One advantage of this method is that a single transliteration method could work for any language, so it probably would be installed already. (2) Even if users did somehow see the numeric version, it wouldn't be that awful. For the languages close enough to ASCII that a transliteration is straightforward, the number of extra characters to memorize is fairly small. 
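Jim's numeric mangling (Löwis ↔ LU_246wis) is easy to prototype. This sketch assumes the U_<decimal codepoint> convention from his earlier example; the function names are mine:

```python
import re

def mangle(name):
    """Replace each non-ASCII character with U_<decimal codepoint>."""
    return "".join(ch if ord(ch) < 128 else "U_%d" % ord(ch)
                   for ch in name)

def demangle(name):
    """Invert mangle(). Only a heuristic: a legal identifier can
    already contain a literal U_<digits> substring."""
    return re.sub(r"U_(\d+)", lambda m: chr(int(m.group(1))), name)

assert mangle("L\u00f6wis") == "LU_246wis"
assert demangle("LU_246wis") == "L\u00f6wis"
```

The round trip works on this example, but the ambiguity noted in demangle() is exactly the objection Martin raises against the scheme later in the thread.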
> > If the above is not acceptable, and even the internal representation > > has to be readable, then would it be acceptable to make the > > transliteration strategy something the user could set, similar to > > today's coding: directive? > Then I don't understand your above proposal. I thought you were > proposing to replace all non-ASCII characters with some ASCII form > on import of the module. What do you mean by "readable internal > representation"? This alternative would let an individual user say "I'm writing Swedish; turn my ö into an o." The actual identifiers used by Python itself would be more readable, but the downside is that users would have to read them more often, instead of using/editing/viewing strictly in the untransliterated version. -jJ From foom at fuhm.net Tue May 22 02:28:20 2007 From: foom at fuhm.net (James Y Knight) Date: Mon, 21 May 2007 20:28:20 -0400 Subject: [Python-3000] PEP 3131 - the details In-Reply-To: <46521DC8.3080704@v.loewis.de> References: <464BBE2B.1050201@acm.org> <2A4F5FE3-9F8A-4B74-B46D-B63F1260B7FD@fuhm.net> <464C9D55.9080501@v.loewis.de> <46521DC8.3080704@v.loewis.de> Message-ID: <18E17A4B-1B0A-4C75-993B-2B74B8DE5D91@fuhm.net> On May 21, 2007, at 6:31 PM, Martin v. Löwis wrote: > This is indeed more conservative, and I could happily put it > in the PEP, but again I prefer not to do so without an explicit > confirmation from a user of such a language that this actually > helps anything. > > tomer's comment (that you need the mark even inside an identifier) > has puzzled me. I agree: nothing should be done without an explicit example of how it will actually improve matters. For example: editor XYZ can be used to sensibly edit JS/C#/Java/whatever code in an RTL language, and could also be used to edit python if only python did . Anything else seems to be simply wild speculation, and should not be implemented on the off chance that it might be useful in the future. 
James From martin at v.loewis.de Tue May 22 07:00:52 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 22 May 2007 07:00:52 +0200 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <19dd68ba0705120827s5415c4dcx12e5862f32cc3e06@mail.gmail.com> <4646A3CA.40705@acm.org> <4646FCAE.7090804@v.loewis.de> <464FFD04.90602@v.loewis.de> <46521CD7.9030004@v.loewis.de> Message-ID: <46527904.1000202@v.loewis.de> > If I do arbitrary introspection, such as > > import sys > for k, v in sys.modules.items(): > if v: > print dir(v) > > then I will get something usable, though perhaps not easily readable. I think this is unacceptable (at least I cannot accept it): with reflection, I want to get the *true* variable names, not the mangled ones. In the scenario that people had discussed with using long Japanese method names for test methods, if the method fails, you clearly want to see the Japanese name, so you can easily read what failed. > The mapping is reversible, so I can work interactively with the > arbitrary characters by setting my console/idle preferences to the > special encoding. That could work both ways, of course. If you want a reflective API to give you mangled names, you could easily implement that yourself on top of PEP 3131. >> Again, I'm uncertain what the use case here would be. For "proper" >> transliteration, users can memorize easily what the transliterated >> name would be, and visually identify the two representations. > > For two latin-based alphabets, yes. I'm not so sure for non-western > scripts. I know that the Chinese regularly use pinyin for transliteration, and somebody confirmed in c.l.p that they also use it in programming if they can't use the Chinese characters directly. 
> As you pointed out, the correct transliteration may depend on the > natural language (instead of just the character code point), which > means we probably can't do it automatically. That's the problem, yes. > It also has to be a one-way transliteration; if ö -> o (or oe) then an > o (or oe) in the result can't always be transliterated back. The same is true for your "numeric transliteration": there is no way to *reliably* tell whether some string is a mangled string, or just happens to include U_ in the identifier (which it legally can do today). That's why Java and C++ use \u, so you would write L\u00F6wis as an identifier. *This* is truly unambiguous. I claim that it is also useless. > (1) They shouldn't ever need to see the numeric version unless > they're intentionally peeking under the covers, or their site doesn't > have the appropriate encoding installed. One advantage of this method > is that a single transliteration method could work for any language, > so it probably would be installed already. I think you are really arguing for \u escapes in identifiers here. > (2) Even if users did somehow see the numeric version, it wouldn't be > that awful. For the languages close enough to ASCII that a > transliteration is straightforward, the number of extra characters to > memorize is fairly small. What about the other languages? This PEP is not just for latin-based scripts. >> Then I don't understand your above proposal. I thought you were >> proposing to replace all non-ASCII characters with some ASCII form >> on import of the module. What do you mean by "readable internal >> representation"? > > This alternative would let an individual user say "I'm writing > Swedish; turn my ö into an o." The actual identifiers used by Python > itself would be more readable, but the downside is that users would > have to read them more often, instead of using/editing/viewing > strictly in the untransliterated version. 
That again cannot work because you don't have transliteration algorithms for all characters, or all languages. Regards, Martin From martin at v.loewis.de Tue May 22 07:58:17 2007 From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 22 May 2007 07:58:17 +0200 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <19dd68ba0705120827s5415c4dcx12e5862f32cc3e06@mail.gmail.com><4646A3CA.40705@acm.org> <4646FCAE.7090804@v.loewis.de> <464FFD04.90602@v.loewis.de> Message-ID: <46528679.9060709@v.loewis.de> > | I'm not aware of an algorithm that > | can do transliteration for all Unicode characters. > > Were you proposing to allow all Unicode characters in Python names?-) Not sure how to interpret your question: no, I'm not proposing to allow all Unicode characters, just a selected subset (but then, I don't know a universal transliteration algorithm for that subset, either). > | Therefore, I cannot add transliteration into the PEP. > > Non sequitur. How I read this is "Because I do not know how to do > something that does not need to be done, I cannot do something that could > be done." No. You should read it "because I don't know how to do it, *I* will not do it". > My proposal was that the Unicode characters allowed in Python identifiers > be limited to those with a transliteration, either current or to be > developed by those who want to use a particular character set. But what would be the purpose of doing so? Mere existence of a transliteration algorithm surely isn't what you are after. > While the PEP's acceptance as-is (for which I congratulate you for your > persistence) makes transliteration moot as an acceptability enhancement, it > does not change its desirability for use purposes. To repeat: without it, > national character identifiers will tend to ghettoize code. 
While this > might be a minor issue for Chinese, it will be a bigger issue for people > writing in Thai or Ibo or other languages with small pioneering groups of > Python programmers. What I fail to see is how existence of a transliteration algorithm would remove the ghettoization. It must be used somehow, no? Regards, Martin From jimjjewett at gmail.com Tue May 22 22:29:02 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Tue, 22 May 2007 16:29:02 -0400 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <46527904.1000202@v.loewis.de> References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <19dd68ba0705120827s5415c4dcx12e5862f32cc3e06@mail.gmail.com> <4646A3CA.40705@acm.org> <4646FCAE.7090804@v.loewis.de> <464FFD04.90602@v.loewis.de> <46521CD7.9030004@v.loewis.de> <46527904.1000202@v.loewis.de> Message-ID: On 5/22/07, "Martin v. Löwis" wrote: > That's why Java and C++ use \u, so you would write L\u00F6wis > as an identifier. ... > I think you are really arguing for \u escapes in identifiers here. Yes, that is effectively what I was suggesting. > *This* is truly unambiguous. I claim that it is also useless. It means users could see the usability benefits of PEP3131, but the python internals could still work with ASCII only. It simplifies checking for identifiers that *don't* stick to ASCII, which reduces some of the concerns about confusable characters, and which ones to allow. Short list of judgment calls that we need to resolve if we go with non-ASCII identifiers, but can largely ignore if we just use escaping: Based only on UAX 31: ID vs XID (unicode changed their mind on recommendations) include stability extensions? (*Python* didn't allow those letters previously.) which of ID_CONTINUE should be left out. (We don't want "-", and some of the punctuation and other marks may be closer to "-" than to "_". Or they might not be, and I don't know how to judge that.) 
layout and control characters (At the top of section 2, tr31 recommends acting as though they weren't there ... but if we use a normal (unicode) string, then they will still affect the hash. Down in 2.2, they say not to permit them, except sometimes...) Canonicalization Combining Marks should be accepted (only as continuation chars), but not if they're enclosing marks, because ... well, I'm not sure, but I'll have to trust them. Specific character Adjustments (sec 2.3) -- The example suggests that we might have to tailor for our use of "_", though I didn't get that from the table. They do suggest tailoring out certain Decomposition Types. Additional (non-letter?) characters which may occur in words (see UAX29, but I don't claim to fully understand it) Undefined code points, particularly those which might be defined later? Should we exclude the letters that look like punctuation? A proposed update (http://www.unicode.org/reports/tr31/tr31-8.html) mentions U+02B9 (modifier letter prime) only because the visually equivalent U+0374 (Greek Numeral Sign) shouldn't be an identifier, but does fold to it under (some?) canonicalization. (They suggest allowing both, instead of neither.) Then TR 39 http://www.unicode.org/reports/tr39/ recommends excluding (most, but not all of) characters not in modern use; characters only used in specialized fields, such as liturgical characters, mathematical letter-like symbols, and certain phonetic alphabets; and ideographic characters that are not part of a set of core CJK ideographs consisting of the CJK Unified Ideographs block plus IICore (the set of characters defined by the IRG as the minimal set of required ideographs for East Asian use). They summarize this in http://www.unicode.org/reports/tr39/data/xidmodifications.txt; I wouldn't add the hyphen-minus back in, but I don't know whether katakana middle dot should be allowed. Should mixed-script identifiers be allowed? 
According to TR 36 (http://www.unicode.org/reports/tr36/) ASCII only is the safest, and that is followed by limits on mixed-script identifiers. Those limits sound reasonable to me, but ... I'm not the one who would be mixing them. Note that even "highly restrictive" allows ASCII + Han + Hiragana + Katakana, ASCII + Han + Bopomofo, and ASCII + Han + Hangul. (I think we wanted at least the ASCII numbers with anything.) -jJ From jimjjewett at gmail.com Tue May 22 22:30:50 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Tue, 22 May 2007 16:30:50 -0400 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <46528679.9060709@v.loewis.de> References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <19dd68ba0705120827s5415c4dcx12e5862f32cc3e06@mail.gmail.com> <4646A3CA.40705@acm.org> <4646FCAE.7090804@v.loewis.de> <464FFD04.90602@v.loewis.de> <46528679.9060709@v.loewis.de> Message-ID: On 5/22/07, "Martin v. Löwis" wrote: [Referring to my alternate alternative proposal -- user-controlled transliteration, rather than unicode escapes in identifiers] > >> Then I don't understand your above proposal. I thought you were > >> proposing to replace all non-ASCII characters with some ASCII > >> form on import of the module. What do you mean by "readable > >> internal representation"? That ASCII form -- and the requirement that it still be something humans don't mind reading -- which in turn means that it can't be done as a single one-size-fits-all algorithm; users would have to be able to choose (and perhaps locally modify) it. -jJ From alexandre at peadrop.com Tue May 22 22:35:36 2007 From: alexandre at peadrop.com (Alexandre Vassalotti) Date: Tue, 22 May 2007 16:35:36 -0400 Subject: [Python-3000] [Python-Dev] Introduction and request for commit access to the sandbox. In-Reply-To: <465123A9.8090500@v.loewis.de> References: <465123A9.8090500@v.loewis.de> Message-ID: On 5/21/07, "Martin v. 
Löwis" wrote: > > With that said, I would like to request svn access to the sandbox for my > > work. I will use this access only for modifying stuff in the directory > > I will be assigned to. I would like to use the username "avassalotti" > > and the attached SSH2 public key for this access. > > I have added your key. As we have a strict first.last account policy, > I named it alexandre.vassalotti; please correct me if I misspelled it. Thanks! > > One last thing, if you know semantic differences (other than the > > obvious ones) between the C and Python versions of the modules I need > > to merge, please let me know. This will greatly simplify the merge and > > reduce the chances of later breaking. > > Somebody noticed on c.l.p that, for cPickle, > a) cPickle will start memo keys at 1; pickle at 0 > b) cPickle will not put things into the memo if their refcount is > 1, whereas pickle puts everything into the memo. Noted. I think I found the thread on c.l.p about it: http://groups.google.com/group/comp.lang.python/browse_thread/thread/68c72a5066e4c9bb/b2bc78f7d8d50320 > Not sure what you'd consider obvious, but I'll mention that cStringIO > "obviously" is constrained in what data types you can write (namely, > byte strings only), whereas StringIO allows Unicode strings as well. Yes. I was already aware of this. I just hope this problem will go away with the string unification in Python 3000. However, I will need to deal with this, sooner or later, if I want to port the merge to 2.x. > Less obviously, StringIO also allows > > py> s = StringIO(0) > py> s.write(10) > py> s.write(20) > py> s.getvalue() > '1020' That is probably due to the design of cStringIO, which is separated into two subparts StringI and StringO. 
So when the constructor of cStringIO is given a string, it builds an input object, otherwise it builds an output object:

static PyObject *
IO_StringIO(PyObject *self, PyObject *args) {
    PyObject *s=0;

    if (!PyArg_UnpackTuple(args, "StringIO", 0, 1, &s)) return NULL;

    if (s) return newIobject(s);
    return newOobject(128);
}

As you see, cStringIO's code also needs a good cleanup to make it, at least, conform to PEP 7. -- Alexandre From python at zesty.ca Wed May 23 00:08:05 2007 From: python at zesty.ca (Ka-Ping Yee) Date: Tue, 22 May 2007 17:08:05 -0500 (CDT) Subject: [Python-3000] PEP 3131 accepted In-Reply-To: References: Message-ID: On Thu, 17 May 2007, Guido van Rossum wrote: > I have accepted PEP 3131. I'm surprised that this happened so quickly. I oppose this proposal quite strongly. Currently Python has the property that the character set is a fully known quantity. There currently exists a choice of keyboard, a choice of editor, and a set of literacy skills that is sufficient for any Python code in the world. Adopting PEP 3131 destroys this property. It is not just that particular communities (e.g. English speakers) will be unable to understand code by other particular communities (e.g. Japanese speakers); that is relatively minor and arguably already the case. The real problem is that it will be impossible for *anyone*, no matter what their background, to acquire the resources necessary to handle all Python code. There will exist no keyboard that enables one to edit any Python program, and probably no editor. There will not be a single human being alive who can know or recognize the whole character set. Using APIs in a few different languages would yield a program that no one could understand. Today, if a non-English speaker asks you how to learn Python, you can answer that question. You can explain Python's syntax and semantics, and tell them they need to know the 26 letters of the Roman alphabet. 
After PEP 3131, you won't be able to answer their question -- because it will be impossible for any human being to enumerate, let alone possess, the knowledge required to read an arbitrary piece of Python code. PEP 3131 will also cause problems for code review. Because many characters have indistinguishable appearances, there will be no mapping between what you see when you look at code and what the code actually says. So it will no longer be possible to look at a piece of Python code on your screen or on paper and be sure you know what it means, or even know that it is valid Python syntax. It will be much easier to write programs that look right but do the wrong thing, which is particularly bad if you are concerned with security. I like the idea that, after studying and working with Python for a modest amount of time, one can acquire a complete understanding of the language that affords confidence in the ability to read arbitrary programs written in Python, make changes to anything written in Python, and reuse any libraries or modules written in Python. (It is for the same reason that Python has a small and limited set of keywords that Python should have a small character set.) I don't like how PEP 3131 would not only take such abilities away from me, but remove them from the realm of possibility altogether. Of course, nothing stops one from creating a new language (say, "UniPython") that consists of Python with Unicode identifiers. One could even write a translator from UniPython to Python, thus making it straightforward to run UniPython programs. But it would be much better for this to be a separate language that no one is expected to fully understand, so that Python can remain a language that one *can* fully understand. 
-- ?!ng From santagada at gmail.com Wed May 23 04:56:23 2007 From: santagada at gmail.com (Leonardo Santagada) Date: Tue, 22 May 2007 23:56:23 -0300 Subject: [Python-3000] PEP 3131 accepted In-Reply-To: References: Message-ID: On 22/05/2007, at 19:08, Ka-Ping Yee wrote: > > Currently Python has the property that the character set is a fully > known quantity. There currently exists a choice of keyboard, a choice > of editor, and a set of literacy skills that is sufficient for any > Python code in the world. > No, any python code can extend itself to infinity as python is a Turing-complete language, we probably don't even have a way to say if a python program ever stops, so saying that you now possess capabilities to understand every python program is being a little too confident on your hacking skillz :) > There will exist no keyboard that enables one to > edit any Python program, and probably no editor. Yes you can: using only a hex editor and simple cut and paste (or copying hex codes on paper and then re-typing them back) you can edit any python code. > Today, if a non-English speaker asks you how to learn Python, you can > answer that question. You can explain Python's syntax and semantics, > and tell them they need to know the 26 letters of the Roman alphabet. Have you ever explained that to someone? "You need to know only the 26 letters of the alphabet, plus _+=-{}[]()_0123456789!@#%^*><,./?\" Really? And I probably missed a lot of stuff in there. The syntax rules continue to be as simple as ever, identifiers can contain whatever character the user knows (and all he doesn't know, but then he is not going to be able to read it anyway). > PEP 3131 will also cause problems for code review. Because many > characters have indistinguishable appearances, there will be no > mapping between what you see when you look at code and what the code > actually says. 
This was already discussed, if your font has the same symbol for different characters it is not a problem with python, but with the font. Then there is the different chars in unicode that are really supposed to be the same, then you need to know the context of the expression to know their meaning and then again this is not a python problem, maybe a unicode problem, I like to think this is a cultural problem, and we have to learn to live with it. > so that Python can remain a language that one *can* > fully understand. I know I'm being picky, but think about this a little more: probably you are afraid of this change, but really there is nothing to be afraid of. If some code is written in some language you don't understand or in an encoding that your editor doesn't know how to handle you will not be able to edit it. But then again, this is probably already true with the # _*_ encoding: parameter and with lots of crappy editors that don't even know how to handle utf-8. -- Leonardo Santagada santagada at gmail.com From python at zesty.ca Wed May 23 05:30:03 2007 From: python at zesty.ca (Ka-Ping Yee) Date: Tue, 22 May 2007 22:30:03 -0500 (CDT) Subject: [Python-3000] PEP 3131 accepted In-Reply-To: References: Message-ID: On Tue, 22 May 2007, Leonardo Santagada wrote: > > Today, if a non-English speaker asks you how to learn Python, you can > > answer that question. You can explain Python's syntax and semantics, > > and tell them they need to know the 26 letters of the Roman alphabet. > Have you ever explained that to someone? "You need to know only the > 26 letters of the alphabet, plus _+=-{}[]()_0123456789!@#%^*><,./?\" > Really? And I probably missed a lot of stuff in there. Except for those with disabilities, every Python programmer today can easily recognize, read, write, type, and speak every character in the syntax character set. Python fits your brain. Let's keep it that way. > > PEP 3131 will also cause problems for code review. 
Because many > > characters have indistinguishable appearances, there will be no > > mapping between what you see when you look at code and what the code > > actually says. > > This was already discussed, if your font has the same symbol for > different characters it is not a problem with python, but with the > font. Then there is the different chars in unicode that are really > supposed to be the same, then you need to know the context of the > expression to know their meaning and then again this is not a python > problem, maybe a unicode problem, I like to think this is a cultural > problem, and we have to learn to live with it. Assigning blame elsewhere will not make the problem go away. We do not incorporate buggy libraries into the Python core and then absolve ourselves by pointing fingers at the library authors; we should not incorporate the complicated and unsolved problems of international character sets into the language syntax definition, thereby turning them from problems with Unicode to problems with Python. -- ?!ng From showell30 at yahoo.com Wed May 23 05:25:45 2007 From: showell30 at yahoo.com (Steve Howell) Date: Tue, 22 May 2007 20:25:45 -0700 (PDT) Subject: [Python-3000] please keep open() as a builtin, and general concerns about Py3k complexity Message-ID: <601776.94139.qm@web33506.mail.mud.yahoo.com> Hi, this is my first post to the list. My name is Steve Howell, and I currently work on a system, largely written in Python, that processes a billion transactions per year. On the opposite side of the spectrum, I've also had experience in classrooms using Python as a teaching tool. In the system I've worked on for the last three years, we have at least 200 calls to the builtin open() method. Ironically, to compile that stat, I wrote a tiny Python program that used open() as a builtin. So I'm -201 on the proposal to eliminate it as a builtin.
I understand the original justification for the proposal--that it helps you identify modules that do I/O--but I don't find it difficult in practice to find modules that use I/O, and I definitely work with a large enough code base where that comes up. Although the open() debate seems to have died out, I'd like to reply to Raymond Hettinger's observation that "Taking a more global viewpoint, I'm experiencing a little FUD about Py3k." I think he's on to something. I've been following the Py3k discussions for several months, and I find myself frequently feeling very bewildered about the new features being proposed, even though I'm hardly a newbie. FWIW one of my favorite accepted PEPs is PEP 3111, "Simple input built-in in Python 3000." BTW it's the only 3000 series PEP with the word "simple" in the title. I realize looking at PEPs and mailing list archives can skew an outsider's view of how well Py3K simplifies the language, since simple ideas often don't require PEPs, and complex ideas often lead to lengthier debates than simple ones, but I'm not feeling the simplicity. From guido at python.org Wed May 23 06:20:31 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 22 May 2007 21:20:31 -0700 Subject: [Python-3000] PEP 3131 accepted In-Reply-To: References: Message-ID: On 5/22/07, Ka-Ping Yee wrote: > Python fits your brain. Let's keep it that way. I'm sorry, Ping, but you sound just like I was feeling about the PEP at the start (and many others were too). You missed a bunch of enlightening posts from people with quite a different perspective. In particular very helpful was a couple of reports from the Java world, where Unicode letters in identifiers have been legal for a long time now. (JavaScript also supports this BTW.)
The Java world has not fallen apart, but Java programmers in countries where English is not spoken regularly between programmers (e.g. Japan) find it very helpful to be able to communicate with each other through identifiers in their own language. Remember the mantra that *human* readability of code is important? Well, it helps if your code can use at least some of the language spoken by those humans. Of course, even Japanese programmers must master *some* English -- the standard library and the language keywords are still in English, and they are okay with that. But the code they write for each other to read will be more readable *to them* if they don't have to resort to Latin transliterations of Japanese words. Because that's what they do today. And they don't like it. Their code is already unreadable for us (for me, anyway :-) -- their comments are in Japanese (that's legal today) and so are their output messages (that's also legal today). My own personal example would be a program calculating Dutch income tax -- I'd be crazy trying to translate the Dutch tax-technical terms into English, and since the idiosyncrasies of taxes are utterly localized, there would be no use for my program in other countries. Now Dutch can (for the most part, without much loss of readability) be written in ASCII, but the same idea of course applies to any application of local law, customs etc. Of course, for the standard library, there's a strict style rule requiring only ASCII in identifiers, and using English for names, comments and messages. A similar style guide is likely to be adopted by other global open source projects. But there are lots of regional open source projects too, and they can standardize on a different common language. Will there be occasional pain when someone writes a useful hack using their local language and finds they have to translate it to English in order to open source it? Sure.
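A tiny sketch of what PEP 3131 permits may help here (hypothetical Python 3 code; the Japanese identifiers are invented for illustration, while keywords and the standard library stay ASCII):

```python
# -*- coding: utf-8 -*-
# Hypothetical PEP 3131 example: domain terms named in the
# programmer's own language; the `def`/`return` keywords and
# built-ins such as print remain English.

def 税込価格(価格, 税率=0.05):
    # return the price with consumption tax added
    return 価格 * (1 + 税率)

合計 = 税込価格(1000)
print(合計)
```

The point is not that anyone must read the Japanese names, but that a team sharing that language no longer needs Latin transliterations.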
But the pain already exists if they chose to use their own language for comments, messages, or even identifiers (transliterated to the Latin alphabet). I don't expect there to be much additional pain. > > > PEP 3131 will also cause problems for code review. Because many > > > characters have indistinguishable appearances, there will be no > > > mapping between what you see when you look at code and what the code > > > actually says. I trust most programmers to *want* to write clear code, so they will steer clear from such things. If someone wants to obfuscate their code they already have plenty of opportunities (even in Python!). The problem is no worse than the lack of difference between 1 and l in some fonts, and between l and I in others (and there are even fonts where o and 0 look the same). > Assigning blame elsewhere will not make the problem go away. You may be misunderstanding the enthusiasm of your respondent. > We do > not incorporate buggy libraries into the Python core and then absolve > ourselves by pointing fingers at the library authors; we should not > incorporate the complicated and unsolved problems of international > character sets into the language syntax definition, thereby turning > them from problems with Unicode to problems with Python. Yes, Unicode has its problems (so does ASCII BTW). But they can be solved (see: Java and JavaScript). The Unicode standard also has some guidelines. Solutions are actively being discussed in this list. If you have any experience with other languages or fonts, please help. We should probably be conservative; I'm not too hopeful about support for right-to-left alphabets for example. But we can do better than ASCII (or Latin-1, which is much worse). 
Cheers, -- --Guido van Rossum (home page: http://www.python.org/~guido/) From nnorwitz at gmail.com Wed May 23 06:43:17 2007 From: nnorwitz at gmail.com (Neal Norwitz) Date: Tue, 22 May 2007 21:43:17 -0700 Subject: [Python-3000] [Python-Dev] Introduction and request for commit access to the sandbox. In-Reply-To: References: <465123A9.8090500@v.loewis.de> Message-ID: On 5/22/07, Alexandre Vassalotti wrote: > > As you see, cStringIO's code also needs a good cleanup to make it, > at least, conform to PEP 7. Alexandre, It would be great if you could break up unrelated changes into separate patches. Some of these can go in sooner rather than later. I don't know all the things that need to be done, but I could imagine a separate patch for each of:

* whitespace normalization
* function name modification
* other formatting changes
* bug fixes
* changes to make consistent with StringIO

I don't know if all those items in the list need to change, but that's the general idea. Separate patches will make it much easier to review and get benefits from your work earlier. I look forward to seeing your work! n From stephen at xemacs.org Wed May 23 07:05:05 2007 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 23 May 2007 14:05:05 +0900 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <19dd68ba0705120827s5415c4dcx12e5862f32cc3e06@mail.gmail.com> <4646A3CA.40705@acm.org> <4646FCAE.7090804@v.loewis.de> <464FFD04.90602@v.loewis.de> <46521CD7.9030004@v.loewis.de> <46527904.1000202@v.loewis.de> Message-ID: <87sl9o5dvi.fsf@uwakimon.sk.tsukuba.ac.jp> Jim Jewett writes: > On 5/22/07, "Martin v. Löwis" wrote: > > > That's why Java and C++ use \u, so you would write L\u00F6wis > > as an identifier. ... > > I think you are really arguing for \u escapes in identifiers here. > > Yes, that is effectively what I was suggesting. > > > *This* is truly unambiguous. I claim that it is also useless.
> > It means users could see the usability benefits of PEP3131, but the > python internals could still work with ASCII only. But this reasoning is not coherent. Python internals will have no problems with non-ASCII; in fact, they would have no problems with tokens containing Cf characters or even reserved code points. Just give an unambiguous grammar for tokens composed of code points. It's only when a human enters the loop (ie, presentation of the identifier on an output stream) that they cause problems. It's *users* who are at risk, not the Python translator, and if there are any usability benefits to be taken advantage of by *presenting* identifiers that don't stick to ASCII, the risks of confusing or deliberately obfuscated code inhere in that very presentation. Not in the internals. For example: > It simplifies checking for identifiers that *don't* stick to ASCII, Only if you assume that people will actually perceive the 10-character string "L\u00F6wis" as an identifier, regardless of the fact that any programmable editor can be trained to display the 5-character string "Löwis" in a very small amount of code. Conversely, any programmable editor can easily be trained to take the internal representation "Löwis" and display it as "L\u00F6wis", giving all the benefits of the representation you propose. But who would ever enable it? (I suppose this is what Martin means by "useless".) > which reduces some of the concerns about confusable characters, and > which ones to allow. For the given reasons above, it reduces no concerns at all, except to the extent that it makes use of human-readable identifiers as Python identifiers inconvenient. I conclude that IMO PEP 3131 is precisely correct in scope as far as it goes. The only issues PEP 3131 should be concerned with *defining* are those that cause problems with canonicalization, and the range of characters and languages allowed in the standard library.
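The canonicalization issue mentioned here can be shown in a few lines (a sketch, using the NFKC normalization that PEP 3131 specifies for identifiers):

```python
# Two spellings of "Löwis" that render identically but differ at
# the code-point level; NFKC normalization maps them to the same
# identifier, which is exactly the problem PEP 3131 must define away.
import unicodedata

composed = "L\u00F6wis"     # "ö" as a single code point (U+00F6)
decomposed = "Lo\u0308wis"  # "o" followed by U+0308 COMBINING DIAERESIS

print(composed == decomposed)   # False: different code points
print(unicodedata.normalize("NFKC", composed) ==
      unicodedata.normalize("NFKC", decomposed))  # True after NFKC
```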
I propose it would be useful to provide a standard mechanism for auditing the input stream. There would be one implementation for the stdlib that complains[1] about non-ASCII characters and possibly non-English words, and IMO that should be the default (for the reasons Ka-Ping gives for opposing the whole PEP). A second one should provide a very conservative Unicode set, with provision for amendment as experience shows restriction to be desirable or extension to be safe. A third, allowing any character that can be canonicalized into the form that PEP 3131 allows internally, is left as an exercise for the reader wild 'n' crazy enough to want to use it. For user convenience, it would be nice if these were implemented using the codec interface, although if applied to raw input there would need to be some duplication of parsing logic (specifically, comments and strings would have to be passed unchecked). I suppose it would be too expensive to use the codec interface at the point of interning an identifier (but maybe not, since it only needs to happen when adding an identifier to the symbol table; later occurrences would be short-circuited by probing the table and finding the token there). Footnotes: [1] I'm not sure what "complain" would mean in practice, since the PEP acknowledges use cases for both non-ASCII and non-English in the stdlib. From nnorwitz at gmail.com Wed May 23 07:13:46 2007 From: nnorwitz at gmail.com (Neal Norwitz) Date: Tue, 22 May 2007 22:13:46 -0700 Subject: [Python-3000] please keep open() as a builtin, and general concerns about Py3k complexity In-Reply-To: <601776.94139.qm@web33506.mail.mud.yahoo.com> References: <601776.94139.qm@web33506.mail.mud.yahoo.com> Message-ID: On 5/22/07, Steve Howell wrote: > > In the system I've worked on for the last three years, > we have at least 200 calls to the builtin open() > method. This number is meaningless by itself. 200 calls in how many lines of code? How many files total and how many files use open?
I'm not sure if the numbers are useful, but if it's only used in 0.1% of the modules, that's not a strong case for keeping it. > FWIW one of my favorite accepted PEPs is PEP 3111, > "Simple input built-in in Python 3000." BTW it's the > only 3000 series PEP with the word "simple" in the > title. :-) This PEP really just restores (raw_)input. So it mostly keeps the status quo. The name raw_input goes away, there is only input which is the same as raw_input() in 2.x. > I realize looking at PEPs and mailing list > archives can skew an outsider's view of how well Py3K > simplifies the language, since simple ideas often > don't require PEPs, and complex ideas often lead to > lengthier debates than simple ones, but I'm not > feeling the simplicity. Sure, that's understandable. For the most part, the PEPs are about adding new features, not about removing warts and cruft. PEP 3100 is the primary PEP which has info about removals. I'll pull some stats from the Misc/NEWS file which (hopefully) contains most of what's been done to date. At least 7 builtins have been removed. I expect at least 2-3 more will be removed completely. There will probably be ~5 others that are not used frequently which will be moved elsewhere (e.g., intern was already moved to sys). 1 package (compiler), 11 platform-independent modules, and probably ~20 platform-dependent modules have been removed. I'd expect another 5-10 platform-independent modules will be removed. Details from Misc/NEWS:

Core:
------
- Absolute import is the default behavior for 'import foo' etc.
- Removed support for syntax: backticks (ie, `x`), <>
- Removed these Python builtins: apply(), callable(), coerce(), file()
- Removed these Python methods: {}.has_key
- Removed these opcodes: BINARY_DIVIDE, INPLACE_DIVIDE, UNARY_CONVERT
- Remove C API support for restricted execution.

Library
-------
- Remove the imageop module. Obsolete along with its unit tests, which became useless after the removal of rgbimg and imgfile.
- Removed these attributes from Python modules:
  * operator module: div, idiv, __div__, __idiv__, isCallable, sequenceIncludes
- Remove the compiler package. Use of the _ast module and (an eventual) AST -> bytecode mechanism.
- Removed these modules:
  * Bastion, bsddb185, exceptions, md5, popen2, rexec, sets, sha, stringold, strop, xmllib
- Remove obsolete IRIX modules: al/AL, cd/CD, cddb, cdplayer, cl/CL, DEVICE, ERRNO, FILE, fl/FL, flp, fm, GET, gl/GL, GLWS, IN, imgfile, IOCTL, jpeg, panel, panelparser, readcd, sgi, sv/SV, torgb, WAIT.
- Remove obsolete functions:
  * commands.getstatus(), os.popen*,
- Remove functions in the string module that are also string methods.
- Remove support for long obsolete platforms: plat-aix3, plat-irix5.
- Remove xmlrpclib.SlowParser. It was based on xmllib.

C API
-----
- Removed these Python slots: __coerce__, __div__, __idiv__, __rdiv__
- Removed these C APIs: PyNumber_Coerce(), PyNumber_CoerceEx()
- Removed these C slots/fields: nb_divide, nb_inplace_divide
- Removed these macros: staticforward, statichere, PyArg_GetInt, PyArg_NoArgs
- Removed these typedefs: intargfunc, intintargfunc, intobjargproc, intintobjargproc, getreadbufferproc, getwritebufferproc, getsegcountproc, getcharbufferproc

I'm pretty sure there is a lot missing from this list of removals. I also know there will be more coming. :-) There will also be reorganizations that help reduce some conceptual overhead. So even though the standard library won't necessarily get smaller, it will be easier for new people to ignore sections they aren't interested in. For example, database modules or web libraries.
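A few of the removals above have direct replacements in the surviving language; a hedged sketch of the commonly cited substitutions (not quoted from any PEP):

```python
# Replacements for some constructs on the removal list.

d = {"a": 1}
# {}.has_key is gone: use the `in` operator instead.
print("a" in d)

# apply(f, args) is gone: use argument unpacking instead.
def add(x, y):
    return x + y
print(add(*(2, 3)))

# Backticks (`x`) are gone: call repr() explicitly.
print(repr(42))
```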
n From showell30 at yahoo.com Wed May 23 07:45:15 2007 From: showell30 at yahoo.com (Steve Howell) Date: Tue, 22 May 2007 22:45:15 -0700 (PDT) Subject: [Python-3000] please keep open() as a builtin, and general concerns about Py3k complexity In-Reply-To: Message-ID: <250008.69531.qm@web33509.mail.mud.yahoo.com> --- Neal Norwitz wrote: > On 5/22/07, Steve Howell > wrote: > > > > In the system I've worked on for the last three > years, > > we have at least 200 calls to the builtin open() > > method. > > This number is meaningless by itself. 200 calls in > how many lines of code? > How many files total and how many files use open? > > I'm not sure if the numbers are useful, but if it's > only used in 0.1% > of the modules, that's not a strong case for keeping > it. > 17.7% of the files I searched have calls to open().

980 source files
174 files call open()
242898 lines of code
305 calls to open()

This is the quick and dirty Python code to compute these stats, which has a call to the open() builtin.

import os

fns = []
for dir in ('/ts-qa51', '/ars-qa12', '/is-qa7'):
    cmd = "cd %s && find . -name '*.py'" % dir
    output = os.popen(cmd).readlines()
    fns += [os.path.join(dir, line[2:]) for line in output]
fns = [fn.strip() for fn in fns]

numSourceFiles = len(fns)
print '%d source files' % numSourceFiles

loc = 0
filesWithBuiltin = 0
openLines = 0
for fn in fns:
    fn = fn.strip()
    lines = open(fn).readlines()
    loc += len(lines)
    hasBuiltin = False
    for line in lines:
        if ' open(' in line:
            hasBuiltin = True
            openLines += 1
    if hasBuiltin:
        filesWithBuiltin += 1

print '%d files call open()' % filesWithBuiltin
print '%d lines of code' % loc
print '%d calls to open()' % openLines
From gproux+py3000 at gmail.com Wed May 23 07:48:03 2007 From: gproux+py3000 at gmail.com (Guillaume Proux) Date: Wed, 23 May 2007 14:48:03 +0900 Subject: [Python-3000] please keep open() as a builtin, and general concerns about Py3k complexity In-Reply-To: <250008.69531.qm@web33509.mail.mud.yahoo.com> References: <250008.69531.qm@web33509.mail.mud.yahoo.com> Message-ID: <19dd68ba0705222248w69697069u704988fa6a333db0@mail.gmail.com> On 5/23/07, Steve Howell wrote: > 17.7% of the files I searched have calls to open(). My understanding is that the mythical "python 2.x -> 3.0" tool will automatically migrate your code by using the AST to find all references to "open" and, when finding one, add the correct import and replace the open by the io.open call. Regards, Guillaume From showell30 at yahoo.com Wed May 23 08:01:43 2007 From: showell30 at yahoo.com (Steve Howell) Date: Tue, 22 May 2007 23:01:43 -0700 (PDT) Subject: [Python-3000] please keep open() as a builtin, and general concerns about Py3k complexity In-Reply-To: <19dd68ba0705222248w69697069u704988fa6a333db0@mail.gmail.com> Message-ID: <418292.50469.qm@web33510.mail.mud.yahoo.com> --- Guillaume Proux wrote: > On 5/23/07, Steve Howell > wrote: > > 17.7% of the files I searched have calls to > open(). > > My understanding is that the mythical "python 2.x -> > 3.0" tool will > automatically migrate your code by using the AST to > find all > references to "open" and when finding one, add the > correct import and > replace the open by the io.open call > Agreed, but my concern isn't the conversion itself. I just want open() to stay as a builtin. In simple throwaway programs I appreciate the convenience, and in larger programs I appreciate not having to context-switch from the problem at hand to put an "import" at the top.
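For reference, the open() -> io.open rewrite Guillaume describes would look roughly like this (a hedged sketch; the eventual fixer's exact output may differ):

```python
# Before (2.x):   data = open("notes.txt").read()
# After the hypothetical rewrite, open() becomes io.open():
import io

# Create a sample file so the sketch is self-contained.
with io.open("notes.txt", "w") as f:
    f.write(u"hello\n")

data = io.open("notes.txt", "r").read()
print(data)
```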
But since you mentioned conversion, our system is a good example of a shop that will be running multiple versions of Python side by side for many years. We'll cut over new components to Py3k, and then we'll gradually upgrade legacy components. And, of course, some of those components will want to use the same common modules. From nnorwitz at gmail.com Wed May 23 08:03:29 2007 From: nnorwitz at gmail.com (Neal Norwitz) Date: Tue, 22 May 2007 23:03:29 -0700 Subject: [Python-3000] please keep open() as a builtin, and general concerns about Py3k complexity In-Reply-To: <19dd68ba0705222248w69697069u704988fa6a333db0@mail.gmail.com> References: <250008.69531.qm@web33509.mail.mud.yahoo.com> <19dd68ba0705222248w69697069u704988fa6a333db0@mail.gmail.com> Message-ID: On 5/22/07, Guillaume Proux wrote: > On 5/23/07, Steve Howell wrote: > > 17.7% of the files I searched have calls to open(). > > My understanding is that the mythical "python 2.x -> 3.0" tool will > automatically migrate your code by using the AST to find all > references to "open" and when finding one, add the correct import and > replace the open by the io.open call Sure, a fixer would be written if this change were made. I'm not sure from your comment about the tool being 'mythical' if you meant to imply that this wasn't real. Just in case there is any doubt, it is alive and well and lives in the sandbox: http://svn.python.org/projects/sandbox/trunk/2to3/ There are currently fixers for: apply, callable, dict, dummy, except, exec, has_key, input, intern, long, ne, next, nonzero, numliterals, print, raise, raw_input, repr, sysexcinfo, throw, tuple_params, unicode, ws_comma, xrange I'm not sure if this is the best list to handle questions about what does/doesn't exist for 3k.
However, I don't know of a better place to discuss some of the transition issues. If there are doubts about what's being done, it would be great to raise them here and now, so we can dispel any myths that might exist. Other 3k status:

* Most major changes have already been made
* Biggest remaining change to the core language deals with string-unicode unification
* ~10 accepted PEPs have yet to be implemented (some have patches)
* 8 PEPs have not been accepted or rejected yet
* Re-organization of the standard library is starting to move forward a little
* Doc needs lots of work, only some changes have been made
* First alpha optimistically will ship within ~3 months

There are some issues with getting the alpha out within 3 months due to finishing the important tasks (ie, people's availability). So my guess is that the alpha will slip a little. str-uni needs to get done. We are running tests and building twice a day. There is a single failing test. Generally all the tests are working. n From nnorwitz at gmail.com Wed May 23 08:07:29 2007 From: nnorwitz at gmail.com (Neal Norwitz) Date: Tue, 22 May 2007 23:07:29 -0700 Subject: [Python-3000] please keep open() as a builtin, and general concerns about Py3k complexity In-Reply-To: <418292.50469.qm@web33510.mail.mud.yahoo.com> References: <19dd68ba0705222248w69697069u704988fa6a333db0@mail.gmail.com> <418292.50469.qm@web33510.mail.mud.yahoo.com> Message-ID: On 5/22/07, Steve Howell wrote: > > But since you mentioned conversion, our system is a > good example of a shop that will be running multiple > versions of Python side by side for many years. We'll > cut over new components to Py3k, and then we'll > gradually upgrade legacy components. And, of course, > some of those components will want to use the same > common modules. Once we get a solid 3.0 (probably in beta), we will focus more energy on dealing with these sorts of problems.
I can see there being a compatibility module that could fix things up to run with Python 2.x (*) - 3.0. I don't know if that will be distributed by the core or by a third party. There are many people that care about this issue. It's not being forgotten. We just haven't gotten to it yet. 2.6 and 3.0 are a year away, probably more. (*) probably between 2.2 and 2.4 depending on how hard it is to support. Pretty soon I'll start focusing on getting 2.6 in shape to help ease the transition. n From g.brandl at gmx.net Wed May 23 08:21:10 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Wed, 23 May 2007 08:21:10 +0200 Subject: [Python-3000] please keep open() as a builtin, and general concerns about Py3k complexity In-Reply-To: <418292.50469.qm@web33510.mail.mud.yahoo.com> References: <19dd68ba0705222248w69697069u704988fa6a333db0@mail.gmail.com> <418292.50469.qm@web33510.mail.mud.yahoo.com> Message-ID: Steve Howell wrote: > --- Guillaume Proux wrote: > >> On 5/23/07, Steve Howell >> wrote: >> > 17.7% of the files I searched have calls to >> open(). >> >> My understanding is that the mythical "python 2.x -> >> 3.0" tool will >> automatically migrate your code by using the AST to >> find all >> references to "open" and when finding one, add the >> correct import and >> replace the open by the io.open call >> > > Agreed, but my concern isn't the conversion itself. I > just want open() to stay as a builtin. In simple > throwaway programs I appreciate the convenience, and > in larger programs I appreciate not having to > context-switch from the problem at hand to put an > "import" at the top. ISTM that many modules using open() do also use os.path utilities to create the filename given to open(). In that case, you have an import statement in any case.
Georg From showell30 at yahoo.com Wed May 23 08:36:15 2007 From: showell30 at yahoo.com (Steve Howell) Date: Tue, 22 May 2007 23:36:15 -0700 (PDT) Subject: [Python-3000] please keep open() as a builtin, and general concerns about Py3k complexity In-Reply-To: Message-ID: <168303.61404.qm@web33510.mail.mud.yahoo.com> --- Georg Brandl wrote: > ISTM that many modules using open() do also use > os.path > utilities to create the filename given to open(). In > that > case, you have an import statement in any case. > Not the case for us:

154 modules call only open()
11 modules call only os.path.join()
20 modules do call both

But to your larger point, 80 out of the 174 modules that call open() do say "import os" for other reasons. From stephen at xemacs.org Wed May 23 09:36:24 2007 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 23 May 2007 16:36:24 +0900 Subject: [Python-3000] PEP 3131 accepted In-Reply-To: References: Message-ID: <87lkfg56vb.fsf@uwakimon.sk.tsukuba.ac.jp> Guido van Rossum writes: > We should probably be conservative; I'm not too hopeful about > support for right-to-left alphabets for example. I don't see what's for *Python* to *support*. My reasoning: bidi is entirely an issue of presentation; all Python should do is prohibit[1] direction markers in identifiers. To the extent that we don't know of editors that can consistently[2] present such identifiers as users would expect to see them, say bidi identifiers should be avoided as a "best current practice". AFAICS, PEP 3131 is going to work fine if we just delegate all the problems that have been brought up to the development environment in that way, except the important issues that Ka-Ping raises. IMHO the answer you gave is entirely satisfactory.
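The prohibition Stephen proposes could be sketched in a few lines (an illustration, not Python's actual tokenizer; Unicode category Cf covers the direction markers such as U+200F RIGHT-TO-LEFT MARK):

```python
# Reject identifiers containing Unicode format characters
# (category Cf), which includes the bidi direction markers.
import unicodedata

def has_format_chars(ident):
    return any(unicodedata.category(ch) == "Cf" for ch in ident)

print(has_format_chars("total"))        # False
print(has_format_chars("tot\u200Fal"))  # True: embedded RLM
```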
Footnotes: [1] Or ignore, but I prefer prohibit because the bookkeeping involved in ensuring that introspective output produces the identifier that was read in from a file is unjustifiable overhead, and because permitting them opens the door to "stupid bidi tricks" by authors (we can't do anything about people who let their editors play stupid bidi tricks on them). [2] Ie, so that different identifiers always look different, and the same identifier is always presented in the same form. From python at zesty.ca Wed May 23 09:37:14 2007 From: python at zesty.ca (Ka-Ping Yee) Date: Wed, 23 May 2007 02:37:14 -0500 (CDT) Subject: [Python-3000] PEP 3131 accepted In-Reply-To: References: Message-ID: I can see that I don't stand a very high chance of convincing you. But I'd like to make sure you understand what I'm getting at, anyway. (And I will get to some specific suggestions at the end of this message.) The key thing is that the language definition is about to transition from something which has always "fit in your head", and which holds that property as a core value, to something which cannot possibly fit in anyone's head no matter how hard they try. (This core value of Python is not something I see as having been a core value of Java, and it's one of the reasons I like Python better.) > > PEP 3131 will also cause problems for code review. Because many > > characters have indistinguishable appearances, there will be no > > mapping between what you see when you look at code and what the code > > actually says. > > I trust most programmers to *want* to write clear code, so they will > steer clear from such things. If someone wants to obfuscate their code > they already have plenty of opportunities (even in Python!). Indeed -- but that's not an argument for creating more opportunities. 
For example, we like the fact that Python doesn't look like Perl; the mere fact that some kinds of obfuscation are possible in Python doesn't require us to give up on simplicity entirely and open the door to a Perl-like proliferation of operators. Not all programmers want to write clear code; from a security perspective, the most important programmers are the ones who have an incentive to fool you. Unicode identifiers are a new avenue for any insider who wants to use a Python program as a vector of attack; they enable changes that are harder to detect, track down, and understand. > The problem is no worse than the lack of difference between 1 and l in > some fonts, and between l and I in others (and there are even fonts > where o and 0 look the same). It's far, far worse. The number of ways in which characters can be confused in Unicode is much greater. There are many fonts you can choose from that offer a clear visual difference between 1 and l and I, whereas there are no fonts in the world that distinguish all the identifier characters in Unicode. More importantly, there probably never will be. It's not just incrementally harder to identify characters; Unicode intends to make it impossible by design. > Remember the mantra that *human* readability of code is > important? Well, it helps if your code can use at least some the > language spoken by those humans. Yes, a programming language is a communication medium among humans and computers. If you look at this as a communication medium, the problem is that we're losing round-trip ability to human-readable media. Suppose I hand you a printout of a Python program for you to review. One of the questions you are faced with answering is, "Is this a valid Python program?" But your answer will necessarily be "I don't know", for almost any program. "I cannot possibly know" will be the only truthful answer anyone can give. Or suppose you are reading a book about Python and it shows you a bit of code. 
You want to type in the example -- but you cannot be sure what you should type. I don't deny that there is some convenience to be gained by those who prefer to use other human languages when discussing and writing programs. But there is an extremely high cost to the language definition. With this definitional change, every Python program that is displayed on a screen or printed on paper (or, in fact, in any human-accessible representation) instantly becomes untrustworthy. Another way to look at it is the computer science definition of a language: what a language specifies is the set of acceptable programs. So the purpose of a language is to restrict: to define the boundary between what is in the language and what is not in the language. But that's just syntax; in addition, programming languages have semantics, so the other half of the purpose is to give programs meaning for the people who read them and construct compilers, interpreters, etc. If you put these two things together you get: The purpose of a programming language is to restrict the set of acceptable programs to a set that is small enough and simple enough that humans can agree on a clear meaning to each program. Maybe this will help you see why I am so concerned about PEP 3131 -- in my judgement, it violates the fundamental purpose of a programming language. The big difference between natural languages and programming languages is that it's okay for natural languages to be fuzzy, but programs need to have exactly one meaning because they're supposed to be operational. * * * Okay. I've said my arguments, and I hope they will convince you. But I recognize that they may not. And if so, I have a couple of suggestions for you to consider that might help address my concerns. First: the "Common Objections" section of the PEP is too thin. I'd like the following arguments to be mentioned there for the record: 1. 
Python will lose the ability to make a reliable round trip between a computer file and any human-accessible medium such as a visual display or a printed page. 2. Python will become vulnerable to a new class of security exploits via the writing of misleading or malicious code that is visually indistinguishable from correct code. Consequently it will be more difficult for humans to inspect code and assure its correctness or trustworthiness. There is very little established best practice for addressing homograph security issues. 3. The Python language will become too large for any single person to fully know, in the sense that no human being can know the full character set, and therefore no one can ever acquire the ability to independently examine a program and decide whether it is valid Python. 4. Python programs that reuse other Python modules may come to contain a mix of character sets such that no one can fully read them or properly display them. 5. Unicode is young and unfinished. As far as I know there are no truly complete Unicode fonts and there may not be for some time. Tool support is weak. The whole computer industry has 40 years of experience working with ASCII for everything, including programming languages; our experience with Unicode security issues and Unicode in programming languages is fairly immature. Second: we need a way to be sure about the programs we're running. So let the acceptance of Unicode identifiers be controlled by a command-line flag, e.g. "python -U" accepts them, "python" alone does not. And let's keep the code for this feature clearly separated so that one can be sure, with high confidence, that when this feature is turned off, none of the code for Unicode identifiers will be touched. It should be possible to compile a Python that is incapable of supporting Unicode identifiers. Then people who want to use non-ASCII identifiers can do so, and anyone can still run their programs if they want. 
At the same time, people who want to know exactly what their programs say can be confident that Python is working with a small and manageable character set. And people who don't know or don't care about this change won't suddenly have a whole new source of surprises thrust upon them; if they know enough to know they want this feature, they can ask for it. If we're going to introduce a significant new source of complexity, let's at least make it easy to keep things simple (and reliably simple) for those who want to do so; we can expect this to be the vast majority, given interoperability and extensibility concerns, existing industry practices, and the policy for the Python standard library. What do you think? -- ?!ng From jcarlson at uci.edu Wed May 23 10:29:57 2007 From: jcarlson at uci.edu (Josiah Carlson) Date: Wed, 23 May 2007 01:29:57 -0700 Subject: [Python-3000] PEP 3131 accepted In-Reply-To: References: Message-ID: <20070523011101.85F0.JCARLSON@uci.edu> Ka-Ping Yee wrote: > I can see that I don't stand a very high chance of convincing you. > But I'd like to make sure you understand what I'm getting at, anyway. > (And I will get to some specific suggestions at the end of this > message.) > > The key thing is that the language definition is about to transition > from something which has always "fit in your head", and which holds > that property as a core value, to something which cannot possibly fit > in anyone's head no matter how hard they try. (This core value of > Python is not something I see as having been a core value of Java, > and it's one of the reasons I like Python better.) [snip] > If we're going to introduce a significant new source of complexity, > let's at least make it easy to keep things simple (and reliably > simple) for those who want to do so; we can expect this to be the vast > majority, given interoperability and extensibility concerns, existing > industry practices, and the policy for the Python standard library. > > > What do you think? 
For what it's worth, I've been wary of PEP 3131 for a while (if not outright against it). From identical character glyph issues (which have been discussed off and on for at least a year), to editing issues (being that I write and maintain a Python editor), to code sharing issues (and the ghettoization of code as Jim Jewett calls it), everything in between, and even things that we haven't thought of. Yes, PEP 3131 makes writing software in Python easier for some, but for others, it makes maintenance of 3rd party code a potential nightmare (regardless of 'community standards' to use ascii identifiers). - Josiah From ian.bollinger at gmail.com Wed May 23 12:03:43 2007 From: ian.bollinger at gmail.com (Ian D. Bollinger) Date: Wed, 23 May 2007 06:03:43 -0400 Subject: [Python-3000] PEP 3131 accepted In-Reply-To: References: Message-ID: <4654117F.9020901@gmail.com> Ka-Ping Yee wrote: > 2. Python will become vulnerable to a new class of security > exploits via the writing of misleading or malicious code > that is visually indistinguishable from correct code. > Consequently it will be more difficult for humans to > inspect code and assure its correctness or trustworthiness. > There is very little established best practice for > addressing homograph security issues. > Isn't it already easy enough to do that today? >>> import base64; exec base64.decodestring('cHJpbnQgJ0hlbGxvLCB3b3JsZCEn\n') ... Hello, world! Admittedly, you could look for anything like that and be suspicious, but running a program from an untrusted source is always going to be dangerous. For standalone applications, you can already do things like compile malicious C extension modules that are impossible to verify. As for programs that use Python for scripting, shouldn't it be up to them to ensure that it runs in a restricted environment? A browser, for instance, would have to do that already. - Ian D. Bollinger From stephen at xemacs.org Wed May 23 13:07:57 2007 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Wed, 23 May 2007 20:07:57 +0900 Subject: [Python-3000] PEP 3131 accepted In-Reply-To: <20070523011101.85F0.JCARLSON@uci.edu> References: <20070523011101.85F0.JCARLSON@uci.edu> Message-ID: <87iraj6bn6.fsf@uwakimon.sk.tsukuba.ac.jp> Josiah Carlson writes: > From identical character glyph issues (which have been discussed > off and on for at least a year), In my experience, this is not a show-stopping problem. Emacs/MULE has had it for 20 years because of the (horrible) design decision to attach charset information to each character in the representation of text. Thus, MULE distinguishes between NO-BREAK SPACE and NO-BREAK SPACE (the same!) depending on whether the containing text "is" ISO 8859-15 or "is" ISO 8859-1. (Semantically this is different from the identical glyph, different character problem, since according to ISO 8859 those characters are identical. However, as a practical matter, the problem of detecting and dealing with the situation is the same as in MULE the character codes are different.) How does Emacs deal with this? Simple. We provide facilities to identify identical characters (not relevant to PEP 3131, probably), to highlight suspicious characters (proposed, not actually implemented AFAIK, since identification does what almost all users want), and to provide information on characters in the editing buffer. The remaining problems with coding confusion are due to deficient implementation (mea maxima culpa). I consider this to be an editor/presentation problem, not a language definition issue. Note that Ka-Ping's worry about the infinite extensibility of Unicode relative to any human being's capacity is technically not a problem. You simply have your editor substitute machine-generated identifiers for each identifier that contains characters outside of the user's preferred set (eg, using hex codes to restrict to ASCII), then review the code. 
When you discover what an identifier's semantics are, you give it a mnemonic name according to the local style guide. Expensive, yes. But cost is a management problem, not the kind of conceptual problem Ka-Ping claims is presented by multilingual identifiers. Python is still, in this sense, a finitely generated language. > to editing issues (being that I write and maintain a Python editor) Multilingual editing (except for non-LTR scripts) is pretty much a solved problem, in theory, although adding it to any given implementation can be painful. However, since there are many programmer's editors that can handle multilingual text already, that is not a strong argument against PEP 3131. > Yes, PEP 3131 makes writing software in Python easier for some, but for > others, it makes maintenance of 3rd party code a potential nightmare > (regardless of 'community standards' to use ascii identifiers). Yes, there are lots of nightmares. In over 15 years of experience with multilingual identifiers, I can't recall any that have lasted past the break of dawn, though. I just don't see such identifiers very often, and when I do, they are never hard to deal with. Admittedly, I don't ever need to deal with Arabic or Devanagari or Thai, but I'd be willing to bet I could deal with identifiers in those languages, as long as the syntax is ASCII. As for third party code, "the doctor says that if you put down that hammer, your head will stop hurting". If multilingual third party code looks like a maintenance risk, don't deal with that third party.[1] Or budget for translation up front; translators are quite a bit cheaper than programmers. BTW, "find . -name '*.py' | xargs grep -l '[^[:ascii:]]'" is a pretty cheap litmus test for your software vendors! And yes, it *should* be looking into strings and comments. In practice (once I acquired a multilingual editor), handling non-English strings and comments has been 99% of the headache of maintaining code that contains non-ASCII. 
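The grep litmus test above can also be expressed as a short sketch in modern Python 3 (written long after this thread; the function name is illustrative, not part of any message here):

```python
from pathlib import Path


def non_ascii_files(root="."):
    """Rough Python equivalent of
    find . -name '*.py' | xargs grep -l '[^[:ascii:]]':
    list the .py files under root that contain any non-ASCII byte,
    whether in identifiers, strings, or comments."""
    hits = []
    for path in Path(root).rglob("*.py"):
        if any(b > 127 for b in path.read_bytes()):
            hits.append(str(path))
    return sorted(hits)
```

Like the shell pipeline, this flags whole files rather than individual tokens, which matches Stephen's point that strings and comments are most of the headache anyway.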
I've been maintaining the edict.el library, an interface to Jim Breen's Japanese-English dictionary EDICT for XEmacs for 10 years (there was serious development activity for only about the first 2, though). A large fraction of the identifiers specific to that library contain Japanese characters (both ideographic kanji and syllabic kana, as well as the pseudo-namespace prefix "edict-" in ASCII). There are several Japanese identifiers in there whose meaning I still don't know, except by referring to the code to see what it does (they're technical terms in Japanese linguistics, I believe, and probably about as intelligible to the layman as terms in Dutch tax law). At the time I started maintaining that library, I did so because I *couldn't read Japanese* (obviously!) This turned out to pose no problem. Japanese identifiers were *not* visually distinct to me, but when I needed to analyze a function, I became familiar with the glyphs of related identifiers quickly. And having an intelligible name to start with wouldn't have helped much; I needed to analyze the function because it wasn't doing what I wanted it to do, not because I couldn't translate the name. There are other packages in XEmacs which use non-ASCII, non-English identifiers, but they are rare. Maintaining them has never been reported as a problem. N.B. This is limited experience with what many might characterize as a niche language. And I'm an idiosyncratic individual, blessed with a reasonable amount of talent at language learning. Both valid points. However, I think the killer point in the above is the one about strings and comments. If you can discipline your team to write comments and strings in ASCII/English, extending that to identifiers is no problem. 
If your team insists on multilingual strings/comments, or needs them due to the task, multilingual identifiers will be the least of your problems, and the most susceptible to technical solution (eg, via identification and quarantine by cross-reference tables). Granted, this is going to be a more or less costly transition for ASCII-only Pythonistas. I think we should focus on cost-reduction, not on why it shouldn't happen. Footnotes: [1] Yes, I know, in the real world sometimes you have to. Multilingual identifiers are the least of your worries when dealing with a monopoly supplier. From python at zesty.ca Wed May 23 13:18:59 2007 From: python at zesty.ca (Ka-Ping Yee) Date: Wed, 23 May 2007 06:18:59 -0500 (CDT) Subject: [Python-3000] PEP 3131 accepted In-Reply-To: <4654117F.9020901@gmail.com> References: <4654117F.9020901@gmail.com> Message-ID: On Wed, 23 May 2007, Ian D. Bollinger wrote: > Ka-Ping Yee wrote: > > 2. Python will become vulnerable to a new class of security > > exploits via the writing of misleading or malicious code > > that is visually indistinguishable from correct code. > > Consequently it will be more difficult for humans to > > inspect code and assure its correctness or trustworthiness. > > There is very little established best practice for > > addressing homograph security issues. > > > Isn't it already easy enough to do that today? There are two simultaneous errors in reasoning here. First, the fact that one can write confusing code today is not a reason to enable the writing of even more confusing code. Second, the Unicode identifier issue is different from the example you give here. In your example, it is obvious that the code is doing something hard to understand; if I showed you something like this and asked you what it did, you would think "hmm, that looks obfuscated": > >>> import base64; exec > base64.decodestring('cHJpbnQgJ0hlbGxvLCB3b3JsZCEn\n') > ... Hello, world! 
But with Unicode identifiers you have no way to know even whether you should be suspicious. You would feel confident that you know what a simple piece of code does, and yet be wrong. For example, this looks like a normal fragment of code:

    def remove_if_allowed(user, filename):
        allow = 1
        for group in disabled_groups:
            if user in group:
                allow = 0
        if allow:
            os.remove(filename)

But there is no way to tell by looking at it whether it works or not. If all three occurrences of 'allow' are spelled with ASCII characters, it will work. If the second occurrence of 'allow' is spelled with a Cyrillic 'a' (U+0430), you have a silent security hole. Now imagine that this is part of an open-source project that accepts patches from the community, and senior developers check in the patches after reviewing them. The use of Unicode identifiers opens the door for someone to introduce a security hole that is guaranteed to be undetectable by reading the code, no matter how carefully anyone reads it. Will this be caught? Maybe someone will test the routine; maybe not. Either way, it is clear that the reviewer's job has just gotten much more difficult, and accepting patches is much more dangerous as a result of PEP 3131. -- ?!ng From alexandre at peadrop.com Wed May 23 16:01:11 2007 From: alexandre at peadrop.com (Alexandre Vassalotti) Date: Wed, 23 May 2007 10:01:11 -0400 Subject: [Python-3000] [Python-Dev] Introduction and request for commit access to the sandbox. In-Reply-To: References: <465123A9.8090500@v.loewis.de> Message-ID: On 5/23/07, Neal Norwitz wrote: > On 5/22/07, Alexandre Vassalotti wrote: > > > > As you see, cStringIO's code also needs a good cleanup to make it, > > at least, conform to PEP-7. > > Alexandre, > > It would be great if you could break up unrelated changes into > separate patches. Some of these can go in sooner rather than later. 
> I don't know all the things that need to be done, but I could imagine > a separate patch for each of: > > * whitespace normalization > * function name modification > * other formatting changes > * bug fixes > * changes to make consistent with StringIO > > I don't know if all those items in the list need to change, but that's > the general idea. Separate patches will make it much easier to review > and get benefits from your work earlier. I totally agree, and that was already my current idea. > I look forward to seeing your work! Thanks! -- Alexandre From jcarlson at uci.edu Wed May 23 18:23:28 2007 From: jcarlson at uci.edu (Josiah Carlson) Date: Wed, 23 May 2007 09:23:28 -0700 Subject: [Python-3000] PEP 3131 accepted In-Reply-To: <87iraj6bn6.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20070523011101.85F0.JCARLSON@uci.edu> <87iraj6bn6.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20070523082241.85F3.JCARLSON@uci.edu> "Stephen J. Turnbull" wrote: > Josiah Carlson writes: > > > From identical character glyph issues (which have been discussed > > off and on for at least a year), > > In my experience, this is not a show-stopping problem. I never claimed that this, by itself, was a showstopper. And my post should not be seen as a "these are all the problems that I have seen with PEP 3131". Those are merely the issues that have been discussed over and over, for which I (and seemingly others) are still concerned with, regardless of the hundreds of posts here and in comp.lang.python seeking to convince us that "they are not a problem". > Emacs/MULE has > had it for 20 years because of the (horrible) design decision to > attach charset information to each character in the representation of > text. Thus, MULE distinguishes between NO-BREAK SPACE and NO-BREAK > SPACE (the same!) depending on whether the containing text "is" ISO > 8859-15 or "is" ISO 8859-1. 
(Semantically this is different from the > identical glyph, different character problem, since according to ISO > 8859 those characters are identical. However, as a practical matter, > the problem of detecting and dealing with the situation is the same as > in MULE the character codes are different.) > > How does Emacs deal with this? Simple. We provide facilities to > identify identical characters (not relevant to PEP 3131, probably), to > highlight suspicious characters (proposed, not actually implemented > AFAIK, since identification does what almost all users want), and to > provide information on characters in the editing buffer. The > remaining problems with coding confusion are due to deficient > implementation (mea maxima culpa). > > I consider this to be an editor/presentation problem, not a language > definition issue. This particular excuse pisses me off the most. "If you can't differentiate, then your font or editor sucks." Thank you for passing judgement on my choice of font or editor, but Ka-Ping already stated why this argument is bullshit: there does not currently exist a font where one *can* differentiate all the glyphs, and further, even if one could visually differentiate similar glyphs, *remembering* the 64,000+ glyphs that are available in just the primary unicode plane to differentiate them, is a herculean task. Never mind the fact that people use dozens, perhaps hundreds of different editors to write and maintain Python code, that the 'Emacs works' argument is poor at best. Heck, Thomas Bushnell made the same argument when I spoke with him 2 1/2 years ago (though he also included Vim as an alternative to Emacs); it smelled like bullshit then, and it smells like bullshit now. > Note that Ka-Ping's worry about the infinite extensibility of Unicode > relative to any human being's capacity is technically not a problem. 
> You simply have your editor substitute machine-generated identifiers > for each identifier that contains characters outside of the user's > preferred set (eg, using hex codes to restrict to ASCII), then review > the code. When you discover what an identifier's semantics are, you > give it a mnemonic name according to the local style guide. > Expensive, yes. But cost is a management problem, not the kind of > conceptual problem Ka-Ping claims is presented by multilingual > identifiers. Python is still, in this sense, a finitely generated > language. That's a bullshit argument, and you know it. "Just use hex escapes"? Modulo unicode comments and strings, all Python programs are easily read in default fonts available on every platform on the planet today. But with 3131, people accepting 3rd party code need to break 15+ years of "what you see is what is actually there" by verifying the character content of every identifier? That's a silly and unnecessary workload addition for anyone who wants to accept patches from 3rd parties, and relies on the same "your tools suck" argument to invalidate concerns over unicode glyph similarity. Speaking of which, do you know of a fixed-width font that is able to allow for the visual distinction of all unicode glyphs in the primary plane, or even the portion that Martin is proposing we support? This also "is not a show-stopper", but it certainly reduces audience satisfaction by a large margin. > > to editing issues (being that I write and maintain a Python editor) > > Multilingual editing (except for non-LTR scripts) is pretty much a > solved problem, in theory, although adding it to any given > implementation can be painful. However, since there are many > programmer's editors that can handle multilingual text already, that > is not a strong argument against PEP 3131. Another "your tools suck" argument. 
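The burden of verifying the character content of every identifier can at least be mechanized. A minimal reviewer's sketch in modern Python 3 (this thread predates Python 3; the function name is illustrative), using the stdlib tokenizer to flag every NAME token containing non-ASCII characters:

```python
import io
import tokenize
import unicodedata


def flag_non_ascii_names(source):
    """Return (line, identifier, [code point descriptions]) for each
    NAME token in source that contains a character outside ASCII."""
    flagged = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and any(ord(c) > 127 for c in tok.string):
            chars = ["U+%04X %s" % (ord(c), unicodedata.name(c, "UNKNOWN"))
                     for c in tok.string if ord(c) > 127]
            flagged.append((tok.start[0], tok.string, chars))
    return flagged


# Ka-Ping's example: the second "allow" begins with Cyrillic U+0430.
sample = "allow = 1\n\u0430llow = 0\n"
for line, name, chars in flag_non_ascii_names(sample):
    print("line %d: %r contains %s" % (line, name, ", ".join(chars)))
```

This is the sort of check a patch reviewer could run automatically, independent of which editor or font anyone uses.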
While my editor has been able to handle unicode content for a couple years now (supporting all encodings available to Python), every editor that wants to properly support the adding of unicode text in any locale will necessitate the creation of charmap-like interfaces in basically every editor. But really, I'm glad that Emacs works for you and has solved this problem for you. I honestly tried to use it 4 years ago, spent a couple weeks with it. But it didn't work for me, and I've spent the last 4 years writing an editor because it and the other 35 editors I tried at the time didn't work for me (as have the dozens of others for the exact same reason). But of course, our tools suck, and because we can't use Emacs, we are already placed in a 2nd tier ghettoized part of the Python community of "people with tools that suck". Thank you for hitting home that unless people use Emacs, their tools suck. I still don't believe that my concerns have been addressed. And I certainly don't believe that those Ka-Ping brought up (which are better than mine) have been addressed. But hey, my tools suck, so obviously my concerns regarding using my tools to edit Python in the future don't matter. Thank you for the vote of confidence. - Josiah From jimjjewett at gmail.com Wed May 23 18:26:55 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Wed, 23 May 2007 12:26:55 -0400 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <87sl9o5dvi.fsf@uwakimon.sk.tsukuba.ac.jp> References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <4646FCAE.7090804@v.loewis.de> <464FFD04.90602@v.loewis.de> <46521CD7.9030004@v.loewis.de> <46527904.1000202@v.loewis.de> <87sl9o5dvi.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 5/23/07, Stephen J. 
Turnbull wrote: > Jim Jewett writes: > > It simplifies checking for identifiers that *don't* stick to ASCII, > Only if you assume that people will actually perceive the 10-character > string "L\u00F6wis" as an identifier, regardless of the fact that any > programmable editor can be trained to display the 5-character string > "Löwis" in a very small amount of code. Conversely, any programmable > editor can easily be trained to take the internal representation > "Löwis" and display it as "L\u00F6wis", giving all the benefits of the > representation you propose. But who would ever enable it? I would. I would like an alert (and possibly an import exception) on any code whose *executable portion* is not entirely in ASCII. Comments aren't a problem, unless they somehow erase or hide other characters or line breaks. Strings aren't a problem unless I evaluate them. Code ... I want to know if there is some non-ASCII. Even Latin-1 isn't much of a problem, except for single-quotes. I do want to know if 'abc' is a string or an identifier made with the "prime" letter. This might be an innocent cut-and-paste error (and how else would most people enter non-native characters), but it is still a problem -- and python would often create a new variable instead of warning me. > The only issues PEP 3131 should be concerned with *defining* > are those that cause problems with canonicalization, and the range of > characters and languages allowed in the standard library. Fair enough -- but the problem is that this isn't a solved issue yet; the unicode group themselves make several contradictory recommendations. I can come up with rules that are probably just about right, but I will make mistakes (just as the unicode consortium itself did, which is why they have both ID and XID, and why both have stability characters). Even having read their reports, my initial rules would still have banned mixed-script, which would have prevented your edict.el example. 
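Jim's worry about the "prime" letter can be checked concretely in today's Python 3, whose str.isidentifier() implements the PEP 3131 character rules (a retrospective sketch; the helper name is illustrative). MODIFIER LETTER PRIME is classified as a letter, so it silently forms new identifiers, while the typographic apostrophe is punctuation and is rejected:

```python
import unicodedata


def describe(ch):
    """Code point, Unicode name, general category, and whether
    'x' + ch is a valid Python 3 identifier (i.e. whether ch is
    accepted as an identifier-continue character)."""
    return ("U+%04X" % ord(ch),
            unicodedata.name(ch),
            unicodedata.category(ch),
            ("x" + ch).isidentifier())


# MODIFIER LETTER PRIME is a letter (category Lm), so pasting it after
# an identifier silently creates a *new* identifier, as Jim fears:
print(describe("\u02b9"))
# The typographic "smart" apostrophe is punctuation (Pf) and rejected:
print(describe("\u2019"))
```

So the answer to the smart-quote question, at least for these particular characters, is that the quotation marks proper stay syntax errors while the look-alike modifier letters do not.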
So I'll agree that defining the charsets and combinations and canonicalization is the right scope; I just feel that best practice isn't yet clear enough. > I propose it would be useful to provide a standard mechanism for > auditing the input stream. There would be one implementation for the > stdlib that complains[1] about non-ASCII characters and possibly > non-English words, and IMO that should be the default (for the reasons > Ka-Ping gives for opposing the whole PEP). A second one should > provide a very conservative Unicode set, with provision for amendment > as experience shows restriction to be desirable or extension to be > safe. A third, allowing any character that can be canonicalized into > the form that PEP 3131 allows internally, is left as an exercise for > the reader wild 'n' crazy enough to want to use it. This might deal with my concerns. It is a bit more complicated than the current plans. -jJ From jimjjewett at gmail.com Wed May 23 18:39:43 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Wed, 23 May 2007 12:39:43 -0400 Subject: [Python-3000] PEP 3131 accepted In-Reply-To: References: Message-ID: On 5/23/07, Ka-Ping Yee wrote: > First: the "Common Objections" section of the PEP is too thin. I'd > like the following arguments to be mentioned there for the record: > 4. Python programs that reuse other Python modules may come > to contain a mix of character sets such that no one can > fully read them or properly display them. 4.a Certain cut-and-paste errors (such as cutting from a word document that uses "smart quotes") will change from syntax errors to silently creating new identifiers. > 5. Unicode is young and unfinished. As far as I know there > are no truly complete Unicode fonts and there may not be > for some time. Tool support is weak. 
The whole computer industry has 40 years of experience working with ASCII for everything, including programming languages; our experience with Unicode security issues and Unicode in programming languages is fairly immature. 5.a Use of unicode for identifiers is not yet a resolved issue. The unicode consortium mostly recommends XID rather than the older ID; both sets already have "stability characters" and canonicalization concerns. It isn't quite clear which marks/letters/scripts to leave out. (The recommendations conflict; other than ASCII-only, I'm not sure I've found one yet that leaves out "letters" indistinguishable (even in the reference font) from already-meaningful syntax characters.) We can make up our own answers, but if we do that... maybe we shouldn't rush. -jJ From guido at python.org Wed May 23 18:45:46 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 23 May 2007 09:45:46 -0700 Subject: [Python-3000] PEP 3131 accepted In-Reply-To: References: Message-ID: On 5/23/07, Jim Jewett wrote: > Certain cut-and-paste errors (such as cutting from a word document > that uses "smart quotes") will change from syntax errors to silently > creating new identifiers. Really? Are those quote characters considered letters by the Unicode standard? -- --Guido van Rossum (home page: http://www.python.org/~guido/) From radix at twistedmatrix.com Wed May 23 18:45:56 2007 From: radix at twistedmatrix.com (Christopher Armstrong) Date: Wed, 23 May 2007 12:45:56 -0400 Subject: [Python-3000] PEP 3131 accepted In-Reply-To: References: Message-ID: <60ed19d40705230945m558756day2863b81e38618747@mail.gmail.com> On 5/23/07, Jim Jewett wrote: > On 5/23/07, Ka-Ping Yee wrote: > > First: the "Common Objections" section of the PEP is too thin. I'd > > like the following arguments to be mentioned there for the record: > > > 4. 
Python programs that reuse other Python modules may come > > to contain a mix of character sets such that no one can > > fully read them or properly display them. > > 4.a > > Certain cut-and-paste errors (such as cutting from a word document > that uses "smart quotes") will change from syntax errors to silently > creating new identifiers. Is this actually true? Are the fancy quote characters really going to be in the set of characters that would be valid in identifiers, as proposed? -- Christopher Armstrong International Man of Twistery http://radix.twistedmatrix.com/ http://twistedmatrix.com/ http://canonical.com/ From bwinton at latte.ca Wed May 23 18:52:25 2007 From: bwinton at latte.ca (Blake Winton) Date: Wed, 23 May 2007 12:52:25 -0400 Subject: [Python-3000] PEP 3131 accepted In-Reply-To: References: <4654117F.9020901@gmail.com> Message-ID: <46547149.8050304@latte.ca> Ka-Ping Yee wrote: > But with Unicode identifiers you have no way to know even whether you > should be suspicious. You would feel confident that you know what > a simple piece of code does, and yet be wrong. Also, Jim Jewett wrote: > Strings aren't a problem unless I evaluate them. 

    a = """This string has a triple quote and a command in it.
    \"""
    os.remove("*")
    """

If that \ is merely a unicode character that looks like \, you've just deleted your hard drive. (To close it off, you could use """, where the middle quote is a unicode character that looks like ".) Two strings, with some executable code in the middle, that looks like one harmless string. Actually, I think that could shorten down to:

    a = """
    os.remove("*")
    """

with the middle character of each """ not being a ". My point here is that if you're confident that you know what a simple piece of code does, you're already wrong. Unicode identifiers don't change that. > But there is no way to tell by looking at it whether it works or not. > If all three occurrences of 'allow' are spelled with ASCII characters, > it will work. 
If the second occurrence of 'allow' is spelled with a > Cyrillic 'a' (U+0430), you have a silent security hole. If you search for "allow", it'll only match the ones that actually match. Yes, it makes patch reviewers' jobs harder, or means the tools they need to do their jobs must be smarter. No, I don't think it's as bad as you think it is. And heck, if you're a patch reviewer, set the ASCII-only flag on your version of Python, or run a program before checking it in to flag non-ASCII characters, and reject all patches from that person in the future, since clearly they're a black hat. Also, I find it strangely amusing that complaints about characters that look the same as other characters come from someone named "?!ng". :) Later, 314|<3. From jcarlson at uci.edu Wed May 23 20:21:53 2007 From: jcarlson at uci.edu (Josiah Carlson) Date: Wed, 23 May 2007 11:21:53 -0700 Subject: [Python-3000] PEP 3131 accepted In-Reply-To: <87iraj6bn6.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20070523011101.85F0.JCARLSON@uci.edu> <87iraj6bn6.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20070523111704.85FC.JCARLSON@uci.edu> Removing those words that some found offensive, perhaps I will get a response to the point of my post: "your tools aren't very good" and "Emacs does it right" are not valid responses to the concerns brought up regarding unicode. "Stephen J. Turnbull" wrote: > Josiah Carlson writes: > > > From identical character glyph issues (which have been discussed > > off and on for at least a year), > > In my experience, this is not a show-stopping problem. I never claimed that this, by itself, was a showstopper. And my post should not be seen as a "these are all the problems that I have seen with PEP 3131". Those are merely the issues that have been discussed over and over, about which I (and seemingly others) are still concerned, regardless of the hundreds of posts here and in comp.lang.python seeking to convince us that "they are not a problem".
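[The identical-glyph concern is easy to make concrete. In a later CPython 3 — used here purely as an illustrative sketch; str.isidentifier() postdates this thread — the Cyrillic spoof of 'allow' quoted above is a perfectly legal, distinct identifier:]

```python
# "allow" spelled entirely in ASCII, versus "allow" whose first letter
# is CYRILLIC SMALL LETTER A (U+0430): visually alike in many fonts,
# but distinct names to the interpreter.
latin = "allow"
cyrillic = "\u0430llow"

print(latin == cyrillic)        # False: different strings, different identifiers
print(cyrillic.isidentifier())  # True: it is a legal identifier
```

A plain-text search for "allow" matches only one of the two, which is exactly the silent hole being described.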
> Emacs/MULE has > had it for 20 years because of the (horrible) design decision to > attach charset information to each character in the representation of > text. Thus, MULE distinguishes between NO-BREAK SPACE and NO-BREAK > SPACE (the same!) depending on whether the containing text "is" ISO > 8859-15 or "is" ISO 8859-1. (Semantically this is different from the > identical glyph, different character problem, since according to ISO > 8859 those characters are identical. However, as a practical matter, > the problem of detecting and dealing with the situation is the same as > in MULE the character codes are different.) > > How does Emacs deal with this? Simple. We provide facilities to > identify identical characters (not relevant to PEP 3131, probably), to > highlight suspicious characters (proposed, not actually implemented > AFAIK, since identification does what almost all users want), and to > provide information on characters in the editing buffer. The > remaining problems with coding confusion are due to deficient > implementation (mea maxima culpa). > > I consider this to be an editor/presentation problem, not a language > definition issue. This particular excuse angers me the most. "If you can't differentiate, then your font or editor is garbage." Thank you for passing judgement on my choice of font or editor, but Ka-Ping already stated why this argument isn't valid: there does not currently exist a font where one *can* differentiate all the glyphs, and further, even if one could visually differentiate similar glyphs, *remembering* the 64,000+ glyphs that are available in just the primary unicode plane to differentiate them is a herculean task. Never mind the fact that people use dozens, perhaps hundreds of different editors to write and maintain Python code; the 'Emacs works' argument is poor at best.
Heck, Thomas Bushnell made the same argument when I spoke with him 2 1/2 years ago (though he also included Vim as an alternative to Emacs); it smelled like garbage then, and it smells like garbage now. > Note that Ka-Ping's worry about the infinite extensibility of Unicode > relative to any human being's capacity is technically not a problem. > You simply have your editor substitute machine-generated identifiers > for each identifier that contains characters outside of the user's > preferred set (eg, using hex codes to restrict to ASCII), then review > the code. When you discover what an identifier's semantics are, you > give it a mnemonic name according to the local style guide. > Expensive, yes. But cost is a management problem, not the kind of > conceptual problem Ka-Ping claims is presented by multilingual > identifiers. Python is still, in this sense, a finitely generated > language. That's a poor argument, and you know it. "Just use hex escapes"? Modulo unicode comments and strings, all Python programs are easily read in default fonts available on every platform on the planet today. But with 3131, people accepting 3rd party code need to break 15+ years of "what you see is what is actually there" by verifying the character content of every identifier? That's a silly and unnecessary workload addition for anyone who wants to accept patches from 3rd parties, and relies on the same "your tools are poor" argument to invalidate concerns over unicode glyph similarity. Speaking of which, do you know of a fixed-width font that is able to allow for the visual distinction of all unicode glyphs in the primary plane, or even the portion that Martin is proposing we support? This also "is not a show-stopper", but it certainly reduces audience satisfaction by a large margin. 
> > to editing issues (being that I write and maintain a Python editor) > > Multilingual editing (except for non-LTR scripts) is pretty much a > solved problem, in theory, although adding it to any given > implementation can be painful. However, since there are many > programmer's editors that can handle multilingual text already, that > is not a strong argument against PEP 3131. Another "your tools aren't very good" argument. While my editor has been able to handle unicode content for a couple years now (supporting all encodings available to Python), every editor that wants to properly support the adding of unicode text in any locale will necessitate the creation of charmap-like interfaces. But really, I'm glad that Emacs works for you and has solved this problem for you. I honestly tried to use it 4 years ago, spent a couple weeks with it. But it didn't work for me, and I've spent the last 4 years writing an editor because it and the other 35 editors I tried at the time didn't work for me (as have the dozens of others for the exact same reason). But of course, our tools suck, and because we can't use Emacs, we are already placed in a 2nd tier ghettoized part of the Python community of "people with tools that aren't Emacs". Thank you for hitting home that unless people use Emacs, their tools aren't sufficient for Python development. I still don't believe that my concerns have been addressed. And I certainly don't believe that those Ka-Ping brought up (which are better than mine) have been addressed. But hey, my tools aren't Emacs, so obviously my concerns regarding using my tools to edit Python in the future don't matter. Thank you for the vote of confidence.
- Josiah From ntoronto at cs.byu.edu Wed May 23 21:48:10 2007 From: ntoronto at cs.byu.edu (Neil Toronto) Date: Wed, 23 May 2007 13:48:10 -0600 Subject: [Python-3000] PEP 3131 accepted In-Reply-To: <20070523111704.85FC.JCARLSON@uci.edu> References: <20070523011101.85F0.JCARLSON@uci.edu> <87iraj6bn6.fsf@uwakimon.sk.tsukuba.ac.jp> <20070523111704.85FC.JCARLSON@uci.edu> Message-ID: <46549A7A.6000807@cs.byu.edu> Josiah Carlson wrote: > Thank you for hitting home that unless people use Emacs, their tools > aren't sufficient for Python development. I still don't believe that my > concerns have been addressed. And I certainly don't believe that those > Ka-Ping brought up (which are better than mine) have been addressed. > But hey, my tools aren't Emacs, so obviously my concerns regarding using > my tools to edit Python in the future don't matter. Thank you for the > vote of confidence. > Though I don't develop an editor in my spare time, I had a similar reaction to the "Emacs does Unicode this way, which is correct" solutions. My favorite editor is going to have to get awfully smart. It reminds me of some friction I experienced when trying out Lisp. It's fairly painful to program in Lisp without an editor that does paren-matching and automatic indentation. I tried Emacs, and I didn't like it, which is a shame because it's the One True Editor for programming in Lisp. I basically dropped Lisp over this issue. In Lisp's case, the editor has to be smart because Lisp syntax is insufficient on its own to express program semantics *to a human*. (Every programming language has this problem to some extent, Lisp more than most because of all the parentheses and general lack of visual cues, and Python much less than most because of smart use of operators and syntactically significant whitespace.) This is a user interface problem for a *language*, so it rubs me the wrong way to have to have it solved by an *editor*.
Likewise, Unicode identifiers present numerous (detailed elsewhere) user interface problems. My general feeling is that language issues shouldn't be solved by editors. You should be able to comfortably change the semantics of a program with just about any text editor. Otherwise, we have a situation where some editors are blessed for use with the language and most are not, and if a would-be programmer's favorite isn't on the list, he leaves. Neil From python at zesty.ca Wed May 23 23:11:00 2007 From: python at zesty.ca (Ka-Ping Yee) Date: Wed, 23 May 2007 16:11:00 -0500 (CDT) Subject: [Python-3000] PEP 3131 accepted In-Reply-To: References: Message-ID: On Wed, 23 May 2007, Guido van Rossum wrote: > On 5/23/07, Jim Jewett wrote: > > Certain cut-and-paste errors (such as cutting from a word document > > that uses "smart quotes") will change from syntax errors to silently > > creating new identifiers. > > Really? Are those quote characters considered letters by the Unicode standard? According to the table at http://www.dcl.hpi.uni-potsdam.de/home/loewis/table-331.html , the following quote-like characters are not identifier characters: U+2018 LEFT SINGLE QUOTATION MARK U+2019 RIGHT SINGLE QUOTATION MARK U+201C LEFT DOUBLE QUOTATION MARK U+201D RIGHT DOUBLE QUOTATION MARK I believe these four are the "smart quotes" produced by Word. 
But the following are identifier characters: U+02BB MODIFIER LETTER TURNED COMMA (same glyph as U+2018) U+02BC MODIFIER LETTER APOSTROPHE (same glyph as U+2019) U+02EE MODIFIER LETTER DOUBLE APOSTROPHE (same glyph as U+201D) U+0312 COMBINING TURNED COMMA ABOVE (same glyph as U+2018) U+0313 COMBINING COMMA ABOVE (same glyph as U+2019) U+0315 COMBINING COMMA ABOVE RIGHT (same glyph as U+2019) So there are three sets of characters that look the same: U+02BB = U+0312 = U+2018 U+02BC = U+0313 = U+0315 = U+2019 U+02EE = U+201D U+0312, U+0313, and U+0315 are combining characters that cause the comma to appear over the preceding letter, and they are not allowed to appear as the first character in an identifier. So, if your editor displays combining characters as properly combined, they will not be confusable with quotation marks; otherwise, they could be. -- ?!ng From jimjjewett at gmail.com Wed May 23 23:25:00 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Wed, 23 May 2007 17:25:00 -0400 Subject: [Python-3000] PEP 3131 accepted In-Reply-To: References: Message-ID: On 5/23/07, Guido van Rossum wrote: > On 5/23/07, Jim Jewett wrote: > > Certain cut-and-paste errors (such as cutting from a word document > > that uses "smart quotes") will change from syntax errors to silently > > creating new identifiers. > Really? Are those quote characters considered letters by the Unicode standard? I'm not certain which specific character MS Word uses for smart quotes. My best guess is that it is actually "PRIVATE USE 1", which is supposed to be ignored (don't prevent it; just pretend it isn't there). My fears were heightened by http://www.unicode.org/reports/tr31/tr31-8.html. They discuss NFKC canonicalization (though another tech report recommends NFKD). If you use NFKC, they say to modify it so that U+0374 ( ʹ ) GREEK NUMERAL SIGN is not allowed, even though it folds to U+02B9 ( ʹ ) MODIFIER LETTER PRIME, which they claim should be allowed.
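[Ka-Ping's character lists and Jim's NFKC example can both be spot-checked; the sketch below uses a later CPython 3, whose str.isidentifier() and unicodedata behavior postdate this thread:]

```python
import unicodedata

# A Word-style right single quote (U+2019) is not an identifier
# character, but the visually identical U+02BC MODIFIER LETTER
# APOSTROPHE is.
print(("don\u2019t").isidentifier())  # False
print(("don\u02bct").isidentifier())  # True

# U+0374 GREEK NUMERAL SIGN has a singleton canonical decomposition to
# U+02B9 MODIFIER LETTER PRIME, so normalization silently folds one
# codepoint into the other.
print(unicodedata.normalize("NFKC", "\u0374") == "\u02b9")  # True
```

So pasting a smart quote produces a syntax error rather than a new name, but the modifier-letter look-alikes remain legal identifier characters.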
Within the codepoints < 256, if we ban rather than ignore, the only remaining problems are likely to be (1) that we must add _ as an allowed ID start, and (2) we must decide whether or not to allow the recommended 00AA ; ID_Start # L& FEMININE ORDINAL INDICATOR 00B5 ; ID_Start # L& MICRO SIGN 00BA ; ID_Start # L& MASCULINE ORDINAL INDICATOR (also in XID_START, and in the CONTINUE sets) -jJ From jason.orendorff at gmail.com Wed May 23 23:19:57 2007 From: jason.orendorff at gmail.com (Jason Orendorff) Date: Wed, 23 May 2007 17:19:57 -0400 Subject: [Python-3000] PEP 3131 accepted In-Reply-To: <46549A7A.6000807@cs.byu.edu> References: <20070523011101.85F0.JCARLSON@uci.edu> <87iraj6bn6.fsf@uwakimon.sk.tsukuba.ac.jp> <20070523111704.85FC.JCARLSON@uci.edu> <46549A7A.6000807@cs.byu.edu> Message-ID: This discussion is off the rails again. I'm at least sympathetic to the spoofing argument, because theoretical security concerns have a way of becoming serious practical concerns overnight. But I'm not sure what to make of the rest. Other languages have had this feature for many years. The "numerous user interface problems" do not seem to arise in practice. -j From python at zesty.ca Thu May 24 00:02:01 2007 From: python at zesty.ca (Ka-Ping Yee) Date: Wed, 23 May 2007 17:02:01 -0500 (CDT) Subject: [Python-3000] PEP 3131 accepted In-Reply-To: References: Message-ID: On Wed, 23 May 2007, Ka-Ping Yee wrote: > So there are three sets of characters that look the same: > > U+02BB = U+0312 = U+2018 > U+02BC = U+0313 = U+0315 = U+2019 > U+02EE = U+201D The Greek combining koronis, U+0343, is an allowed identifier character and also looks identical to a single right quote, U+02BC = U+0313 = U+0315 = U+0343 = U+2019. > U+0312, U+0313, and U+0315 are combining characters that cause the > comma to appear over the preceding letter, and they are not allowed > to appear as the first character in an identifier. 
So, if your > editor displays combining characters as properly combined, they will > not be confusable with quotation marks; otherwise, they could be. I just realized that this is not the whole story. There's no requirement that a combining character has to actually come after a character it can be combined with. So there might be valid identifiers containing sequences of characters that don't have a sensible rendering, or that force the combining comma to appear separately and thus indistinguishable from a quotation mark even in a Unicode-aware editor. -- ?!ng From python at zesty.ca Thu May 24 00:35:52 2007 From: python at zesty.ca (Ka-Ping Yee) Date: Wed, 23 May 2007 17:35:52 -0500 (CDT) Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <87sl9o5dvi.fsf@uwakimon.sk.tsukuba.ac.jp> References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <19dd68ba0705120827s5415c4dcx12e5862f32cc3e06@mail.gmail.com> <4646A3CA.40705@acm.org> <4646FCAE.7090804@v.loewis.de> <464FFD04.90602@v.loewis.de> <46521CD7.9030004@v.loewis.de> <46527904.1000202@v.loewis.de> <87sl9o5dvi.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Wed, 23 May 2007, Stephen J. Turnbull wrote: > > It means users could see the usability benefits of PEP3131, but the > > python internals could still work with ASCII only. > > But this reasoning is not coherent. Python internals will have no > problems with non-ASCII; in fact, they would have no problems with > tokens containing Cf characters or even reserved code points. Just > give an unambiguous grammar for tokens composed of code points. It's > only when a human enters the loop (ie, presentation of the identifier > on an output stream) that they cause problems. You've got this backwards, and I suspect that's part of the root of the disagreement. It's not that "when humans enter the loop they cause problems." The purpose of the language is to *serve humans*. Without humans, we would just use machine code instead of Python. 
If it doesn't work for humans, it's not that the humans are broken; it's that the language is broken. The grammar has to be something a human can understand. (And if 90%, or more than 50%, of the tools are "broken" with respect to the language, that's a language problem, not just a tool problem.) > I propose it would be useful to provide a standard mechanism for > auditing the input stream. There would be one implementation for the > stdlib that complains[1] about non-ASCII characters and possibly > non-English words, and IMO that should be the default This should be built into the Python interpreter and on by default, unless it is turned off by a command-line switch that says "I want to allow the full set of Unicode identifier characters in identifiers." > A second one should provide a very conservative Unicode set, with > provision for amendment as experience shows restriction to be > desirable or extension to be safe. If we are going to allow Unicode identifiers at all, then I would recommend only allowing identifiers that are already normalized (in NFC). If this recommendation is rejected, then I propose that the second-level mode that Stephen suggests here only allow normalized identifiers. In summary, my preference ordering of the possibilities would be: 1. Identifiers remain ASCII-only. 2. "python" allows only ASCII identifiers. "python -U" allows Unicode identifiers that are in NFC and use a conservative, *fixed* subset of the available characters. Support for "-U" is a compile-time option, preferably not compiled into official binary releases of Python. 3. "python" and "python -U" are as above. "python -UU" allows all Unicode identifier characters (which may grow over time as the Unicode standard changes). Support for "-UU" is a compile-time option, never on in official binary releases of Python, and discouraged with "here be dragons" warnings, etc. The ideas that I'm in favour of include: (a) Require identifiers to be in ASCII.
(b) Require a compile-time option to enable non-ASCII identifiers. (c) Require a command-line flag to enable non-ASCII identifiers. (d) Require identifiers to be in NFC. (e) Use a character set that is fixed over time. -- ?!ng From showell30 at yahoo.com Thu May 24 02:57:44 2007 From: showell30 at yahoo.com (Steve Howell) Date: Wed, 23 May 2007 17:57:44 -0700 (PDT) Subject: [Python-3000] PEP 3131 accepted In-Reply-To: <46549A7A.6000807@cs.byu.edu> Message-ID: <906645.13234.qm@web33503.mail.mud.yahoo.com> --- Neil Toronto wrote: > Josiah Carlson wrote: > > Thank you for hitting home that unless people use > Emacs, their tools > > aren't sufficient for Python development. I still > don't believe that my > > concerns have been addressed. And I certainly > don't believe that those > > Ka-Ping brought up (which are better than mine) > have been addressed. > > But hey, my tools aren't Emacs, so obviously my > concerns regarding using > > my tools to edit Python in the future don't > matter. Thank you for the > > vote of confidence. > > > > Though I don't develop an editor in my spare time, I > had a similar > reaction to the "Emacs does Unicode this way, which > is correct" > solutions. My favorite editor is going to have to > get awfully smart. > > It reminds me of some friction I experienced when > trying out Lisp. It's > fairly painful to program in Lisp without an editor > that does > paren-matching and automatic indentation. I tried > Emacs, and I didn't > like it, which is a shame because it's the One True > Editor for > programming in Lisp. I basically dropped Lisp over > this issue. [...] I'm +1 on being able to use Py3k effectively with a relatively dumb editor. (I now use vim, which sucks a lot less than vi, but doesn't compare to some pretty damn awesome Windows editors that I used in my pre-Python days). Still, I'm +1 on PEP 3131.
It will benefit me in no way whatsoever, as I'm a native English speaker, I'm of English descent, I work with people who code most effectively in English (even though it's often their second language), I18N doesn't fit my brain, I like English muffins, etc. The thing that's compelling to me about PEP 3131 is that it truly opens up Python to a new audience. I just hope I never have to write an app that parses Dutch tax laws. From showell30 at yahoo.com Thu May 24 03:56:56 2007 From: showell30 at yahoo.com (Steve Howell) Date: Wed, 23 May 2007 18:56:56 -0700 (PDT) Subject: [Python-3000] Support for PEP 3131 In-Reply-To: Message-ID: <257359.78365.qm@web33507.mail.mud.yahoo.com> --- Ka-Ping Yee wrote: > In summary, my preference ordering of the > possibilities would be: > > [...] > > 2. "python" allows only ASCII identifiers. > "python -U" allows > Unicode identifiers that are in NFC and use > a conservative, > *fixed* subset of the available characters. > Support for > "-U" is a compile-time option, preferably > not compiled into > official binary releases of Python. > > 3. "python" and "python -U" are as above. > "python -UU" allows > all Unicode identifier characters (which may > grow over time > as the Unicode standard changes). Support > for "-UU" is a > compile-time option, never on in official > binary releases of > Python, and discouraged with "here be > dragons" warnings, etc. > I'm in favor of that, with the idea that by 3.1 or 3.later (depending on feedback from the international community), Python would eventually deprecate those options, and it would then be the burden of non-Unicoders (which includes me) to specify --asciionly if they were worried about running non-ASCII Python.
I disagree with option 1 (not quoted), but not passionately. From gproux+py3000 at gmail.com Thu May 24 04:14:15 2007 From: gproux+py3000 at gmail.com (Guillaume Proux) Date: Thu, 24 May 2007 11:14:15 +0900 Subject: [Python-3000] PEP 3131 accepted In-Reply-To: <906645.13234.qm@web33503.mail.mud.yahoo.com> References: <46549A7A.6000807@cs.byu.edu> <906645.13234.qm@web33503.mail.mud.yahoo.com> Message-ID: <19dd68ba0705231914x491174ffha533763799e17a81@mail.gmail.com> Regarding the security issues of look-alike glyphs (in certain fonts), wouldn't it be a good thing for any project anyway to have a number of pre-conditions that any given contribution must clear? One such litmus test would be the following. try: codecs.open("contributedfile.py","r","ascii").read() print("contribution accepted") except UnicodeDecodeError: print("contribution rejected. evil non-ascii characters lurking in your source. ") (it should be possible (and this is left as an exercise to the reader) to use some regexp to first remove strings and comments from the scope of the test, or to use AST tools to make the tests directly on the generated AST) In Japan, replace the above "ascii" by "sjis" for example. It should be fairly easy to write a number of tools that would highlight "strange" characters in a piece of source code and I trust that if there is such a need, the market for python specialized editors (and other generic editors) will let you pick a different color for characters that would not be part of the ascii set. Once again, mostly a presentation and workflow issue that can be solved by using the right tools or writing some very simple tools to work around your favorite editor's lacks.
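[A litmus test along these lines can be made a bit more discriminating with the stdlib tokenize module, flagging only identifiers rather than rejecting comments and strings too. A rough sketch against a later CPython 3 (the function name and the str.isascii() call are modern conveniences, not anything available at the time of this thread):]

```python
import io
import tokenize

def non_ascii_names(source):
    """Yield (row, col, name) for every NAME token that
    contains a non-ASCII character."""
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type == tokenize.NAME and not tok.string.isascii():
            yield (tok.start[0], tok.start[1], tok.string)

src = 'caf\u00e9 = 1\nplain = caf\u00e9 + 1\n'
print(list(non_ascii_names(src)))  # [(1, 0, 'café'), (2, 8, 'café')]
```

A project could run such a check in its patch-acceptance workflow, as lenient or as strict as its style guide requires.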
Regards, Guillaume From showell30 at yahoo.com Thu May 24 04:26:03 2007 From: showell30 at yahoo.com (Steve Howell) Date: Wed, 23 May 2007 19:26:03 -0700 (PDT) Subject: [Python-3000] PEP 3131 accepted In-Reply-To: <19dd68ba0705231914x491174ffha533763799e17a81@mail.gmail.com> Message-ID: <707787.71027.qm@web33510.mail.mud.yahoo.com> --- Guillaume Proux wrote: > Regarding using looking-alike glyphs (in certain > fonts) security > issues, wouldn't it be a good thing for any project > anyway to have a > number of pre-conditions for any given contribution > to a given project > to be cleared. One such litmus test would be > the following. > try: > codecs.open("contributedfile.py","r","ascii") > print("contribution accepted") > except UnicodeDecodeError: > print("contribution rejected. evil non-ascii > characters lurking > in your source. ") > Yep. Pychecker and automated unit tests could also protect against bugs or holes caused by bad encodings or typos (whether malicious or accidental). From guido at python.org Thu May 24 04:26:31 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 23 May 2007 19:26:31 -0700 Subject: [Python-3000] PEP 3131 accepted In-Reply-To: <19dd68ba0705231914x491174ffha533763799e17a81@mail.gmail.com> References: <46549A7A.6000807@cs.byu.edu> <906645.13234.qm@web33503.mail.mud.yahoo.com> <19dd68ba0705231914x491174ffha533763799e17a81@mail.gmail.com> Message-ID: The tokenize module could easily be used to do such tests, as lenient or as strict as required by any particular style guide. On 5/23/07, Guillaume Proux wrote: > Regarding using looking-alike glyphs (in certain fonts) security > issues, wouldn't it be a good thing for any project anyway to have a > number of pre-conditions for any given contribution to a given project > to be cleared.
One such litmus test would be the following. > try: > codecs.open("contributedfile.py","r","ascii") > print("contribution accepted") > except UnicodeDecodeError: > print("contribution rejected. evil non-ascii characters lurking > in your source. ") > > (it should be possible (and this is left as exercise to the reader) to > use some regexp to first remove from the scope of the test strings and > comments or to use AST tools to make the tests directly on the > generated AST) > > In Japan, replace the above "ascii" by "sjis" for example. > > it should be fairly easy to write a number of tools that would > highlight "strange" characters in a piece of source code and I trust > that if there is such a need, the market for python specialized > editors (and other generic editors) will let you pick a different > color for characters that would not be part of the ascii set. Once > again, mostly a presentation and workflow issue that can be solved by > using the right tools or writing some very simple tools to work around > your favorite editor's lacks. > > Regards, > > Guillaume -- --Guido van Rossum (home page: http://www.python.org/~guido/) From martin at v.loewis.de Thu May 24 07:12:54 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 24 May 2007 07:12:54 +0200 Subject: [Python-3000] PEP 3131 accepted In-Reply-To: <20070523082241.85F3.JCARLSON@uci.edu> References: <20070523011101.85F0.JCARLSON@uci.edu> <87iraj6bn6.fsf@uwakimon.sk.tsukuba.ac.jp> <20070523082241.85F3.JCARLSON@uci.edu> Message-ID: <46551ED6.5070900@v.loewis.de> > This particular excuse pisses me off the most. "If you can't
Thank you for passing > judgement on my choice of font or editor, but Ka-Ping already stated > why this argument is bullshit: there does not currently exist a font > where one *can* differentiate all the glyphs That's not true. In the Unicode BMP fallback font, you can easily differentiate all Unicode characters (in the BMP): http://scripts.sil.org/UnicodeBMPFallbackFont > Speaking of which, do you know of a fixed-width font that is able to > allow for the visual distinction of all unicode glyphs in the primary > plane, or even the portion that Martin is proposing we support? This > also "is not a show-stopper", but it certainly reduces audience > satisfaction by a large margin. See above. Regards, Martin From martin at v.loewis.de Thu May 24 07:25:10 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 24 May 2007 07:25:10 +0200 Subject: [Python-3000] PEP 3131 accepted In-Reply-To: References: Message-ID: <465521B6.1050601@v.loewis.de> > I just realized that this is not the whole story. There's no > requirement that a combining character has to actually come > after a character it can be combined with. So there might be > valid identifiers containing sequences of characters that don't > have a sensible rendering, or that force the combining comma to > appear separately and thus indistinguishable from a quotation > mark even in a Unicode-aware editor. That can't happen. In Unicode, there is no notion of "can be combined with": any base character can be combined with any combining character. The rendering engine is supposed to create a glyph on the fly. 
Regards, Martin From martin at v.loewis.de Thu May 24 07:38:48 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 24 May 2007 07:38:48 +0200 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <19dd68ba0705120827s5415c4dcx12e5862f32cc3e06@mail.gmail.com> <4646A3CA.40705@acm.org> <4646FCAE.7090804@v.loewis.de> <464FFD04.90602@v.loewis.de> <46521CD7.9030004@v.loewis.de> <46527904.1000202@v.loewis.de> <87sl9o5dvi.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <465524E8.4000008@v.loewis.de> > You've got this backwards, and I suspect that's part of the root of > the disagreement. It's not that "when humans enter the loop they > cause problems." The purpose of the language is to *serve humans*. > Without humans, we would just use machine code instead of Python. > If it doesn't work for humans, it's not because the humans are broken, > the language is broken. > > The grammar has to be something a human can understand. Indeed, it is easy for a human to still understand the Py3k grammar. An identifier starts with a letter, followed by letters and digits. It's really the same rule that was in use all the time. It's not easy for a single human to memorize the entire *language*, and never was. The language is not just about the syntax: it's also about the library. While there are many details of the library that you can memorize, I bet nobody could enumerate all classes, functions, methods, symbolic constants etc in the entire library; this causes no concern for people. > If we are going to allow Unicode identifiers at all, then I would > recommend only allowing identifiers that are already normalized > (in NFC). In what way would that be an improvement compared to what the PEP already says? > 2. "python" allows only ASCII identifiers. "python -U" allows > Unicode identifiers that are in NFC and use a conservative, > *fixed* subset of the available characters. 
Support for > "-U" is a compile-time option, preferably not compiled into > official binary releases of Python. > > 3. "python" and "python -U" are as above. "python -UU" allows > all Unicode identifier characters (which may grow over time > as the Unicode standard changes). Support for "-UU" is a > compile-time option, never on in official binary releases of > Python, and discouraged with "here be dragons" warnings, etc. This would cripple the feature, so I'm -1. Regards, Martin From jcarlson at uci.edu Thu May 24 09:05:39 2007 From: jcarlson at uci.edu (Josiah Carlson) Date: Thu, 24 May 2007 00:05:39 -0700 Subject: [Python-3000] PEP 3131 accepted In-Reply-To: <46551ED6.5070900@v.loewis.de> References: <20070523082241.85F3.JCARLSON@uci.edu> <46551ED6.5070900@v.loewis.de> Message-ID: <20070524000016.862B.JCARLSON@uci.edu> "Martin v. Löwis" wrote: > > This particular excuse pisses me off the most. "If you can't > > differentiate, then your font or editor sucks." Thank you for passing > > judgement on my choice of font or editor, but Ka-Ping already stated > > why this argument is bullshit: there does not currently exist a font > > where one *can* differentiate all the glyphs > > That's not true. In the Unicode BMP fallback font, you can easily > differentiate all Unicode characters (in the BMP): > > http://scripts.sil.org/UnicodeBMPFallbackFont That's a cute hack that offers a method of applying the "just use hex" argument to any editor with multi-font support, but it certainly isn't usable for actual work. - Josiah From stephen at xemacs.org Thu May 24 09:19:42 2007 From: stephen at xemacs.org (Stephen J.
Turnbull) Date: Thu, 24 May 2007 16:19:42 +0900 Subject: [Python-3000] PEP 3131 accepted In-Reply-To: <20070523082241.85F3.JCARLSON@uci.edu> References: <20070523011101.85F0.JCARLSON@uci.edu> <87iraj6bn6.fsf@uwakimon.sk.tsukuba.ac.jp> <20070523082241.85F3.JCARLSON@uci.edu> Message-ID: <87bqga6641.fsf@uwakimon.sk.tsukuba.ac.jp> Josiah Carlson writes: > Thank you for hitting home that unless people use Emacs, their tools > suck. I'm sorry you took it that way. My experience is limited to Emacs; that's the only experience I can describe. If you can tell the story of a maintainer of a package that contains multilingual identifiers, and experienced a horror story, I'd like to hear it, and I sure hope you tell Guido about it. I'll deal with the technical content of your reply elsewhere. Sincerely yours, Steve From showell30 at yahoo.com Thu May 24 09:10:51 2007 From: showell30 at yahoo.com (Steve Howell) Date: Thu, 24 May 2007 00:10:51 -0700 (PDT) Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <465524E8.4000008@v.loewis.de> Message-ID: <548742.28521.qm@web33503.mail.mud.yahoo.com> --- "Martin v. Löwis" wrote: > [...] > > 2. "python" allows only ASCII identifiers. > "python -U" allows > > Unicode identifiers that are in NFC and > use a conservative, > > *fixed* subset of the available > characters. Support for > > "-U" is a compile-time option, preferably > not compiled into > > official binary releases of Python. > > > > 3. "python" and "python -U" are as above. > "python -UU" allows > > all Unicode identifier characters (which > may grow over time > > as the Unicode standard changes). Support > for "-UU" is a > > compile-time option, never on in official > binary releases of > > Python, and discouraged with "here be > dragons" warnings, etc. > > This would cripple the feature, so I'm -1. > FWIW the Ruby interpreter (1.8.5) seems to require this flag to allow you to turn on the Japanese code set.
-Kkcode specifies KANJI (Japanese) code-set I have no idea whether or not this cripples the feature in Ruby, and perhaps it's an apples/oranges comparison. From martin at v.loewis.de Thu May 24 09:11:27 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 24 May 2007 09:11:27 +0200 Subject: [Python-3000] PEP 3131 accepted In-Reply-To: <20070524000016.862B.JCARLSON@uci.edu> References: <20070523082241.85F3.JCARLSON@uci.edu> <46551ED6.5070900@v.loewis.de> <20070524000016.862B.JCARLSON@uci.edu> Message-ID: <46553A9F.9010307@v.loewis.de> >>> This particular excuse pisses me off the most. "If you can't >>> differentiate, then your font or editor sucks." Thank you for passing >>> judgement on my choice of font or editor, but Ka-Ping already stated >>> why this argument is bullshit: there does not currently exist a font >>> where one *can* differentiate all the glyphs >> That's not true. In the Unicode BMP fallback font, you can easily >> differentiate all Unicode characters (in the BMP): >> >> http://scripts.sil.org/UnicodeBMPFallbackFont > > That's a cute hack that offers a method of applying the "just use hex" > argument to any editor with multi-font support, but it certainly isn't > usable for actual work. Depends on what you want to achieve. If your objective is "I want to visually recognize whether there are any stray characters in the file, outside the range of characters which I normally use", then such a kind of font can work very well. This one (or one similar to it) is installed (by default?) on Debian Linux, and it helps to recognize cases where you have characters in a text that you could not display otherwise.
In any case, I still think it proves the argument wrong: "there does not currently exist a font where one *can* differentiate all the glyphs". Regards, Martin From greg.ewing at canterbury.ac.nz Thu May 24 10:28:40 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 24 May 2007 20:28:40 +1200 Subject: [Python-3000] PEP 3131 accepted In-Reply-To: <46551ED6.5070900@v.loewis.de> References: <20070523011101.85F0.JCARLSON@uci.edu> <87iraj6bn6.fsf@uwakimon.sk.tsukuba.ac.jp> <20070523082241.85F3.JCARLSON@uci.edu> <46551ED6.5070900@v.loewis.de> Message-ID: <46554CB8.7050209@canterbury.ac.nz> Martin v. Löwis wrote: > That's not true. In the Unicode BMP fallback font, you can easily > differentiate all Unicode characters (in the BMP): > > http://scripts.sil.org/UnicodeBMPFallbackFont Er... somehow I don't think that's what Martin had in mind when he used the word "font" in that context. :-) -- Greg From stephen at xemacs.org Thu May 24 12:05:24 2007 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 24 May 2007 19:05:24 +0900 Subject: [Python-3000] PEP 3131 accepted In-Reply-To: <20070523111704.85FC.JCARLSON@uci.edu> References: <20070523011101.85F0.JCARLSON@uci.edu> <87iraj6bn6.fsf@uwakimon.sk.tsukuba.ac.jp> <20070523111704.85FC.JCARLSON@uci.edu> Message-ID: <87abvu5yfu.fsf@uwakimon.sk.tsukuba.ac.jp> Josiah Carlson writes: > Removing those words that some found offensive, perhaps I will get a > response to the point of my post: "your tools aren't very good" and > "Emacs does it right" are not valid responses to the concerns brought up > regarding unicode. You're missing my point still, and I don't find the words offensive. (It's a pain in the neck, since I already wrote my reply, but I'll remove them too.) Nor do I find your completely groundless conclusion that I'm deprecating other tools offensive.
I find them to be an indicator of your fears which cannot be grounded in any experience of mine---in exactly the kind of environment PEP 3131 will provide. I strongly suspect you have no experience at all, not even hearsay, to offer. *Please* prove me wrong! My experience is *far* from definitive. But if you can't, well, I don't blame you for your fear, but I also cannot take it seriously as a reason to not implement this PEP in the face of my own long experience. > but Ka-Ping already stated why this argument is invalid: there > does not currently exist a font where one *can* differentiate all > the glyphs, I'll tell you why Ka-Ping's argument is a strawman. First, one only *needs* to be able to distinguish those characters that one can read. It's nice to be able to admire the rest, of course, but you don't need to see them as a speaker of that language would. You just use a font you like for the characters you can read, and the rest can be any old dog. Second, you do *not* need a single font with universal coverage. I typically use different fonts for Roman, Kanji, half-width kana, and Hangul. If I happen to have some Chinese in there, that will be yet another font. If I had cause to use Arabic, Hebrew, or Thai, they would be yet other fonts. It simply is not at all unpleasant to use LucidaTypewriter for ASCII and Latin-1 in the same buffer with Sazanami Gothic for Japanese. N.B. Martin is correct to point out the existence of the SIL BMP fallback font, but that doesn't answer the real issue, that users should use the fonts (and tools) they like best. > and further, even if one could visually differentiate similar I have actually worked in an environment where you can't visually distinguish different characters. Security aside, it's a PITA, and you *do* want tools to deal with it. Those tools are *not* expensive; simply audit the editor buffer for characters outside of the user's acceptable set, and be 99% happy. Once you've got tools, it's not a big deal.
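[The "audit the buffer for characters outside the user's acceptable set" tool described above is indeed cheap to build. The following is an illustrative Python 3 sketch, not anything proposed in the thread; the names `ALLOWED` and `audit` are invented for the example, and a real site policy would define its own acceptable set.]

```python
import unicodedata

# Illustrative policy: plain ASCII only. A site could add, say,
# the Hiragana/Katakana/CJK ranges to this set instead.
ALLOWED = {chr(c) for c in range(128)}

def audit(text):
    """Return (line, column, char, Unicode name) for every character
    outside the acceptable set -- the 'stray character' report."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), 1):
        for col, ch in enumerate(line):
            if ch not in ALLOWED:
                problems.append((lineno, col, ch,
                                 unicodedata.name(ch, "<unnamed>")))
    return problems

# A Cyrillic 'a' (U+0430) hiding in an otherwise ASCII identifier:
sample = "def size(data):\n    return len(d\u0430ta)\n"
for lineno, col, ch, name in audit(sample):
    print(f"line {lineno}, col {col}: {name}")
# -> line 2, col 16: CYRILLIC SMALL LETTER A
```

This is the 99% solution Stephen describes: it catches the Ka-Ping-style homoglyph attack without needing any font support at all.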
Can you find somebody with experience to say otherwise? > glyphs, *remembering* the 64,000+ glyphs that are available in just > the primary unicode plane to differentiate them, is a herculean > task. Strawman. The only people who need to remember the glyphs are those who need to read them anyway, or glyphs that look like them (cf Ka-Ping's example). So they have already memorized them. > Never mind the fact that people use dozens, perhaps hundreds of > different editors to write and maintain Python code, that the > 'Emacs works' argument is poor at best. it was invalid then, and > it was invalid now. It was intended only to counter Ka-Ping's strawman of "impossible to detect", and it demolishes that claim. But addressing the content of what you write, you mean that, in a world that allows multilingual identifiers, 'Emacs works' "smells like" [from your original post] a threat to the market share of editors that can't deal with multilingual identifiers, not to mention the work habits of Emacs-haters everywhere, don't you? Well, you're probably wrong. *If* your users need to deal with multilingual identifiers, *maybe* they'll prefer to switch to Emacs. *If* they need extremely robust handling of multilingual identifiers on a daily basis, they probably will switch to Emacs. I doubt it, though. What they'll probably do is write a five line patch to get them 90% of the way to what Emacs gives them out of the box, and be ecstatic that they don't have to use Emacs at all. (That's a guess, as an XEmacs developer I don't see much of that activity.) And that's a big "if". Most of your users will not see code in a language the current version of your editor can't deal with in their working lives, and 90% won't in the usable life of your product. That I can tell you from experience. Emacs has all these wonderful multilingual features, but you know what? 95% of our users are monoscript 100% of the time.[1] 90% of the rest use their primary script 95% of the time. 
Emacs being multilingual only means that the one language might be Japanese or Thai. If 99% of your users currently use only ISO-8859-15, that isn't going to change by much just because Python now allows Thai identifiers. In other words, if you're up multilingual creek without a paddle, Emacs will get you to shore. Do you have a problem with it, put that way? > That's a invalid argument, and you know it. "Just use hex > escapes"? No, my argument is not "just use hex escapes". Please read it again, and if you wish to respond to what I wrote, feel free. So, you have my apologies, but I still advocate implementation of PEP 3131 over your objections, and those of Ka-Ping. Footnotes: [1] Eg, all Swiss know a half-dozen languages, but they can write all of them with one script, ISO-8859-15. From stephen at xemacs.org Thu May 24 12:19:02 2007 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 24 May 2007 19:19:02 +0900 Subject: [Python-3000] PEP 3131 accepted In-Reply-To: <46549A7A.6000807@cs.byu.edu> References: <20070523011101.85F0.JCARLSON@uci.edu> <87iraj6bn6.fsf@uwakimon.sk.tsukuba.ac.jp> <20070523111704.85FC.JCARLSON@uci.edu> <46549A7A.6000807@cs.byu.edu> Message-ID: <878xbe5xt5.fsf@uwakimon.sk.tsukuba.ac.jp> Neil Toronto writes: > Though I don't develop an editor in my spare time, I had a similar > reaction to the "Emacs does Unicode this way, which is correct" > solutions. My favorite editor is going to have to get awfully smart. It isn't. It will need to learn about widechars, which is painful for the editor's developer. (But only if she writes in C: "what do you mean I can't use strncat?!") Other than that, there's probably not that much to it (see the last part of my reply to Josiah). Most editors have access to a reasonable GUI environment these days that will handle the input and the fonts (even if that environment comes via Terminal or uxterm). From stephen at xemacs.org Thu May 24 13:17:57 2007 From: stephen at xemacs.org (Stephen J.
Turnbull) Date: Thu, 24 May 2007 20:17:57 +0900 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <19dd68ba0705120827s5415c4dcx12e5862f32cc3e06@mail.gmail.com> <4646A3CA.40705@acm.org> <4646FCAE.7090804@v.loewis.de> <464FFD04.90602@v.loewis.de> <46521CD7.9030004@v.loewis.de> <46527904.1000202@v.loewis.de> <87sl9o5dvi.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <877iqy5v2y.fsf@uwakimon.sk.tsukuba.ac.jp> Ka-Ping Yee writes: > On Wed, 23 May 2007, Stephen J. Turnbull wrote: > > > It means users could see the usability benefits of PEP3131, but the > > > python internals could still work with ASCII only. > > But this reasoning is not coherent. Python internals will have no > > problems with non-ASCII; in fact, they would have no problems with > > tokens containing Cf characters or even reserved code points. Just > > give an unambiguous grammar for tokens composed of code points. It's > > only when a human enters the loop (ie, presentation of the identifier > > on an output stream) that they cause problems. > > You've got this backwards, and I suspect that's part of the root of > the disagreement. It's not that "when humans enter the loop they > cause problems." The purpose of the language is to *serve humans*. Of course! "Incoherent" refers *only* to "python internals". We need to look at the parts of the loop where the humans are. N.B. I take offense at your misquote. *Humans do not cause problems.* It is *non-ASCII tokens* that *cause* the (putative) problem. However, the alleged problems only arise when humans are present. > The grammar has to be something a human can understand. There are an infinite number of ASCII-only Python tokens. Whether those tokens are lexically composed of a small fixed finite alphabet vs. a large extensible finite alphabet doesn't change anything in terms of understanding the *grammar*. 
The character-identity problem is vastly aggravated (created, if you insist) by large numbers of characters, but that is something separate. I don't understand why you conflate lexical issues with the still-fits-in-*my*-pin-head simplicity of the Python grammar. Am I missing something? > (And if 90%, or more than 50%, of the tools are "broken" with respect > to the language, that's a language problem, not just a tool problem.) It's a *problem* for the tools, because they may become obsolete, depending on how expensive the feature of handling new language constructs is. It is an *issue* for the language, *not* a "problem" in the same sense. The language designer must balance the problems faced by the tools, and the cost of upgrading them---including users' switching costs!---against the benefits of the new language feature. Nothing new here. The question is how expensive will the upgrade be, and what are the benefits. My experience suggests that the cost is negligible *because most users won't use non-ASCII identifiers*, and they'll just stick with their ASCII-only tools. The benefits are speculative; I know that my students love the idea of a programming language that doesn't look like English (which has extremely painful associations for most). And there are cases (Dutch tax law, Japanese morphology) where having a judicious selection of non-ASCII identifiers is very convenient. Specifically, from my own experience, if I don't know what a particular function in edict is supposed to do, I just ask the nearest Japanese. And they tell me, "oh, that parses the INFLECTION-TYPE of PART-OF-SPEECH", and when I look blank, they continue, "you know, the '-masu' in 'gozaimasu'". Now, since there is no exact equivalent to "-masu" in English (or any European language AFAIK), it would be impossible to give a precise self-documenting name in ASCII. Sure, you can work around this -- but why not put down the ASCII hammer and save on all that ibuprofen? 
> > I propose it would be useful to provide a standard mechanism for > > auditing the input stream. There would be one implementation for the > > stdlib that complains[1] about non-ASCII characters and possibly > > non-English words, and IMO that should be the default > > This should be built in to the Python interpreter and on by default, > unless it is turned off by a command-line switch that says "I want to > allow the full set of Unicode identifier characters in identifiers." I'd make it more tedious and more flexible to relax the restriction, actually. "python" gives you the stdlib, ASCII-only restriction. "python -U TABLE" takes a mandatory argument, which is the table of allowed characters. If you want to rule out "stupid file substitution tricks", TABLE could take the special arguments "stdlib" and "stduni" which refer to built-in tables. But people really should be able to restrict to "Japanese joyo kanji, kana, and ASCII only" or "IBM Japanese only" as local standards demand, so -U should also be able to take a file name, or a module name, or something like that. > If we are going to allow Unicode identifiers at all, then I would > recommend only allowing identifiers that are already normalized > (in NFC). Already in the PEP. > The ideas that I'm in favour of include: > > (e) Use a character set that is fixed over time. The BASIC that I learned first only had 26 user identifiers. Maybe that's the way we should go? From stephen at xemacs.org Thu May 24 13:55:01 2007 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Thu, 24 May 2007 20:55:01 +0900 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <4646FCAE.7090804@v.loewis.de> <464FFD04.90602@v.loewis.de> <46521CD7.9030004@v.loewis.de> <46527904.1000202@v.loewis.de> <87sl9o5dvi.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87646i5td6.fsf@uwakimon.sk.tsukuba.ac.jp> Jim Jewett writes: > I would like an alert (and possibly an import exception) on any code > whose *executable portion* is not entirely in ASCII. Are you talking about language definition or implementation? I like the idea of such checks, as long as they are not mandatory in the language and can be turned off easily at run time in the default configuration. I'd also really like a generalization (described below). > > The only issues PEP 3131 should be concerned with *defining* > > are those that cause problems with canonicalization, and the range of > > characters and languages allowed in the standard library. > > Fair enough -- but the problem is that this isn't a solved issue yet; IMHO the stdlib *is* a solved issue. The PEP says "in the standard library, we use ASCII only, except in tests and the like," and "we use English unless there is no reasonable equivalent in English." That's right. AFAIK *canonicalization* is also a solved issue (although exactly what "NFC" means might change with Unicode errata and of course with future addition of combining characters or precombined characters). The notion of "identifier constituent" is a bit thorny. While in general Cf characters don't belong in my understanding, there are some weird references to ZWJ and ZWNJ that I don't understand in UAX#31. I say "leave them out until somebody named 'Bhattacharya' says 'Hey! I need that!'" In general, when in doubt, leave it out. And prohibit it. I think it's a very bad idea to give identifier authors *any* control over their presentation to readers. 
If an editor has a broken or nonexistent bidi implementation, for example, its user is probably used to that. With *sufficient* breakage in a presentation algorithm, I suppose that the same identifier could be presented differently in different contexts, and that different identifiers could be presented identically. But that's not Python's problem. This can easily happen in ASCII, too. (Consider an editor that truncates display silently at column 80.) > Even having read their reports, my initial rules would still have > banned mixed-script, which would have prevented your edict- > example. Urk. I see your point (Ka-Ping's Cyrillic example makes it glaringly clear why that's the conservative way to go). I don't have to like it, but I could live with it. (Especially since "edict-" is a poor man's namespace. That device isn't needed in Python.) > > I propose it would be useful to provide a standard mechanism for > > auditing the input stream. There would be one implementation for the > > stdlib .... A second .... A third, .... > > This might deal with my concerns. It is a bit more complicated than > the current plans. Well, what I *really* want is a loadable table. My motivation is that I want organizations to be able to "enforce" a policy that is less restrictive than "ASCII-only" but more restrictive than "almost anything goes". My students don't need Sanskrit; Guido's tax accountant doesn't need kanji, and neither needs Arabic. I think that they should be able to get the same strict "alert or even import exception" (that you want on non-ASCII) for characters outside their larger, but still quite restricted, sets. From showell30 at yahoo.com Thu May 24 15:20:26 2007 From: showell30 at yahoo.com (Steve Howell) Date: Thu, 24 May 2007 06:20:26 -0700 (PDT) Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <877iqy5v2y.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <257070.91051.qm@web33507.mail.mud.yahoo.com> --- "Stephen J. 
Turnbull" wrote: > > I'd make it more tedious and more flexible to relax > the restriction, > actually. "python" gives you the stdlib, ASCII-only > restriction. > "python -U TABLE" takes a mandatory argument, which > is the table of > allowed characters. Now that the PEP has been accepted, maybe some more language could be added to it that addresses the concerns of folks who want to keep their code ASCII-only. It seems that if Python, by default, restricts to ASCII, then you at least eliminate the most obvious objections. (You still have the indirect arguments about it contributing to less code written in English worldwide, etc.). Then, for all the other classes of users (Dutch tax lawyer who still doesn't want Sanskrit, etc.), do you advocate having multiple convenient ways to specify their desired character set (command line flag, env setting, magic directive at top of file, etc.), or do you want the "one true way"? From gproux at gmail.com Thu May 24 10:47:13 2007 From: gproux at gmail.com (Guillaume Proux) Date: Thu, 24 May 2007 17:47:13 +0900 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <548742.28521.qm@web33503.mail.mud.yahoo.com> References: <465524E8.4000008@v.loewis.de> <548742.28521.qm@web33503.mail.mud.yahoo.com> Message-ID: <19dd68ba0705240147k76e9009dna63a6acda449aafa@mail.gmail.com> On 5/24/07, Steve Howell wrote: > -Kkcode specifies KANJI (Japanese) code-set > Isn't it to simply let Ruby know which is the actual codepage (encoding) in which the file is encoded? Regards, Guillaume Proux Scala From stephen at xemacs.org Thu May 24 17:39:12 2007 From: stephen at xemacs.org (Stephen J.
Turnbull) Date: Fri, 25 May 2007 00:39:12 +0900 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <257070.91051.qm@web33507.mail.mud.yahoo.com> References: <877iqy5v2y.fsf@uwakimon.sk.tsukuba.ac.jp> <257070.91051.qm@web33507.mail.mud.yahoo.com> Message-ID: <871wh65izj.fsf@uwakimon.sk.tsukuba.ac.jp> Steve Howell writes: > Then, for all the other classes of users (Dutch tax > lawyer who still doesn't want Sanskrit, etc.), do you > advocate having multiple convenient ways to specify > their desired character set (command line flag, env > setting, magic directive at top of file, etc.), or do > you want the "one true way"? -1 on magic directive. That delegates the decision to the file. That's not what we want here. +1 on "command line only". Ie, force the user to redefine the Python command with an alias or something if they want to set a different default from site policy. From jimjjewett at gmail.com Thu May 24 17:48:58 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Thu, 24 May 2007 11:48:58 -0400 Subject: [Python-3000] PEP 3131 accepted In-Reply-To: <87abvu5yfu.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20070523011101.85F0.JCARLSON@uci.edu> <87iraj6bn6.fsf@uwakimon.sk.tsukuba.ac.jp> <20070523111704.85FC.JCARLSON@uci.edu> <87abvu5yfu.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 5/24/07, Stephen J. Turnbull wrote: > I have actually worked in an environment where you can't visually > distinguish different characters. Security aside, it's a PITA, and > you *do* want tools to deal with it. ... > simply audit the editor buffer for characters outside of the user's > acceptable set, and be 99% happy. Once you've got tools, it's not a > big deal. Can you find somebody with experience to say otherwise? ... > And that's a big "if". Most of your users will not see code in a > language the current version of your editor can't deal with in their > working lives, ... The problem (with larger charsets) isn't that you regularly face indistinguishable characters. 
It is that you face them rarely enough that you don't remember to run that audit, so the actual bug is very difficult to track down. Ignoring security issues, that could probably be handled by having to flip a switch before importing those modules. So long as the default allows only ASCII, the act of flipping that switch is my reminder to check. While an on/off toggle would generally be sufficient for my needs, I would feel more comfortable with a per-script allowance, so that I could say "OK, go ahead and allow Kanji, but still warn me if there is a stray Cyrillic character." -jJ From jcarlson at uci.edu Thu May 24 19:50:39 2007 From: jcarlson at uci.edu (Josiah Carlson) Date: Thu, 24 May 2007 10:50:39 -0700 Subject: [Python-3000] PEP 3131 accepted In-Reply-To: <87abvu5yfu.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20070523111704.85FC.JCARLSON@uci.edu> <87abvu5yfu.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20070524082737.862E.JCARLSON@uci.edu> "Stephen J. Turnbull" wrote: > Josiah Carlson writes: > > Removing those words that some found offensive, perhaps I will get a > > reponse to the point of my post: "your tools aren't very good" and > > "Emacs does it right" are not valid responses to the concerns brought up > > regarding unicode. > > You're missing my point still, and I don't find the words offensive. > (It's a pain in the neck, since I already wrote my reply, but I'll > remove them too.) Nor do I find your completely groundless conclusion > that I'm deprecating other tools offensive. I'll skip to the chase here. Much of my concerns could be addressed through the use of commandline, environment variable, or in-source code definitions of what are allowable identifier characters. Generally, in-source definitions (like the coding: directive) are the most flexible, but are the biggest pain for editors and IDEs (which may want to verify every identifier as it is being typed, etc.). 
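[The "verify every identifier as it is being typed" job mentioned above turned out to be cheap in the eventual Python 3, where `str.isidentifier()` applies the PEP 3131 character rules. A hypothetical sketch of such a per-identifier check follows; `check_identifier` and its diagnostic strings are invented for illustration, and the NFC test mirrors the normalization discussed in this thread (the accepted PEP ultimately normalizes identifiers to NFKC instead).]

```python
import unicodedata

def check_identifier(name):
    """The per-identifier check an editor or lint hook might run.
    Returns a short diagnostic string, or "ok"."""
    if not name.isidentifier():              # PEP 3131 character rules
        return "not a valid identifier"
    if unicodedata.normalize("NFC", name) != name:
        return "not in NFC normal form"
    if any(ord(ch) > 127 for ch in name):
        return "ok (contains non-ASCII)"     # where a site policy could warn
    return "ok"

print(check_identifier("size"))              # ok
print(check_identifier("gr\u00f6\u00dfe"))   # composed "größe": ok (contains non-ASCII)
print(check_identifier("gro\u0308\u00dfe"))  # decomposed o + U+0308: not in NFC normal form
print(check_identifier("2fast"))             # not a valid identifier
```

Such a check is the kind of five-line patch discussed elsewhere in the thread: each editor can bolt it onto its existing syntax machinery without understanding Unicode deeply.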
The not insignificant problem is that it allows for identifier characters to be defined on a per-module basis. This is 'fixed' by commandline/environment variables, but it also makes running (rather than editing) a bigger pain than it should be. If people can agree on a method for specifying, 'ascii only', 'ascii + character sets X, Y, Z', and it actually becomes an accepted part of the proposal, gets implemented, etc., I will grumble to myself at home, but I will stop trying to raise a stink here. > I find them to be an indicator of your fears which cannot be grounded > in any experience of mine---in exactly the kind of environment PEP > 3131 will provide. I strongly suspect you have no experience at all, > not even hearsay, to offer. *Please* prove me wrong! My experience > is *far* from definitive. My "fear" is that being able to prove (to myself and others) that the code I am looking at does what it should do. As you say, maybe I will never see non-ascii source in my life. But even if I don't, I know some of my users will, and to not be American-centric, I need to continue to provide them with "tools that don't suck" (which will likely necessitate testing using non-ascii identifiers). > But addressing the content of what you write, you mean that, in a > world that allows multilingual identifiers, 'Emacs works' "smells > like" [from your original post] a threat to the market share of > editors that can't deal with multilingual identifiers, not to mention > the work habits of Emacs-haters everywhere, don't you? Please understand me, I don't hate Emacs. I also don't hate Vim. I just don't find that they fit my personal aesthetics for doing what I like and need to do: write Python (and a few others). And because I'm not selling my editor, I don't really care about (my) market share. What I care about is functional tools for everyone who wants to use Python. 
To me, that means that people should be able to write software in whatever tool they currently prefer, whether that is Emacs, Vim, Eric3, Idle, SPE, NewEdit, DrPython, PythonWin, Scite, PyPE, Nedit, Kate, Gedit, Leo, Boa Constructor, Windows Notepad, Visual Studio, etc. Some of those will get certain functionality for free (due to their use of the same editing component), but each will need to write their own "discover usable characters, verify identifiers, report to user" mechanism (though some will opt for merely syntax highlighting). Some will want/need alternate input methods for needing to write characters out of a user's locale. Who knows, maybe it is as simple as a 5 line change. And maybe it won't be as big a problem as we are concerned about. But "it's not a problem", "in my experience with Java", and "Emacs users rarely if ever have to deal with such things" don't make me feel any better about the issues regarding Python and editors that aren't Emacs*. - Josiah * Partly because I don't know the market share that Emacs has with Java developers, and/or whether Java editor market share is flat across national boundaries. From jimjjewett at gmail.com Thu May 24 20:14:41 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Thu, 24 May 2007 14:14:41 -0400 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <87646i5td6.fsf@uwakimon.sk.tsukuba.ac.jp> References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <464FFD04.90602@v.loewis.de> <46521CD7.9030004@v.loewis.de> <46527904.1000202@v.loewis.de> <87sl9o5dvi.fsf@uwakimon.sk.tsukuba.ac.jp> <87646i5td6.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 5/24/07, Stephen J. Turnbull wrote: > Jim Jewett writes: > > I would like an alert (and possibly an import exception) on any code > > whose *executable portion* is not entirely in ASCII. > Are you talking about language definition or implementation? 
I like > the idea of such checks, as long as they are not mandatory in the > language and can be turned off easily at run time in the default > configuration. I'd also really like a generalization (described below). Definition; I don't care whether it is a different argument to import or a flag or an environment variable or a command-line option, or ... I just want the decision to accept non-ASCII characters to be explicit. Ideally, it would even be explicit per extra character allowed, though there should obviously be shortcuts to accept entire scripts. > > > The only issues PEP 3131 should be concerned with *defining* > > > are those that cause problems with canonicalization, and the range of > > > characters and languages allowed in the standard library. Sorry; I missed the "stdlib" part of that sentence when I first replied. I agree except that the range of characters/languages allowed by *python* is also an open issue. > AFAIK *canonicalization* is also a solved issue (although exactly what > "NFC" means might change with Unicode errata and of course with future > addition of combining characters or precombined characters). Why NFC? The Tech Reports seem to suggest NFKD -- and that makes a certain amount of sense. Using compatibility characters reduces the problem with equivalent characters that are distinct only for historical reasons. Using decomposed characters simplifies processing. On the other hand, NFC might often be faster in practice, as it might not require changes -- but if you don't do the processing to verify that, then you mess up the hash. I'm willing to trust the judgment of those with more experience, but the decision of which form to use should be explicit. > The notion of "identifier constituent" is a bit thorny. I think it is even thornier than you do, but I think we may agree on an acceptable answer. > Well, what I *really* want is a loadable table. 
My motivation is that > I want organizations to be able to "enforce" a policy that is less > restrictive than "ASCII-only" but more restrictive than "almost > anything goes". My students don't need Sanskrit; Guido's tax > accountant doesn't need kanji, and neither needs Arabic. I think that > they should be able to get the same strict "alert or even import > exception" (that you want on non-ASCII) for characters outside their > larger, but still quite restricted, sets. So how about (1) By default, python allows only ASCII. (2) Additional characters are permitted if they appear in a table named on the command line. These additional characters should be restricted to code points larger than ASCII (so you can't easily turn "!" into an ID char), but beyond that, anything goes. If you want to include punctuation or undefined characters, so be it. Presumably, code using Kanji would be fairly easy to run in a Kanji environment, but code using punctuation or Linear B would ... need to convince people that there was a valid reason for it. Note that I think a single table argument is sufficient; I don't see the point in saying that identifiers can include Japanese Accounting Numbers, but can't start with them. (Unless someone is going to suggest that they be parsed to their numeric value?) -jJ From martin at v.loewis.de Thu May 24 20:37:07 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 24 May 2007 20:37:07 +0200 Subject: [Python-3000] PEP 3131 accepted In-Reply-To: <46554CB8.7050209@canterbury.ac.nz> References: <20070523011101.85F0.JCARLSON@uci.edu> <87iraj6bn6.fsf@uwakimon.sk.tsukuba.ac.jp> <20070523082241.85F3.JCARLSON@uci.edu> <46551ED6.5070900@v.loewis.de> <46554CB8.7050209@canterbury.ac.nz> Message-ID: <4655DB53.80200@v.loewis.de> > That's not true. In the Unicode BMP fallback font, you can easily >> differentiate all Unicode characters (in the BMP): >> >> http://scripts.sil.org/UnicodeBMPFallbackFont > > Er... 
somehow I don't think that's what Martin had in mind > when he used the word "font" in that context. :-) That might well be - however, I think that is because of an unclear problem statement. From the discussion, I gathered that the perceived problem is this: "Somebody maliciously sends me a patch, and I want to be able to tell visually that it's wrong." A possible answer to that was proposed as "the editor should render the characters differently", to which the counter-argument was "there is no font to do that, so the editor can't". I just wanted to point out that this just is not true: there is an approach to Unicode fonts where you can guarantee that all characters can be rendered, and that all characters rendered in that font look different. Regards, Martin From martin at v.loewis.de Thu May 24 20:45:34 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 24 May 2007 20:45:34 +0200 Subject: [Python-3000] PEP 3131 accepted In-Reply-To: <20070524082737.862E.JCARLSON@uci.edu> References: <20070523111704.85FC.JCARLSON@uci.edu> <87abvu5yfu.fsf@uwakimon.sk.tsukuba.ac.jp> <20070524082737.862E.JCARLSON@uci.edu> Message-ID: <4655DD4E.3050809@v.loewis.de> > Much of my concerns could be addressed through the use of commandline, > environment variable, or in-source code definitions of what are > allowable identifier characters. Generally, in-source definitions (like > the coding: directive) are the most flexible, but are the biggest pain > for editors and IDEs (which may want to verify every identifier as it is > being typed, etc.). Not sure (anymore) what problem you are trying to solve, but it might be that the coding directive already *is* the solution. If you want to constrain characters that you can use in a single source file, adding a coding directive will automatically impose such a constraint (namely, to the characters available in the encoding). 
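[Editorial note: the constraint Martin describes can be demonstrated directly in modern Python, where `compile()` applies a PEP 263 coding declaration found in bytes source. A minimal sketch, not part of the original thread:]

```python
# Sketch: a PEP 263 coding declaration constrains the whole source file.
# With the encoding pinned to ascii, any non-ASCII byte is rejected at
# compile time, before a single statement runs.
ascii_src = b"# -*- coding: ascii -*-\nname = 'ok'\n"
compile(ascii_src, "<demo>", "exec")  # pure ASCII: compiles fine

sneaky_src = b"# -*- coding: ascii -*-\nname = '\xc3\xa9'\n"  # UTF-8 e-acute
try:
    compile(sneaky_src, "<demo>", "exec")
except SyntaxError as err:
    print("rejected by the ascii declaration:", err)
```

Note that the restriction applies to every byte of the file — comments and string literals included — not just identifiers.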
In particular, if you set the encoding to us-ascii, you have restricted your source file to ASCII only. > If people can agree on a method for specifying, 'ascii only', 'ascii + > character sets X, Y, Z', and it actually becomes an accepted part of the > proposal, gets implemented, etc., I will grumble to myself at home, but > I will stop trying to raise a stink here. I think you can stop now - this is supported as a side effect of PEP 263, and implemented for years. > My "fear" is that being able to prove (to myself and others) that the > code I am looking at does what it should do. As you say, maybe I will > never see non-ascii source in my life. But even if I don't, I know some > of my users will, and to not be American-centric, I need to continue to > provide them with "tools that don't suck" (which will likely necessitate > testing using non-ascii identifiers). I think the PEP 263 machinery allows for great flexibility here. Additional tools can be implemented, of course, and will be produced if there is a demand for them (e.g. post-commit hooks for versioning systems). Regards, Martin From martin at v.loewis.de Thu May 24 20:50:28 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 24 May 2007 20:50:28 +0200 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <548742.28521.qm@web33503.mail.mud.yahoo.com> References: <548742.28521.qm@web33503.mail.mud.yahoo.com> Message-ID: <4655DE74.4090708@v.loewis.de> > FWIW the Ruby interpreter (1.8.5) seems to require > this flag to allow you to turn on the Japanese code > set. > > -Kkcode specifies KANJI (Japanese) code-set > > I have no idea whether or not this cripples the > feature in Ruby, and perhaps it's an apples/oranges > comparison. If you don't have source encoding declarations (like the one in PEP 263), you must have some means of setting the source encoding; this is what -Kkcode does (similar to javac's -encoding command line option).
This approach has several flaws, e.g. you can only specify a single encoding, which breaks if you have modules in different encodings. In any case, it's different from the suggested -UU option: Python already knows what the source encoding is, -UU would not change that. Instead, that option would merely serve to constrain the source code (if it's not being passed). Regards, Martin From jimjjewett at gmail.com Thu May 24 21:17:39 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Thu, 24 May 2007 15:17:39 -0400 Subject: [Python-3000] PEP 3131 accepted In-Reply-To: <4655DD4E.3050809@v.loewis.de> References: <20070523111704.85FC.JCARLSON@uci.edu> <87abvu5yfu.fsf@uwakimon.sk.tsukuba.ac.jp> <20070524082737.862E.JCARLSON@uci.edu> <4655DD4E.3050809@v.loewis.de> Message-ID: On 5/24/07, "Martin v. Löwis" wrote: > > Much of my concerns could be addressed through the use of commandline, > > environment variable, or in-source code definitions of what are > > allowable identifier characters. Generally, in-source definitions (like > > the coding: directive) are the most flexible, but are the biggest pain > > for editors and IDEs (which may want to verify every identifier as it is > > being typed, etc.). > Not sure (anymore) what problem you are trying to solve, but it might be > that the coding directive already *is* the solution. If you want to > constrain characters that you can use in a single source file, adding > a coding directive will automatically impose such a constraint (namely, > to the characters available in the encoding). Wanting to constrain identifiers is not the same as wanting to constrain all characters. > In particular, if you set the encoding to us-ascii, you have restricted > your source file to ASCII only. The stdlib is largely restricted to ASCII. I don't think I want (the vast majority of) the stdlib to grow a coding directive just to enforce this.
I also don't want to lift that restriction and accidentally allow Kanji identifiers just because Löwis appears in a comment. -jJ From python at zesty.ca Thu May 24 23:04:16 2007 From: python at zesty.ca (Ka-Ping Yee) Date: Thu, 24 May 2007 16:04:16 -0500 (CDT) Subject: [Python-3000] Support for PEP 3131 In-Reply-To: References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <464FFD04.90602@v.loewis.de> <46521CD7.9030004@v.loewis.de> <46527904.1000202@v.loewis.de> <87sl9o5dvi.fsf@uwakimon.sk.tsukuba.ac.jp> <87646i5td6.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Thu, 24 May 2007, Jim Jewett wrote: > So how about > > (1) By default, python allows only ASCII. > (2) Additional characters are permitted if they appear in a table > named on the command line. > > These additional characters should be restricted to code points larger > than ASCII (so you can't easily turn "!" into an ID char), but beyond > that, anything goes. If you want to include punctuation or undefined > characters, so be it. +1! This is a fine solution. It is better than the "python -U" option I proposed -- it has all the advantages of that proposal, plus: - The identifier character set won't spontaneously change when one upgrades to a new version of Python, even for users of non-ASCII identifiers. - Having to specify the table of acceptable characters demonstrates at least some knowledge of the character set one is using. - It provides the flexibility for different communities to adopt identifier conventions that suit their preferred tradeoff of risk vs. expressiveness. Jim's proposal appears to be the best path to making everyone happy.
-- ?!ng From python at zesty.ca Thu May 24 23:12:55 2007 From: python at zesty.ca (Ka-Ping Yee) Date: Thu, 24 May 2007 16:12:55 -0500 (CDT) Subject: [Python-3000] PEP 3131 accepted In-Reply-To: <4655DB53.80200@v.loewis.de> References: <20070523011101.85F0.JCARLSON@uci.edu> <87iraj6bn6.fsf@uwakimon.sk.tsukuba.ac.jp> <20070523082241.85F3.JCARLSON@uci.edu> <46551ED6.5070900@v.loewis.de> <46554CB8.7050209@canterbury.ac.nz> <4655DB53.80200@v.loewis.de> Message-ID: On Thu, 24 May 2007, [ISO-8859-1] "Martin v. Löwis" wrote: > > That's not true. In the Unicode BMP fallback font, you can easily > >> differentiate all Unicode characters (in the BMP): > >> > >> http://scripts.sil.org/UnicodeBMPFallbackFont > > > > Er... somehow I don't think that's what Martin had in mind > > when he used the word "font" in that context. :-) > > That might well be - however, I think that is because of an > unclear problem statement. From the discussion, I gathered > that the perceived problem is this: > > "Somebody maliciously sends me a patch, and I want to be > able to tell visually that it's wrong." The BMP fallback font isn't a meaningful answer to that problem unless most people get in the habit of doing code reviews using that font. Most Python programmers, who probably won't be aware of this issue because it doesn't come up in their day-to-day use, are unlikely to do that. -- ?!ng From python at zesty.ca Thu May 24 23:35:47 2007 From: python at zesty.ca (Ka-Ping Yee) Date: Thu, 24 May 2007 16:35:47 -0500 (CDT) Subject: [Python-3000] PEP 3131 accepted In-Reply-To: <4655DD4E.3050809@v.loewis.de> References: <20070523111704.85FC.JCARLSON@uci.edu> <87abvu5yfu.fsf@uwakimon.sk.tsukuba.ac.jp> <20070524082737.862E.JCARLSON@uci.edu> <4655DD4E.3050809@v.loewis.de> Message-ID: On Thu, 24 May 2007, [ISO-8859-1] "Martin v.
Löwis" wrote: > > Much of my concerns could be addressed through the use of commandline, > > environment variable, or in-source code definitions of what are > > allowable identifier characters. [...] > Not sure (anymore) what problem you are trying to solve, but it might be > that the coding directive already *is* the solution. If you want to > constrain characters that you can use in a single source file, adding > a coding directive will automatically impose such a constraint (namely, > to the characters available in the encoding). > > In particular, if you set the encoding to us-ascii, you have restricted > your source file to ASCII only. Alas, the coding directive is not good enough. Have a look at this: http://zesty.ca/python/tricky.png That's an image of a text editor containing some Python code. Can you tell whether running it (post-PEP-3131) will delete your .bashrc file? -- ?!ng From python at zesty.ca Thu May 24 23:44:02 2007 From: python at zesty.ca (Ka-Ping Yee) Date: Thu, 24 May 2007 16:44:02 -0500 (CDT) Subject: [Python-3000] PEP 3131 accepted In-Reply-To: <87abvu5yfu.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20070523011101.85F0.JCARLSON@uci.edu> <87iraj6bn6.fsf@uwakimon.sk.tsukuba.ac.jp> <20070523111704.85FC.JCARLSON@uci.edu> <87abvu5yfu.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Thu, 24 May 2007, Stephen J. Turnbull wrote: > I'll tell you why Ka-Ping's argument is a strawman. First, one only > *needs* to be able to distinguish those characters that one can read. > It's nice to be able to admire the rest, of course, but you don't need > to see them as a speaker of that language would. You just use a font > you like for the characters you can read, and the rest can be any old > dog. The problem is that you don't know *when* you'll need to distinguish those characters. Situations where things are not obviously incorrect, but only subtly incorrect, are a common source of practical problems.
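[Editorial note: the "subtly incorrect" failure mode is easy to reproduce once identifiers go beyond ASCII. A minimal modern-Python sketch using a Latin/Cyrillic homoglyph pair — the variable names here are invented for illustration:]

```python
import unicodedata

# Latin 'a' (U+0061) and Cyrillic 'а' (U+0430) are rendered identically
# by most fonts, yet they spell two distinct identifiers.
latin = "allowed"
lookalike = "\u0430llowed"  # first letter is CYRILLIC SMALL LETTER A

print(latin == lookalike)              # False
print(unicodedata.name(lookalike[0]))  # CYRILLIC SMALL LETTER A

# Both are legal, independent names in a post-PEP-3131 world:
ns = {}
exec(latin + " = 0\n" + lookalike + " = 1", ns)
print(ns[latin], ns[lookalike])        # 0 1
```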
Choosing the full set of Unicode identifier characters as the identifier character set for everyone puts nearly all Python users in that situation. That's what the issue is here: defining correct practice to be something sufficiently difficult that almost everyone's regular practices are subtly wrong in ways they don't fully understand. That's a recipe for bugs, vulnerabilities, confusion, etc. The loadable table that you proposed, and Jim proposed, really sounds like the best way to go here. Those that are ready and able to handle the added complexity can voluntarily adopt it, and those who don't (or don't even know about it) won't have to deal with it. -- ?!ng From foom at fuhm.net Thu May 24 23:47:45 2007 From: foom at fuhm.net (James Y Knight) Date: Thu, 24 May 2007 17:47:45 -0400 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <464FFD04.90602@v.loewis.de> <46521CD7.9030004@v.loewis.de> <46527904.1000202@v.loewis.de> <87sl9o5dvi.fsf@uwakimon.sk.tsukuba.ac.jp> <87646i5td6.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <781A2C3C-011E-4048-A72A-BE631C0C5127@fuhm.net> On May 24, 2007, at 5:04 PM, Ka-Ping Yee wrote: >> (1) By default, python allows only ASCII. >> (2) Additional characters are permitted if they appear in a table >> named on the command line. > > +1! This is a fine solution. It is better than the "python -U" > option I proposed -- it has all the advantages of that proposal, plus: > > - The identifier character set won't spontaneously change when > one upgrades to a new version of Python, even for users of > non-ASCII identifiers. FUD. Already won't, unicode explicitly makes that promise. They can add characters, but not remove them. > - Having to specify the table of acceptable characters > demonstrates at least some knowledge of the character set > one is using. This is a negative. Why should I have to show knowledge of the character set I'm using to type the characters? 
> - It provides the flexibility for different communities > to adopt identifier conventions that suit their preferred > tradeoff of risk vs. expressiveness. Also a negative. Now, if I want to run the modules from multiple communities I need to figure out how to merge the tables they have to separately distribute with their modules. > Jim's proposal appears to be the best path to making everyone happy. Nope. It does nobody any good. It may make people who fear non-ascii code happy, but only because it totally castrates this feature for people who do want to use non-ascii identifiers. It really seems to me people are spewing a lot of FUD here. Rejecting certain characters when loading a file is simply not necessary. Either: a) you trust that the author of the file has authored it correctly, in which case it doesn't matter one bit what character set they used. Restricting the charset at import time is just something to get in your way with no actual value. b) you don't trust the code, and want to inspect it. Okay, in this case you actually have to inspect the *code* -- checking the character set is an utterly useless thing to do by itself. It tells you nothing useful. While checking the code, you may want to have strange characters outside your comfort range flagged for you. Either grep or editor support is a simple enough solution for this. Or, let's say your editor is unable to highlight suspicious characters, and you want to find identifiers with strange characters, and not get tripped up on comments. Fine, make a tool that uses the compiler.parser module to iterate over identifiers in the source code. Adding baroque command line options for users of other languages to do some useless verification at import time is not an acceptable answer. It'd be better to just reject the PEP entirely.
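[Editorial note: the tool James suggests is straightforward to sketch. The `compiler.parser` module he names no longer exists; the sketch below uses the stdlib `tokenize` module instead, flagging NAME tokens containing non-ASCII while leaving comments and strings alone:]

```python
import io
import tokenize

def nonascii_identifiers(source):
    """Yield (lineno, name) for each identifier containing non-ASCII."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    for tok in tokens:
        # NAME tokens are identifiers/keywords; comments and string
        # literals come through as COMMENT and STRING, so they are skipped.
        if tok.type == tokenize.NAME and not tok.string.isascii():
            yield tok.start[0], tok.string

code = "caf\u00e9 = 1  # a caf\u00e9 in a comment is not flagged\nplain = 2\n"
print(list(nonascii_identifiers(code)))  # [(1, 'café')]
```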
James From foom at fuhm.net Thu May 24 23:50:50 2007 From: foom at fuhm.net (James Y Knight) Date: Thu, 24 May 2007 17:50:50 -0400 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <464FFD04.90602@v.loewis.de> <46521CD7.9030004@v.loewis.de> <46527904.1000202@v.loewis.de> <87sl9o5dvi.fsf@uwakimon.sk.tsukuba.ac.jp> <87646i5td6.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On May 24, 2007, at 2:14 PM, Jim Jewett wrote: > The Tech Reports seem to suggest NFKD -- and that makes a certain > amount of sense. Using compatibility characters reduces the problem > with equivalent characters that are distinct only for historical > reasons. Using decomposed characters simplifies processing. Please read again: "Generally if the programming language has case-sensitive identifiers, then Normalization Form C is appropriate; whereas, if the programming language has case-insensitive identifiers, then Normalization Form KC is more appropriate." James From martin at v.loewis.de Fri May 25 00:33:01 2007 From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 25 May 2007 00:33:01 +0200 Subject: [Python-3000] PEP 3131 accepted In-Reply-To: References: <20070523111704.85FC.JCARLSON@uci.edu> <87abvu5yfu.fsf@uwakimon.sk.tsukuba.ac.jp> <20070524082737.862E.JCARLSON@uci.edu> <4655DD4E.3050809@v.loewis.de> Message-ID: <4656129D.5000406@v.loewis.de> > Alas, the coding directive is not good enough. Have a look at this: > > http://zesty.ca/python/tricky.png > > That's an image of a text editor containing some Python code. Can you > tell whether running it (post-PEP-3131) will delete your .bashrc file? I would think that it doesn't (i.e. allowed should stay at 0). Why does os.remove get invoked? 
Regards, Martin From martin at v.loewis.de Fri May 25 00:46:33 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 25 May 2007 00:46:33 +0200 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <464FFD04.90602@v.loewis.de> <46521CD7.9030004@v.loewis.de> <46527904.1000202@v.loewis.de> <87sl9o5dvi.fsf@uwakimon.sk.tsukuba.ac.jp> <87646i5td6.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <465615C9.4080505@v.loewis.de> Ka-Ping Yee schrieb: > On Thu, 24 May 2007, Jim Jewett wrote: >> So how about >> >> (1) By default, python allows only ASCII. >> (2) Additional characters are permitted if they appear in a table >> named on the command line. >> >> These additional characters should be restricted to code points larger >> than ASCII (so you can't easily turn "!" into an ID char), but beyond >> that, anything goes. If you want to include punctuation or undefined >> characters, so be it. > > +1! This is a fine solution. It is better than the "python -U" > option I proposed -2. Any solution found must also accommodate users which are unaware of the security issue, and just want to use their native language for identifiers. So requiring them to change their environment or pass additional command line parameters is unacceptable. > Jim's proposal appears to be the best path to making everyone happy. Please *do* consider the needs of the people who want to actively use the feature as well. Otherwise, you have no chance of understanding what will make everyone happy. 
Regards, Martin From mike.klaas at gmail.com Fri May 25 01:03:30 2007 From: mike.klaas at gmail.com (Mike Klaas) Date: Thu, 24 May 2007 16:03:30 -0700 Subject: [Python-3000] PEP 3131 accepted In-Reply-To: <4656129D.5000406@v.loewis.de> References: <20070523111704.85FC.JCARLSON@uci.edu> <87abvu5yfu.fsf@uwakimon.sk.tsukuba.ac.jp> <20070524082737.862E.JCARLSON@uci.edu> <4655DD4E.3050809@v.loewis.de> <4656129D.5000406@v.loewis.de> Message-ID: On 24-May-07, at 3:33 PM, Martin v. Löwis wrote: >> Alas, the coding directive is not good enough. Have a look at this: >> >> http://zesty.ca/python/tricky.png >> >> That's an image of a text editor containing some Python code. Can >> you >> tell whether running it (post-PEP-3131) will delete your .bashrc >> file? > > I would think that it doesn't (i.e. allowed should stay at 0). > > Why does os.remove get invoked? Perhaps a letter in the encoding declaration is non-ascii, nullifying the encoding enforcement and allowing a cyrillic 'a' in allowed = 0? -Mike From python at zesty.ca Fri May 25 01:06:16 2007 From: python at zesty.ca (Ka-Ping Yee) Date: Thu, 24 May 2007 18:06:16 -0500 (CDT) Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <465615C9.4080505@v.loewis.de> References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <464FFD04.90602@v.loewis.de> <46521CD7.9030004@v.loewis.de> <46527904.1000202@v.loewis.de> <87sl9o5dvi.fsf@uwakimon.sk.tsukuba.ac.jp> <87646i5td6.fsf@uwakimon.sk.tsukuba.ac.jp> <465615C9.4080505@v.loewis.de> Message-ID: On Fri, 25 May 2007, [ISO-8859-1] "Martin v. Löwis" wrote: > Please *do* consider the needs of the people who want to actively > use the feature as well. Otherwise, you have no chance of understanding > what will make everyone happy. People who want to use the feature can turn it on. I don't see what's so unreasonable about that.
-- ?!ng From jimjjewett at gmail.com Fri May 25 01:12:27 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Thu, 24 May 2007 19:12:27 -0400 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <465615C9.4080505@v.loewis.de> References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <46527904.1000202@v.loewis.de> <87sl9o5dvi.fsf@uwakimon.sk.tsukuba.ac.jp> <87646i5td6.fsf@uwakimon.sk.tsukuba.ac.jp> <465615C9.4080505@v.loewis.de> Message-ID: On 5/24/07, "Martin v. Löwis" wrote: > Ka-Ping Yee schrieb: > > On Thu, 24 May 2007, Jim Jewett wrote: > >> So how about > >> (1) By default, python allows only ASCII. > >> (2) Additional characters are permitted if they appear in a table > >> named on the command line. > >> These additional characters should be restricted to code > >> points larger than ASCII (so you can't easily turn "!" into > >> an ID char), but beyond that, anything goes. If you want to > >> include punctuation or undefined characters, so be it. > > +1! This is a fine solution. It is better than the "python -U" > > option I proposed > -2. Any solution found must also accommodate users which are > unaware of the security issue, and just want to use their native > language for identifiers. So requiring them to change their > environment or pass additional command line parameters is > unacceptable. There is no hope of explaining security; therefore, the defaults should be relatively safe. If the default is "anything goes", that isn't safe. If the default is "ASCII", that is safe, but possibly inconvenient. It depends on how hard it is to make the switch. Is your concern just that it should be possible to do once (perhaps at install), rather than on each run? That would probably be OK too, so long as the default install was ASCII-only, so that *someone* had to make a decision about what to allow. I assume that large communities will standardize on a tailored table, but a first-pass slightly-too-inclusive table is easy enough to create.
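[Editorial note: a first pass at consuming such a table is simple to sketch. Assuming a file of `XXXX..YYYY ; ScriptName` lines in the style of the Scripts.txt excerpts Jim quotes, a checker might look like this — the file format handling and function names are invented here for illustration:]

```python
def parse_ranges(table_text):
    """Parse 'XXXX..YYYY ; Script' lines (Scripts.txt style) into ranges."""
    ranges = []
    for line in table_text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop trailing comments
        if not line:
            continue
        codes = line.split(";", 1)[0].strip()
        lo, _, hi = codes.partition("..")      # single code points have no ".."
        ranges.append((int(lo, 16), int(hi or lo, 16)))
    return ranges

def identifier_ok(name, ranges):
    """ASCII is always allowed; anything else must fall in an allowed range."""
    return all(
        ord(ch) < 128 or any(lo <= ord(ch) <= hi for lo, hi in ranges)
        for ch in name
    )

thaana = parse_ranges("0780..07B1 ; Thaana")
print(identifier_ok("abc", thaana))       # True
print(identifier_ok("ab\u0780", thaana))  # True: THAANA LETTER HAA allowed
print(identifier_ok("caf\u00e9", thaana)) # False: e-acute not in the table
```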
Here are the Thaana lines from (unicode consortium file) Scripts.txt 0780..07A5 ; Thaana # Lo [38] THAANA LETTER HAA..THAANA LETTER WAAVU 07A6..07B0 ; Thaana # Mn [11] THAANA ABAFILI..THAANA SUKUN 07B1 ; Thaana # Lo THAANA LETTER NAA Though if it were me, I would probably simplify that to 0780..07B1 ; Thaana Similarly, Devanagari has 15 lines in Scripts.txt, but you could simplify it to 0901..0939 ; Devanagari 093C..094D ; Devanagari 0950..0954 ; Devanagari 0958..0963 ; Devanagari 0966..096F ; Devanagari 097B..097F ; Devanagari or even 0901..097F ; Devanagari and some undefined characters if you (as a Devanagari speaker) were confident that none of your characters would be confused with ASCII. (In practice, you might well want to exclude the Devanagari numbers for looking too similar to ASCII digits with different values, but ... that is a judgment call for Devanagari speakers to make, so long as they make it explicitly.) -jJ From showell30 at yahoo.com Fri May 25 01:49:49 2007 From: showell30 at yahoo.com (Steve Howell) Date: Thu, 24 May 2007 16:49:49 -0700 (PDT) Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <4655DE74.4090708@v.loewis.de> Message-ID: <83075.64514.qm@web33514.mail.mud.yahoo.com> --- "Martin v. Löwis" wrote: > > FWIW the Ruby interpreter (1.8.5) seems to require > > this flag to allow you to turn on the Japanese code > > set. > > > > -Kkcode specifies KANJI (Japanese) code-set > > > > I have no idea whether or not this cripples the > > feature in Ruby, and perhaps it's an > apples/oranges > > comparison. > > If you don't have source encoding declarations (like > the one > in PEP 263), you must have some means of setting the > source > encoding; this is what -Kkcode does (similar to > javac's > -encoding command line option). > > This approach has several flaws, e.g. you can only > specify > a single encoding, which breaks if you have modules > in > different encodings.
> > In any case, it's different from the suggested -UU > option: > Python already knows what the source encoding is, > -UU > would not change that. Instead, that option would > merely > serve to constrain the source code (if it's not > being > passed). > Ok, I think it's pretty clear that this is an apples/oranges comparison, and there are lots of differences between Ruby's implementation and PEP 3131 that muddy the waters. Still, the reason I brought it up is still valid, I think. Ruby is a language that presumably has a lot of Japanese users, and it appears to me (I'm not a Ruby person, so I admit this is speculation) that Japanese users have to explicitly choose to use Japanese encoding to run source files encoded in Japanese. Setting aside all the limitations of Ruby, wouldn't the fact that non-latin-writing Japanese Ruby users live with the command line restriction in Ruby suggest that they'd be just as willing to live with command line burdens in Python, if they decided to switch to Python? To your point about Py3k being more flexible, couldn't you imagine a scenario where a Japanese programmer gets fed up with Ruby's all-or-nothing capability with respect to Kanji, and switches over to Python, and changes his little wrapper shell script to say "python -U" instead of "ruby -Kkcode"? He could then start to use non-Japanese Python modules while still writing his own Python code in Japanese. From guido at python.org Fri May 25 02:08:53 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 24 May 2007 17:08:53 -0700 Subject: [Python-3000] Accepting PEP 3119, rejecting PEP 3133 Message-ID: I'm accepting PEP 3119 (Abstract Base Classes).
The latest round of feedback has been sufficiently friendly that I am confident that it will be a welcome addition. There are some loose ends in the PEP which I will resolve while implementing it. This means I'm also rejecting the main competing proposal, PEP 3133 (Roles). I am hopeful that PEP 3124 (Generic Functions) will be updated; since it works so well with ABCs I expect to accept it, in some form; but I'm still waiting for the rewrite that Phillip proposed. I am also expecting to accept PEP 3141 (numeric ABCs). The most serious current objection to that one is that the concrete implementations it provides may not be useful enough to warrant their complexity; maybe I'll just take those out. I'll be pondering this after implementing PEP 3119. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri May 25 02:10:12 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 24 May 2007 17:10:12 -0700 Subject: [Python-3000] Wither PEP 335 (Overloadable Boolean Operators)? In-Reply-To: References: Message-ID: On 5/18/07, Guido van Rossum wrote: > While reviewing PEPs, I stumbled over PEP 335 (Overloadable Boolean > Operators) by Greg Ewing. I am of two minds of this -- on the one > hand, it's been a long time without any working code or anything. OTOH > it might be quite useful to e.g. numpy folks. > > It is time to reject it due to lack of interest, or revive it! Last call for discussion! I'm tempted to reject this -- the ability to generate optimized code based on the shortcut semantics of and/or is pretty important to me. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From showell30 at yahoo.com Fri May 25 02:15:38 2007 From: showell30 at yahoo.com (Steve Howell) Date: Thu, 24 May 2007 17:15:38 -0700 (PDT) Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <465615C9.4080505@v.loewis.de> Message-ID: <320102.38046.qm@web33515.mail.mud.yahoo.com> --- "Martin v. Löwis" wrote: > > -2.
Any solution found must also accommodate users > which are unaware > of the security issue, and just want to use their > native language > for identifiers. So requiring them to change their > environment or > pass additional command line parameters is > unacceptable. Let me say first that I'm 100% behind PEP 3131, and that I agree with you with that many of the objections to the PEP are kind of FUDish. Still, I have a hard time accepting your premise that even ordinary, non-security-aware programmers are so deterred by changing their environment. In almost every programming situation I've been in, I've had to deal with environmental issues, even though my character set of choice has never been the primary issue. When I programmed in C, I had to learn my way around makefiles, figure out LD_LIBRARY_PATH, etc. When I programmed in Perl, I had to change my shebangs when I moved from one Unix box to another, due to the way sys admins installed Perl. When I programmed in Java, I had to learn how .jar files worked. Now that I program in Python, I still have to fuss with PYTHONPATH and LD_LIBRARY_PATH (we use C extensions) when I go between version 20 (installed in the field), version 21 (installed in the text box), and shDevBranch (code I'm working on now). Also in Python, the concept of a wrapper shell script is just part of a programmer's life. I have a program that needs to run as user "operator," and I can't sudo-enable Python itself (big security hole), so I write a one-line sudo script that just calls safe.py, and I sudo enable it. > > Please *do* consider the needs of the people who > want to actively > use the feature as well. Otherwise, you have no > chance of understanding > what will make everyone happy. > I think there are things that can be done here, even if we make Python's default mode to be ascii-pure. Regional distros can set the environment appropriately. Python error messages about non-ascii characters can suggest how to enable the -U flag. 
The Tokyo Python User's Group can educate programmers, etc. From gproux+py3000 at gmail.com Fri May 25 03:05:01 2007 From: gproux+py3000 at gmail.com (Guillaume Proux) Date: Fri, 25 May 2007 10:05:01 +0900 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <320102.38046.qm@web33515.mail.mud.yahoo.com> References: <465615C9.4080505@v.loewis.de> <320102.38046.qm@web33515.mail.mud.yahoo.com> Message-ID: <19dd68ba0705241805y52ba93fdt284a2c696b004989@mail.gmail.com> Hello, There have been many proposals for flags around. I don't even understand anymore which -U you are talking about now. But let me add my own proposal for a flag. (just to confuse everybody else a little more) It is my understanding that the only remaining objection to unicode in identifiers is the claimed security issues. The most important application of unicode in identifiers in my view would be to bring back computer control to children (like in the OLPC project). (Notwithstanding the fact that AFAIK we have yet to hear about big security issues in the Java/C# world that were caused by the ability to use unicode chars.) So probably a good flag for security-minded people would be to have, like gcc, a "pedantic" flag:

    python -pedantic no_scary_chars_here.py

Regarding the notion that you should be able to give a single accepted charset, the problem arises that restricting charsets on a global scope (from a global command line flag or a site.py file) will prevent me, for example, from freely mixing English, French, Greek and Japanese in the same large project and/or dynamically calling on any .py with a different charset. I also think one of the great aspects of Python is the ability to simply get embedded in other C/C++/etc.
projects and as such we need to give the interpreter-embedders the ability to execute any script the user will present them without restricting to any specific charset. The additional burden that ascii loving people would like to impose on the rest of the world through the usage of command line switches is unwanted IMHO. I would think that a better way to help everybody would be to:

1) have a default of not restricting identifier charsets but...
2) enable various people (or security-minded distributions) to have a customized site.py or $HOME file that would spit warnings or raise exceptions when opening up files that have identifiers that are not pure ascii.

Notice that having to verify that EACH and EVERY identifier can be expressed in a specific charset is going to be an expensive runtime cost. A good middle ground would be to have the main python distribution come out with the site.py spitting warnings (and giving a quick explanation of why the warning and how to disable it for yourself (not globally) if you are REALLY REALLY sure). It would be very interesting to enable the first-time "interactive" user to be able to disable the warning for *this* user for good from a simple prompt. Regards, Guillaume From jimjjewett at gmail.com Fri May 25 03:37:47 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Thu, 24 May 2007 21:37:47 -0400 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <19dd68ba0705241805y52ba93fdt284a2c696b004989@mail.gmail.com> References: <465615C9.4080505@v.loewis.de> <320102.38046.qm@web33515.mail.mud.yahoo.com> <19dd68ba0705241805y52ba93fdt284a2c696b004989@mail.gmail.com> Message-ID: On 5/24/07, Guillaume Proux wrote: > It is my understanding that the only remaining objection to unicode > in identifiers is the claimed security issues. It isn't strictly security; when I've been burned by cut-and-paste that turned out to be an unexpected character, it didn't cause damage, but it did take me a long time to debug.
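The cut-and-paste hazard Jim describes is easy to demonstrate under PEP 3131 semantics (a sketch, runnable on Python 3, which is what the PEP targets; the Cyrillic letter in the second assignment is deliberate):

```python
# Two identifiers that render identically in most fonts.
a = 1        # Latin small letter a (U+0061)
а = 2        # Cyrillic small letter a (U+0430)

# What looks like "a was rebound to 2" is really two distinct names,
# which is exactly the kind of bug that takes a long time to debug.
print(a, а)  # -> 1 2
```

Nothing here is insecure as such; the program simply does not mean what it appears to mean.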
> Regarding the notion that you should be able to give a single > accepted charset, the problem arises that restricting charsets on a > global scope (from a global command line flag or a site.py file) will > prevent me, for example, from freely mixing English, French, Greek and > Japanese in the same large project For most people, the appearance of a Greek or Japanese (let alone both) character would be more likely to indicate a typo. If you know that your project is using both languages, then just allow both; the point is that you have made an explicit decision to do so. -jJ From showell30 at yahoo.com Fri May 25 04:01:07 2007 From: showell30 at yahoo.com (Steve Howell) Date: Thu, 24 May 2007 19:01:07 -0700 (PDT) Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <19dd68ba0705241805y52ba93fdt284a2c696b004989@mail.gmail.com> Message-ID: <250467.66423.qm@web33502.mail.mud.yahoo.com> --- Guillaume Proux wrote: > > The additional burden that ascii loving people would > like to impose on > the rest of the world through the usage of command > line switches is > unwanted IMHO. > I think now that PEP 3131 has been accepted, you can coarsely frame the remaining conflict as between ascii lovers and non-ascii lovers, and the dispute is over who has to muck with their command line/environment to get Python to reflect their bias. Obviously, in any conflict, there are solutions that mostly satisfy both parties. If Python 3.0 leaned too much toward appeasing non-ascii lovers, you could still devise plenty of workarounds that made ascii lovers not suffer too immensely. Ascii lovers could revisit their security philosophies, applying more scrutiny to who actually supplies patches, etc. Ascii lovers could upgrade their editors, run more unit tests, etc. Ascii lovers could build tools from tokenize.py, etc., that facilitated the porting of non-English or non-Latin code to English/Latin.
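A porting tool of the tokenize.py sort mentioned here could start from something as small as this (a sketch for modern Python 3; `non_ascii_names` is a made-up helper, and `str.isascii` assumes Python 3.7+):

```python
import io
import tokenize

def non_ascii_names(source):
    """Collect identifiers in `source` that contain non-ASCII characters,
    as a starting point for a transliteration/porting tool."""
    names = set()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and not tok.string.isascii():
            names.add(tok.string)
    return names

print(non_ascii_names("grün = 0\n"))  # -> {'grün'}
```

From there, a real tool would propose ASCII replacements and rewrite the token stream rather than merely report.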
If Python 3.0 leaned too much toward appeasing ascii lovers, you could still devise plenty of workarounds that made non-ascii lovers not suffer too immensely. You could make error messages more helpful, you could have regional distros supply useful aliases, you could have user groups educate newbies, etc. If Python 3.0 judged the middle ground wrong, you could adjust in Python 3.1. Of course, there's a lot of gray area when you put people on the spectrum. As an example, take me--I'm mostly an ascii lover, but I'm sympathetic to non-ascii concerns. My first language is English, but I speak a bit of French, have written applications for Spanish users, and have collaborated with people who internationalized my software for languages that I'm almost completely unfamiliar with (Dutch, Catalan, etc.) Regarding the "command line," this ascii mostly-lover doesn't necessarily want to impose command line restrictions on anybody. I'd much rather impose "environment" restrictions on ALL Python users. Here's my reasoning:

1) It's fair. Even as an ascii lover and beneficiary, I have to deal with environment variables nearly as much as non-ascii lovers (PYTHONPATH, LD_LIBRARY_PATH, ORA_HOME, etc.)
2) It's really all about the environment. There's a difference between running Python in an enterprisy environment, an OLPC environment, a Japanese-person-trying-to-wean-himself-off-Ruby environment, etc.
3) It's often free. I suspect most non-ASCII users already have environment settings that suggest their willingness to tolerate lack of ASCII purity. Couldn't Python sniff those out?
From gproux+py3000 at gmail.com Fri May 25 04:01:56 2007 From: gproux+py3000 at gmail.com (Guillaume Proux) Date: Fri, 25 May 2007 11:01:56 +0900 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: References: <465615C9.4080505@v.loewis.de> <320102.38046.qm@web33515.mail.mud.yahoo.com> <19dd68ba0705241805y52ba93fdt284a2c696b004989@mail.gmail.com> Message-ID: <19dd68ba0705241901h23468237md8e81aaa65f9b7a6@mail.gmail.com> Hi Jim, On 5/25/07, Jim Jewett wrote: > It isn't strictly security; when I've been burned by cut-and-paste > that turned out to be an unexpected character, it didn't cause damage, > but it did take me a long time to debug. Can you give a longer explanation, because I don't understand what the issue is. Is it like the issue with confusing 0 and O? You seemingly already have experience with using something that is now not legal in Python. Was it in the Java or .NET world? > For most people, the appearance of a Greek or Japanese (let alone > both) character would be more likely to indicate a typo. If you know > that your project is using both languages, then just allow both; the > point is that you have made an explicit decision to do so. You are missing one of my main points, but it is maybe not a very strong point (the earlier email was maybe throwing out too many ideas at a time... i guess japanese sake lasts longer in the mouth :) )

* Python is dynamic (you can have e.g. a pygtk user interface which enables you to load at runtime a new .py file, even to use a text view to type in a mini-script that will do something specific in your application domain): you never know what will get loaded next
* Python is embeddable: and often it is to bring the power of python to less sophisticated users. You can imagine having a global system deployed all around the world by a global company, enabling each user in each subsidiary to create their own extension scripts.
* There is a runtime cost for checking: the speed vs. security tradeoff (for a security benefit that is still very much hypothetical in the face of the experience of Java and .NET people) should be borne by the paranoid people (who are ALREADY accustomed to losing CPU cycles to RSBAC security systems).
* In real life, you won't see many Python programs that are not written in your script. If you are really paranoid that evil chars will take over your python src dir, though, a -pedantic option as pointed out earlier should take care of all your worries.

cheers, G From greg.ewing at canterbury.ac.nz Fri May 25 04:05:35 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 25 May 2007 14:05:35 +1200 Subject: [Python-3000] Wither PEP 335 (Overloadable Boolean Operators)? In-Reply-To: References: Message-ID: <4656446F.8030802@canterbury.ac.nz> Guido van Rossum wrote: > Last call for discussion! I'm tempted to reject this -- the ability to > generate optimized code based on the shortcut semantics of and/or is > pretty important to me. Please don't be hasty. I've had to think about this issue a bit. The conclusion I've come to is that there may be a small loss in the theoretical amount of optimization opportunity available, but not much. Furthermore, if you take into account some other improvements that can be made (which I'll explain below), the result is actually *better* than what 2.5 currently generates. For example, Python 2.5 currently compiles

    if a and b:

into

        JUMP_IF_FALSE L1
        POP_TOP
        JUMP_IF_FALSE L1
        POP_TOP
        JUMP_FORWARD L2
    L1: POP_TOP
    L2:

Under my PEP, without any other changes, this would become

        LOGICAL_AND_1 L1
        LOGICAL_AND_2
    L1: JUMP_IF_FALSE L2
        POP_TOP
        JUMP_FORWARD L3
    L2: POP_TOP
    L3:

The fastest path through this involves executing one extra bytecode. However, since we're not using JUMP_IF_FALSE to do the short-circuiting any more, there's no need for it to leave its operand on the stack.
So let's redefine it and change its name to POP_JUMP_IF_FALSE. This allows us to get rid of all the POP_TOPs, plus the jump at the end of the statement body. Now we have

        LOGICAL_AND_1 L1
        LOGICAL_AND_2
    L1: POP_JUMP_IF_FALSE L2
    L2:

The fastest path through this executes one *less* bytecode than in the current 2.5-generated code. Also, any path that ends up executing the body benefits from the lack of a jump at the end. The same benefits also result when the boolean expression is more complex, e.g.

    if a or b and c:

becomes

        LOGICAL_OR_1 L1
        LOGICAL_AND_1 L2
        LOGICAL_AND_2
    L2: LOGICAL_OR_2
    L1: POP_JUMP_IF_FALSE L3
    L3:

which contains 3 fewer instructions overall than the corresponding 2.5-generated code. So I contend that optimization is not an argument for rejecting this PEP, and may even be one for accepting it. -- Greg From gproux+py3000 at gmail.com Fri May 25 04:13:01 2007 From: gproux+py3000 at gmail.com (Guillaume Proux) Date: Fri, 25 May 2007 11:13:01 +0900 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <250467.66423.qm@web33502.mail.mud.yahoo.com> References: <19dd68ba0705241805y52ba93fdt284a2c696b004989@mail.gmail.com> <250467.66423.qm@web33502.mail.mud.yahoo.com> Message-ID: <19dd68ba0705241913j1d2f60e1ndc89e05bfd926c52@mail.gmail.com> Hello, On 5/25/07, Steve Howell wrote: > willingness to tolerate lack of ASCII purity. Couldn't > Python sniff those out? On my Linux machine, my encoding is set to UTF8 (and I am sure that most monolingual Ubuntu users have the same settings). On my Windows PC, Unicode is the rule of the world. I have a hard time seeing how you could sniff out the willingness to accept, in a Japanese environment, a piece of code written in Russian because your buddy from Siberia has written this cool matrix class that is 30% faster than most but contains a bunch of cyrillic characters, because people are using cyrillic characters for local variable identifiers (but not module level identifiers).
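For what it's worth, the "sniffing" Steve proposed could only ever be a locale heuristic of roughly this shape (a sketch; the variable names follow POSIX conventions, and this is not an actual interpreter mechanism):

```python
import locale
import os

def environment_accepts_non_ascii():
    """Guess whether the surrounding environment already tolerates
    non-ASCII text, by looking for a UTF locale."""
    candidates = [
        os.environ.get("LC_ALL", ""),
        os.environ.get("LANG", ""),
        locale.getpreferredencoding(False),
    ]
    return any("utf" in (value or "").lower() for value in candidates)

print(environment_accepts_non_ascii())
```

As the reply above points out, this tells you nothing about which *scripts* a user is prepared to read, only that their tooling can decode them.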
I think that the beauty of the world that has moved from everybody being on their own little codepage island to the global UTF8-based world (on Linux) and UTF16 world (on Windows) is that now all scripts are more or less equal citizens, and nobody benefits more than anyone else, or has to make more effort than the others, to access their own language as well as other people's languages. Imposing a kind of language segregation would prevent getting more people working together and exchanging code and ideas, while opening up culturally to other horizons and cultures. (and no, I am not smoking illegal substances in front of my keyboard) Guillaume From python at zesty.ca Fri May 25 04:40:03 2007 From: python at zesty.ca (Ka-Ping Yee) Date: Thu, 24 May 2007 21:40:03 -0500 (CDT) Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <19dd68ba0705241805y52ba93fdt284a2c696b004989@mail.gmail.com> References: <465615C9.4080505@v.loewis.de> <320102.38046.qm@web33515.mail.mud.yahoo.com> <19dd68ba0705241805y52ba93fdt284a2c696b004989@mail.gmail.com> Message-ID: Guillaume Proux wrote: > It is my understanding that the only remaining objection to unicode > in identifiers is the claimed security issues. You're missing much of the debate. Please read this message: http://mail.python.org/pipermail/python-3000/2007-May/007855.html Steve Howell wrote: > I think now that PEP 3131 has been accepted, you can coarsely frame > the remaining conflict as between ascii lovers and non-ascii lovers To pit this as "ascii lovers vs. non-ascii lovers" is a pretty large oversimplification. You could name them "people who want to be able to know what the code says" and "people who don't mind not being able to know what the code says". Or you could name them "people who want Python's lexical syntax to be something they fully understand" and "people who don't mind the extra complexity".
Or "people who don't want Python's lexical syntax to be tied to a changing external standard" and "people who don't mind the extra variability." However you characterize them, keep in mind that those in the former group are asking for default behaviour that 100% of Python users already use and understand. There's no cost to keeping identifiers ASCII-only because that's what Python already does. I think that's a pretty strong reason for making the new, more complex behaviour optional. -- ?!ng From showell30 at yahoo.com Fri May 25 04:46:42 2007 From: showell30 at yahoo.com (Steve Howell) Date: Thu, 24 May 2007 19:46:42 -0700 (PDT) Subject: [Python-3000] Support for PEP 3131 In-Reply-To: Message-ID: <456503.41918.qm@web33505.mail.mud.yahoo.com> --- Ka-Ping Yee wrote: > Steve Howell wrote: > > I think now that PEP 3131 has been accepted, you > can coarsely frame > > the remaining conflict as between ascii lovers and > non-ascii lovers > > To pit this as "ascii lovers vs. non-ascii lovers" > is a pretty large > oversimplification. You could name them "people who > want to be able > to know what the code says" and "people who don't > mind not being able > to know what the code says". Or you could name them > "people who want > Python's lexical syntax to be something they fully > understand" and > "people who don't mind the extra complexity". Or > "people who don't > want Python's lexical syntax to be tied to a > changing external > standard" and "people who don't mind the extra > variability." > Agreed. > However you characterize them, keep in mind that > those in the former > group are asking for default behaviour that 100% of > Python users > already use and understand. There's no cost to > keeping identifiers > ASCII-only because that's what Python already does. > Agreed. > I think that's a pretty strong reason for making the > new, more complex > behaviour optional. > Agreed also. 
Just to be clear, I am 100% in the camp of people who want non-ascii behavior to be an explicit choice, at least for 3.0. EIBTI. But I also think we want to be as creative as possible for enabling and encouraging non-ascii functionality. I think that's where this thread should start focusing. I also share Guillaume's optimistic viewpoint about a Python world with no cultural boundaries, etc. (sorry if that's a bad paraphrase). From guido at python.org Fri May 25 04:53:40 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 24 May 2007 19:53:40 -0700 Subject: [Python-3000] Wither PEP 335 (Overloadable Boolean Operators)? In-Reply-To: <4656446F.8030802@canterbury.ac.nz> References: <4656446F.8030802@canterbury.ac.nz> Message-ID: On 5/24/07, Greg Ewing wrote: > Guido van Rossum wrote: > > > Last call for discussion! I'm tempted to reject this -- the ability to > > generate optimized code based on the shortcut semantics of and/or is > > pretty important to me. > > Please don't be hasty. I've had to think about this issue > a bit. > > The conclusion I've come to is that there may be a small loss > in the theoretical amount of optimization opportunity available, > but not much. Furthermore, if you take into account some other > improvements that can be made (which I'll explain below) the > result is actually *better* than what 2.5 currently generates.
> For example, Python 2.5 currently compiles
>
>     if a and b:
>
> into
>
>     JUMP_IF_FALSE L1
>     POP_TOP
>     JUMP_IF_FALSE L1
>     POP_TOP
>     JUMP_FORWARD L2
> L1: POP_TOP
> L2:
>
> Under my PEP, without any other changes, this would become
>
>     LOGICAL_AND_1 L1
>     LOGICAL_AND_2
> L1: JUMP_IF_FALSE L2
>     POP_TOP
>     JUMP_FORWARD L3
> L2: POP_TOP
> L3:
>
> The fastest path through this involves executing one extra bytecode. However, since we're not using JUMP_IF_FALSE to do the short-circuiting any more, there's no need for it to leave its operand on the stack. So let's redefine it and change its name to POP_JUMP_IF_FALSE. This allows us to get rid of all the POP_TOPs, plus the jump at the end of the statement body. Now we have
>
>     LOGICAL_AND_1 L1
>     LOGICAL_AND_2
> L1: POP_JUMP_IF_FALSE L2
> L2:
>
> The fastest path through this executes one *less* bytecode than in the current 2.5-generated code. Also, any path that ends up executing the body benefits from the lack of a jump at the end.
>
> The same benefits also result when the boolean expression is more complex, e.g.
>
>     if a or b and c:
>
> becomes
>
>     LOGICAL_OR_1 L1
>     LOGICAL_AND_1 L2
>     LOGICAL_AND_2
> L2: LOGICAL_OR_2
> L1: POP_JUMP_IF_FALSE L3
> L3:
>
> which contains 3 fewer instructions overall than the corresponding 2.5-generated code.
>
> So I contend that optimization is not an argument for rejecting this PEP, and may even be one for accepting it.

Do you have an implementation available to measure this? In most cases the cost is not in the number of bytecode instructions executed but in the total amount of work. Two cheap bytecodes might well be cheaper than one expensive one. However, I'm happy to keep your PEP open until you have code that we can measure.
(However, adding additional optimizations elsewhere to make up for the loss wouldn't be fair -- we would have to compare with a 2.5 or trunk (2.6) interpreter with the same additional optimizations added.) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri May 25 05:09:33 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 24 May 2007 20:09:33 -0700 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: References: <465615C9.4080505@v.loewis.de> <320102.38046.qm@web33515.mail.mud.yahoo.com> <19dd68ba0705241805y52ba93fdt284a2c696b004989@mail.gmail.com> Message-ID: On 5/24/07, Ka-Ping Yee wrote: > To pit this as "ascii lovers vs. non-ascii lovers" is a pretty large > oversimplification. You could name them "people who want to be able > to know what the code says" and "people who don't mind not being able > to know what the code says". Or you could name them "people who want > Python's lexical syntax to be something they fully understand" and > "people who don't mind the extra complexity". Or "people who don't > want Python's lexical syntax to be tied to a changing external > standard" and "people who don't mind the extra variability." > > However you characterize them, keep in mind that those in the former > group are asking for default behaviour that 100% of Python users > already use and understand. There's no cost to keeping identifiers > ASCII-only because that's what Python already does. > > I think that's a pretty strong reason for making the new, more complex > behaviour optional. If there's a security argument to be made for restricting the alphabet used by code contributions (even by co-workers at the same company), I don't see why ASCII-only projects should have it easier than projects in other cultures. 
It doesn't look like any kind of global flag passed to the interpreter would scale -- once I am using a known trusted contribution that uses a different character set than mine, I would have to change the global setting to be more lenient, and the leniency would affect all code I'm using. A more useful approach would seem to be a set of auditing tools that can be applied routinely to all new contributions (e.g. as a pre-commit hook when using a source control system), or to all code in a given directory, download, etc. I don't see this as all that different from using e.g. PyChecker or PyLint. While I routinely perform visual code inspections (code review is the law at Google, and I wrote the tool used internally to do these), I certainly don't see this as a security audit -- I use it as a mentoring activity and to reach agreement over issues as diverse as coding style, architecture and implementation techniques between trusting colleagues. Scanning for stray non-ASCII characters is best left to automated tools. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From showell30 at yahoo.com Fri May 25 05:54:17 2007 From: showell30 at yahoo.com (Steve Howell) Date: Thu, 24 May 2007 20:54:17 -0700 (PDT) Subject: [Python-3000] Support for PEP 3131 In-Reply-To: Message-ID: <66548.18605.qm@web33508.mail.mud.yahoo.com> --- Guido van Rossum wrote: > If there's a security argument to be made for > restricting the alphabet > used by code contributions (even by co-workers at > the same company), I > don't see why ASCII-only projects should have it > easier than projects > in other cultures. > > It doesn't look like any kind of global flag passed > to the interpreter > would scale -- once I am using a known trusted > contribution that uses > a different character set than mine, I would have to > change the global > setting to be more lenient, and the leniency would > affect all code I'm > using. > Ok, that argument sways me.
Can the debate about security be put to rest by adding something to the "Common Objections" section of the PEP, or has your pronouncement already put the debate to rest? To the extent that recent objections don't fall under security, what are they? Have these been adequately refuted?

1) People want to be able to know what non-ascii code says.
2) People don't want extra complexity in the language.
3) People don't want Python's lexical syntax to be tied to a changing external standard.

My opinion:

#1 -- easy to refute
#2 -- too general to refute
#3 -- still an interesting point for debate

From jcarlson at uci.edu Fri May 25 06:36:12 2007 From: jcarlson at uci.edu (Josiah Carlson) Date: Thu, 24 May 2007 21:36:12 -0700 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: References: Message-ID: <20070524213605.864B.JCARLSON@uci.edu> "Guido van Rossum" wrote: > On 5/24/07, Ka-Ping Yee wrote: > > To pit this as "ascii lovers vs. non-ascii lovers" is a pretty large > > oversimplification. You could name them "people who want to be able > > to know what the code says" and "people who don't mind not being able > > to know what the code says". Or you could name them "people who want > > Python's lexical syntax to be something they fully understand" and > > "people who don't mind the extra complexity". Or "people who don't > > want Python's lexical syntax to be tied to a changing external > > standard" and "people who don't mind the extra variability." > > > > However you characterize them, keep in mind that those in the former > > group are asking for default behaviour that 100% of Python users > > already use and understand. There's no cost to keeping identifiers > > ASCII-only because that's what Python already does.
> > > > I think that's a pretty strong reason for making the new, more complex > > behaviour optional. > > If there's a security argument to be made for restricting the alphabet > used by code contributions (even by co-workers at the same company), I > don't see why ASCII-only projects should have it easier than projects > in other cultures. For the sake of argument, pretend that we went with a command line option to enable certain character sets. In my opinion, there should be a default character set that is allowed. The only character set that makes sense as a default, ignoring previously-existing environment variables (which don't necessarily help us), is ascii. Why? Primarily because ascii identifiers are what are allowed today, and have been allowed for 15 years. But there is this secondary data point that Stephen Turnbull brought up: 95% of users (of Emacs) never touch non-ascii code. Poor extrapolation of statistics aside, to make the default be something that does not help 95% of users seems a bit... overenthusiastic. Where else in Python have we made the default behavior only desired or useful to 5% of our users? With that said, and with what Stephen and others have said about unicode in Java, I don't believe there will be terribly significant cross-pollination of non-ascii identifier source. Of the source that *does* become popular and has non-ascii identifiers, I don't believe that it would take much time before there are normalized versions of the source, either published by the original authors or created by users. (having a tool to do unicode -> ascii transliteration of identifiers would make this a non-issue) Though others don't like it, I think that having a command line option to enable other character sets is a reasonable burden to place on the 5% of users that will experience non-ascii identifiers.
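The environment-driven mechanism argued for here might look roughly like this (everything in the sketch is illustrative: the `PYTHON_ID_CHARSETS` variable and the range table are invented, not real interpreter options):

```python
import os

# Invented table: charset name -> inclusive Unicode codepoint ranges.
CHARSET_RANGES = {
    "greek": [(0x0370, 0x03FF)],
    "cyrillic": [(0x0400, 0x04FF)],
}

def build_allowable(extra_charsets):
    """ASCII is always allowed; named extra charsets widen the set."""
    chars = set(map(chr, range(128)))
    for name in extra_charsets:
        for lo, hi in CHARSET_RANGES.get(name, []):
            chars.update(map(chr, range(lo, hi + 1)))
    return chars

# Extra charsets come from a (hypothetical) environment variable.
allowable_characters = build_allowable(
    os.environ.get("PYTHON_ID_CHARSETS", "").split())

def check_identifiers(identifiers):
    """Reject any identifier containing a disallowed character."""
    for identifier in identifiers:
        for character in identifier:
            if character not in allowable_characters:
                raise ImportError("disallowed character %r in %r"
                                  % (character, identifier))
```

The inner loop is the same membership test discussed below; the set lookup per character is what keeps the verification cost negligible.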
For those who work with it on a regular basis, having an environment variable should be sufficient (with command line arguments to add additional allowable character sets). For those who wish to import code at runtime and/or have arbitrary identifiers, having an interface for adding or removing allowable character sets for code imported during runtime should work reasonably well (both for people who want to allow arbitrary identifiers, and those who want to restrict identifiers after the runtime system is up). In terms of speed issues that Guillaume has brought up, this is a non-issue. The time to verify identifiers as a pyc is loaded, when every identifier in a pyc file is interned on loading, is insignificant; especially when in Python one can do...

    for identifier in identifiers:
        for character in identifier:
            if character not in allowable_characters:
                raise ImportError("...")

And considering we can do *millions* of dictionary/set lookups each second on a modern machine, I can't imagine that identifier verification time will be a significant burden. - Josiah From martin at v.loewis.de Fri May 25 06:32:15 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 25 May 2007 06:32:15 +0200 Subject: [Python-3000] PEP 3131 accepted In-Reply-To: References: <20070523111704.85FC.JCARLSON@uci.edu> <87abvu5yfu.fsf@uwakimon.sk.tsukuba.ac.jp> <20070524082737.862E.JCARLSON@uci.edu> <4655DD4E.3050809@v.loewis.de> <4656129D.5000406@v.loewis.de> Message-ID: <465666CF.3040507@v.loewis.de> > Perhaps a letter in the encoding declaration is non-ascii, nullifying > the encoding enforcement and allowing a cyrillic 'a' in allowed = 0? I see. Of course, if I receive a patch where one of the lines changed is the coding declaration, and there is no apparent difference between the old and the new declaration, I would become cautious, wondering what's going on.
Regards, Martin From martin at v.loewis.de Fri May 25 06:35:58 2007 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Fri, 25 May 2007 06:35:58 +0200 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <464FFD04.90602@v.loewis.de> <46521CD7.9030004@v.loewis.de> <46527904.1000202@v.loewis.de> <87sl9o5dvi.fsf@uwakimon.sk.tsukuba.ac.jp> <87646i5td6.fsf@uwakimon.sk.tsukuba.ac.jp> <465615C9.4080505@v.loewis.de> Message-ID: <465667AE.2090000@v.loewis.de> Ka-Ping Yee schrieb: > On Fri, 25 May 2007, "Martin v. Löwis" wrote: >> Please *do* consider the needs of the people who want to actively >> use the feature as well. Otherwise, you have no chance of understanding >> what will make everyone happy. > > People who want to use the feature can turn it on. I don't see what's > so unreasonable about that. People who want to use the feature would have to know that it is only present if you turn it on. It's like saying "you can use hexadecimal integer literals, but you have to turn them on". This wouldn't work: people try to use them, find out that it won't work, and assume that it's not supported. Regards, Martin From martin at v.loewis.de Fri May 25 06:38:27 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 25 May 2007 06:38:27 +0200 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <46527904.1000202@v.loewis.de> <87sl9o5dvi.fsf@uwakimon.sk.tsukuba.ac.jp> <87646i5td6.fsf@uwakimon.sk.tsukuba.ac.jp> <465615C9.4080505@v.loewis.de> Message-ID: <46566843.9080407@v.loewis.de> > Is your concern just that it should be possible to do once (perhaps at > install), rather than on each run? My concern is that people assume that you can't use non-ASCII identifiers if they try it out and it doesn't work.
If they believe the feature is not there, that's just as if it really wasn't there. Regards, Martin From jcarlson at uci.edu Fri May 25 07:04:53 2007 From: jcarlson at uci.edu (Josiah Carlson) Date: Thu, 24 May 2007 22:04:53 -0700 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <465667AE.2090000@v.loewis.de> References: <465667AE.2090000@v.loewis.de> Message-ID: <20070524215742.864E.JCARLSON@uci.edu> "Martin v. Löwis" wrote: > Ka-Ping Yee schrieb: > > On Fri, 25 May 2007, "Martin v. Löwis" wrote: > >> Please *do* consider the needs of the people who want to actively > >> use the feature as well. Otherwise, you have no chance of understanding > >> what will make everyone happy. > > > > People who want to use the feature can turn it on. I don't see what's > > so unreasonable about that. > > People who want to use the feature would have to know that it is only > present if you turn it on. It's like saying "you can use hexadecimal > integer literals, but you have to turn them on". This wouldn't work: > people try to use them, find out that it won't work, and assume > that it's not supported. Are we going to stop offering informational error messages to people? Because an informational error message could go a long way towards helping people to understand what is going on. - Josiah From gproux+py3000 at gmail.com Fri May 25 07:31:00 2007 From: gproux+py3000 at gmail.com (Guillaume Proux) Date: Fri, 25 May 2007 14:31:00 +0900 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <20070524213605.864B.JCARLSON@uci.edu> References: <20070524213605.864B.JCARLSON@uci.edu> Message-ID: <19dd68ba0705242231j4f391f00n79112a01c0f339bc@mail.gmail.com> On 5/25/07, Josiah Carlson wrote: > a default character set that is allowed. The only character set that > makes sense as a default, ignoring previously-existing environment > variables (which don't necessarily help us), is ascii.
This ignores the movement of the last 5-10 years in operating systems, filesystems, and even the language space. Now, the "standard" allowed charset in all of the above environments is Unicode. > Why? Primarily because ascii identifiers are what are allowed today, > and have been allowed for 15 years. But there is this secondary data And guess what, they will still be allowed tomorrow... (tongue-in-cheek) If you look at the typical use cases for programs written in Python (usually also in rough order of experience): A) directly in the interpreter (I love that) B) small-ish one-off scripts C) middle-size scripts D) multi-module programs made by a single person E) large-ish programs made by a group of people Out of these, really only people belonging to category E) are expressing an opinion that identifiers should stay ASCII forever. Those should be the same people who have a strong source code compliance policy, unit tests, lint-ization, etc... Unicode support out of the box without constraint strongly benefits categories A-D. (Just for the funny story: I was asking the opinion of my colleague this morning, who is a beginner in Visual Basic.NET, about Japanese identifiers, and he was shocked to hear that Python does not accept Japanese identifiers today out of the box... VB.NET apparently does, and entry-level programmers here DO (ab?)use this.) Unicode is an accepted norm, isn't it? (Even if some extremists in Japan have long argued for the superiority of the local encoding over Unicode, but apart from 2ch this is an old story now.) I think Martin's and my point is that to get people to level E) there is no reason to put any charset restriction on levels A->D. And when you are at level E), it is difficult to argue that making a one-time test at source code checkin time is a bad practice.
Regards, Guillaume From rhamph at gmail.com Fri May 25 07:38:19 2007 From: rhamph at gmail.com (Adam Olsen) Date: Thu, 24 May 2007 23:38:19 -0600 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <464FFD04.90602@v.loewis.de> <46521CD7.9030004@v.loewis.de> <46527904.1000202@v.loewis.de> <87sl9o5dvi.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 5/23/07, Jim Jewett wrote: > > The only issues PEP 3131 should be concerned with *defining* > > are those that cause problems with canonicalization, and the range of > > characters and languages allowed in the standard library. > > Fair enough -- but the problem is that this isn't a solved issue yet; > the unicode group themselves make several contradictory > recommendations. > > I can come up with rules that are probably just about right, but I > will make mistakes (just as the unicode consortium itself did, which > is why they have both ID and XID, and why both have stability > characters). Even having read their reports, my initial rules would > still have banned mixed-script, which would have prevented your edict- > example. If we allowed an underscore as a mixed-script separator (allowing "def get_??(self):"), does this let us get away with otherwise banning mixed-scripts? This wouldn't protect us from single-character identifiers or a single-character identifier segment, but those seem to be fairly obscure (and perhaps suspicious, for those concerned about security). 
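The underscore-as-mixed-script-separator rule Adam describes can be sketched in a few lines of Python. The script detection here is a deliberate simplification: it uses a handful of illustrative code-point ranges rather than the real Unicode Scripts.txt property data, and grouping kana and CJK ideographs as one "Japanese" script is an assumption made only for this example.

```python
# Illustrative script ranges only; a real checker would be driven by the
# Unicode Scripts.txt property data.
SCRIPT_RANGES = {
    "Latin": [(0x0041, 0x005A), (0x0061, 0x007A), (0x00C0, 0x024F)],
    "Cyrillic": [(0x0400, 0x04FF)],
    "Japanese": [(0x3040, 0x30FF), (0x4E00, 0x9FFF)],  # kana + CJK, as one group
}

def char_script(ch):
    # The proposed separator and digits are treated as script-neutral.
    if ch == "_" or ch.isdigit():
        return None
    cp = ord(ch)
    for script, ranges in SCRIPT_RANGES.items():
        if any(lo <= cp <= hi for lo, hi in ranges):
            return script
    return "Other"

def segments_single_script(identifier):
    # Adam's rule: mixing scripts is allowed only across underscore-separated
    # segments, so get_<kanji> passes while a mix inside one segment fails.
    for seg in identifier.split("_"):
        scripts = {char_script(c) for c in seg} - {None}
        if len(scripts) > 1:
            return False
    return True
```

Under this rule `get_名前` is accepted while `get名` is rejected, which is exactly the trade-off being discussed.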
-- Adam Olsen, aka Rhamphoryncus From martin at v.loewis.de Fri May 25 08:24:22 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 25 May 2007 08:24:22 +0200 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <20070524215742.864E.JCARLSON@uci.edu> References: <465667AE.2090000@v.loewis.de> <20070524215742.864E.JCARLSON@uci.edu> Message-ID: <46568116.202@v.loewis.de> >> People who want to use the feature would have to know that it is only >> present if you turn it on. It's like saying "you can use hexadecimal >> integer literals, but you have to turn them on". This wouldn't work: >> people try to use them, find out that it won't work, and assume >> that it's not supported. > > Are we going to stop offering informational error messages to people? > Because an informational error message could go a long way towards > helping people to understand what is going on. I don't think there is precedence in Python for such an informational error message. It is not pythonic to give an error in the case "I know what you want, and I could easily do it, but I don't feel like doing it, read these ten pages of text to learn more about the problem". The most similar case is the future import statement, where we in fact report an error even though it's typically clear what the desired meaning of the program is. However, this statement is only meant as a transitional measure, with a view of eventually changing the error into making the future behavior the default. I understand that you want that to be a permanent error, and this I object to. People should not have to read long system configuration pages just to run the program that they intuitively wrote correctly right from the start. If you think there are cases in which the user should be warned about potential problems and risks, then the warning machinery would be more appropriate. Of course, it would be important to not produce too many false positives for such a warning. 
Regards, Martin From jcarlson at uci.edu Fri May 25 08:59:59 2007 From: jcarlson at uci.edu (Josiah Carlson) Date: Thu, 24 May 2007 23:59:59 -0700 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <46568116.202@v.loewis.de> References: <20070524215742.864E.JCARLSON@uci.edu> <46568116.202@v.loewis.de> Message-ID: <20070524234516.8654.JCARLSON@uci.edu> "Martin v. Löwis" wrote: > >> People who want to use the feature would have to know that it is only > >> present if you turn it on. It's like saying "you can use hexadecimal > >> integer literals, but you have to turn them on". This wouldn't work: > >> people try to use them, find out that it won't work, and assume > >> that it's not supported. > > > > Are we going to stop offering informational error messages to people? > > Because an informational error message could go a long way towards > > helping people to understand what is going on. > > I don't think there is precedence in Python for such an informational > error message. It is not pythonic to give an error in the case > "I know what you want, and I could easily do it, but I don't feel > like doing it, read these ten pages of text to learn more about the > problem". ImportError("non-ascii names used without proper charset definition") They hop online, enter that phrase into google, and (hopefully) get a page at python.org that says something like... If you have received this error, and merely want to get your source to run, use: python --charset=unicode ... If you know the character set of the source you want to run (which can be discovered by checking the output of scripts/charset.py), you can use: python --charset= ... If you would like to make this the default, add a PY_CHARSET environment variable with a comma separated list of allowable character sets (ascii is always included). If you would like to programmatically change the allowable character set, use the .
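Josiah's hypothetical --charset / PY_CHARSET mechanism would, at bottom, be a per-identifier range check. The flag, the environment variable, the table names, and the code-point ranges below are all illustrative assumptions, not an existing Python feature; a sketch of the check might look like this:

```python
import io
import tokenize

# Illustrative allow-lists; a real mechanism would take its tables from the
# Unicode character database, and "ascii" would always be included.
CHARSETS = {
    "ascii": [(0x0000, 0x007F)],
    "cyrillic": [(0x0400, 0x04FF)],
    "greek": [(0x0370, 0x03FF)],
}

def find_disallowed_names(source, allowed=("ascii",)):
    # Return (name, line) pairs whose characters fall outside every range
    # selected by the hypothetical PY_CHARSET-style setting.
    ranges = [r for cs in allowed for r in CHARSETS[cs]]
    bad = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and not all(
            any(lo <= ord(c) <= hi for lo, hi in ranges) for c in tok.string
        ):
            bad.append((tok.string, tok.start[0]))
    return bad
```

A Cyrillic identifier would then be flagged under the default setting but pass once "cyrillic" is added to the allowed list.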
> The most similar case is the future import statement, where we in fact > report an error even though it's typically clear what the desired > meaning of the program is. However, this statement is only meant > as a transitional measure, with a view of eventually changing > the error into making the future behavior the default. I understand > that you want that to be a permanent error, and this I object to. That's fine, but it's not just me that has this opinion and desire for ascii default behavior. > People should not have to read long system configuration pages > just to run the program that they intuitively wrote correctly > right from the start. You mean that 5% of users who run into code written using non-ascii identifiers will find this sufficiently burdensome to force the 95% of ascii users to use additional verification and checking tools to make sure that they are not confronted with non-ascii identifiers? I don't find that a reasonable tradeoff for the majority of (non-unicode) users. - Josiah From stephen at xemacs.org Fri May 25 09:10:03 2007 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 25 May 2007 16:10:03 +0900 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <83075.64514.qm@web33514.mail.mud.yahoo.com> References: <4655DE74.4090708@v.loewis.de> <83075.64514.qm@web33514.mail.mud.yahoo.com> Message-ID: <87veeh4bw4.fsf@uwakimon.sk.tsukuba.ac.jp> Steve Howell writes: > respect to Kanji, and switches over to Python, and > changes his little wrapper shell script to say "python > -U" instead of "ruby -Kkcode"? He could then start to > use non-Japanese Python modules while still writing > his own Python code in Japanese. But that's not enough. The problem is that the reason for -Kkcode is that kcode != Unicode. Japanese use several mutually incompatible encodings, and they mix anarchically over the Internet. What -K does is allow you to specify which one you're giving to the interpreter at runtime. 
The analogy to -K would be if you get an English-language Python source file from somewhere, look into it, realize it's from IBM, and run it with "python -K ebcdic whizbang.py". Same characters, only the bytes are changed to confuse the innocent. That's what -Kkcode is for. From martin at v.loewis.de Fri May 25 09:05:28 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 25 May 2007 09:05:28 +0200 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <83075.64514.qm@web33514.mail.mud.yahoo.com> References: <83075.64514.qm@web33514.mail.mud.yahoo.com> Message-ID: <46568AB8.2010601@v.loewis.de> > Ruby is a language that presumably has a lot of > Japanese users, and it appears to me (I'm not a Ruby > person, so I admit this is speculation) that Japanese > users have to explicitly choose to use Japanese > encoding to run source files encoded in Japanese. > > Setting aside all the limitations of Ruby, wouldn't > the fact that non-latin-writing Japanese Ruby users > live with the command line restriction in Ruby suggest > that they'd be just as willing to live with command > line burdens in Python, if they decided to switch to > Python? "Just as willing" is probably the right analysis. It's speculation that the ruby users are *happy* that they cannot double-click a kcode script in the explorer to run it, or perhaps there is another mechanism in Ruby that avoids this problem - it's also speculation that you *have* to use this command line option in order to be able to use Japanese identifiers.
Regards, Martin From martin at v.loewis.de Fri May 25 09:09:48 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 25 May 2007 09:09:48 +0200 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <320102.38046.qm@web33515.mail.mud.yahoo.com> References: <320102.38046.qm@web33515.mail.mud.yahoo.com> Message-ID: <46568BBC.9060801@v.loewis.de> > In almost every programming situation I've been in, > I've had to deal with environmental issues, even > though my character set of choice has never been the > primary issue. People can certainly adjust to whatever challenges technology confronts them with (some people can do that easier, some have more difficulties). Still, beautiful is better than ugly. > I think there are things that can be done here, even > if we make Python's default mode to be ascii-pure. > Regional distros can set the environment > appropriately. Python error messages about non-ascii > characters can suggest how to enable the -U flag. The > Tokyo Python User's Group can educate programmers, > etc. Yes, but these are all work-arounds for an avoidable ugliness. Regards, Martin From martin at v.loewis.de Fri May 25 09:14:55 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 25 May 2007 09:14:55 +0200 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: References: <465615C9.4080505@v.loewis.de> <320102.38046.qm@web33515.mail.mud.yahoo.com> <19dd68ba0705241805y52ba93fdt284a2c696b004989@mail.gmail.com> Message-ID: <46568CEF.2030900@v.loewis.de> > However you characterize them, keep in mind that those in the former > group are asking for default behaviour that 100% of Python users > already use and understand. There's no cost to keeping identifiers > ASCII-only because that's what Python already does. How does adding conditionality make the language easier to understand? It seems you are still asking for a fork in the language. 
I very much resist the notion that forking the language is desirable (for whatever reasons). > I think that's a pretty strong reason for making the new, more complex > behaviour optional. Thus making it simpler????? The more complex behavior still remains: to fully understand the language, you have to understand that behavior, *plus* you need to understand that it may sometimes not be present. Regards, Martin From martin at v.loewis.de Fri May 25 09:36:47 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 25 May 2007 09:36:47 +0200 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <20070524234516.8654.JCARLSON@uci.edu> References: <20070524215742.864E.JCARLSON@uci.edu> <46568116.202@v.loewis.de> <20070524234516.8654.JCARLSON@uci.edu> Message-ID: <4656920F.9040001@v.loewis.de> >> People should not have to read long system configuration pages >> just to run the program that they intuitively wrote correctly >> right from the start. > > You mean that 5% of users who run into code written using non-ascii > identifiers will find this sufficiently burdensome to force the 95% of > ascii users to use additional verification and checking tools to make > sure that they are not confronted with non-ascii identifiers? I don't > find that a reasonable tradeoff for the majority of (non-unicode) users. I think I lost track of what problem you are trying to solve: is it the security issue, or is it the problem Ping stated ("you cannot know the full lexical rules by heart anymore"). If it is the latter, I don't understand why the 95% ascii users need to run additional verification and checking tools. If they don't know the full language, they won't use it - why should they run any checking tools? If it is the security issue, I don't see why a warning wouldn't address the concerns of these users just as well.
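The warning machinery Martin points to could look roughly like this. The audit function and its use of UnicodeWarning are assumptions sketched for illustration, not anything CPython implements:

```python
import io
import tokenize
import warnings

def audit_identifiers(source, filename="<source>"):
    # Emit a warning, not an error, for each non-ASCII identifier: the
    # program still runs, but cautious users get a diagnostic that the
    # -W machinery can escalate into a hard failure.
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and any(ord(c) > 127 for c in tok.string):
            warnings.warn(
                "non-ASCII identifier %r (%s, line %d)"
                % (tok.string, filename, tok.start[0]),
                UnicodeWarning,
                stacklevel=2,
            )
```

Users who share Josiah's preference could then run with `python -W error::UnicodeWarning` to get the strict behavior, while the default stays permissive.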
Regards, Martin From nevillegrech at gmail.com Fri May 25 11:25:17 2007 From: nevillegrech at gmail.com (Neville Grech Neville Grech) Date: Fri, 25 May 2007 11:25:17 +0200 Subject: [Python-3000] Wither PEP 335 (Overloadable Boolean Operators)? In-Reply-To: References: <4656446F.8030802@canterbury.ac.nz> Message-ID: From a user's POV, I'm +1 on having overloadable boolean functions. In many cases I have had to resort to overloading add or neg instead of and & not. I foresee a lot of cases where the and overload could be used to join objects which represent constraints. Overloadable boolean operators could also be used to implement other types of logic (e.g. fuzzy logic). Constraining them to just primitive binary operations will, in my view, be limiting for a myriad of use cases. Sure, in some cases one could overload the neg operator instead of not, but semantically they have different meanings. On 5/25/07, Guido van Rossum wrote: > > On 5/24/07, Greg Ewing wrote: > > Guido van Rossum wrote: > > > > > Last call for discussion! I'm tempted to reject this -- the ability to > > > generate optimized code based on the shortcut semantics of and/or is > > > pretty important to me. > > > > Please don't be hasty. I've had to think about this issue > > a bit. > > > > The conclusion I've come to is that there may be a small loss > > in the theoretical amount of optimization opportunity available, > > but not much. Furthermore, if you take into account some other > > improvements that can be made (which I'll explain below) the > > result is actually *better* than what 2.5 currently generates.
> > > > For example, Python 2.5 currently compiles > > > > if a and b: > > > > > > into > > > > > > JUMP_IF_FALSE L1 > > POP_TOP > > > > JUMP_IF_FALSE L1 > > POP_TOP > > > > JUMP_FORWARD L2 > > L1: > > 15 POP_TOP > > L2: > > > > Under my PEP, without any other changes, this would become > > > > > > LOGICAL_AND_1 L1 > > > > LOGICAL_AND_2 > > L1: > > JUMP_IF_FALSE L2 > > POP_TOP > > > > JUMP_FORWARD L3 > > L2: > > 15 POP_TOP > > L3: > > > > The fastest path through this involves executing one extra > > bytecode. However, since we're not using JUMP_IF_FALSE to > > do the short-circuiting any more, there's no need for it > > to leave its operand on the stack. So let's redefine it and > > change its name to POP_JUMP_IF_FALSE. This allows us to > > get rid of all the POP_TOPs, plus the jump at the end of > > the statement body. Now we have > > > > > > LOGICAL_AND_1 L1 > > > > LOGICAL_AND_2 > > L1: > > POP_JUMP_IF_FALSE L2 > > > > L2: > > > > The fastest path through this executes one *less* bytecode > > than in the current 2.5-generated code. Also, any path that > > ends up executing the body benefits from the lack of a > > jump at the end. > > > > The same benefits also result when the boolean expression is > > more complex, e.g. > > > > if a or b and c: > > > > > > becomes > > > > > > LOGICAL_OR_1 L1 > > > > LOGICAL_AND_1 L2 > > > > LOGICAL_AND_2 > > L2: > > LOGICAL_OR_2 > > L1: > > POP_JUMP_IF_FALSE L3 > > > > L3: > > > > which contains 3 fewer instructions overall than the > > corresponding 2.5-generated code. > > > > So I contend that optimization is not an argument for > > rejecting this PEP, and may even be one for accepting > > it. > > Do you have an implementation available to measure this? In most cases > the cost is not in the number of bytecode instructions executed but in > the total amount of work. Two cheap bytecodes might well be cheaper > than one expensive one. > > However, I'm happy to keep your PEP open until you have code that we > can measure. 
(However, adding additional optimizations elsewhere to > make up for the loss wouldn't be fair -- we would have to compare with > a 2.5 or trunk (2.6) interpreter with the same additional > optimizations added.) > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: > http://mail.python.org/mailman/options/python-3000/nevillegrech%40gmail.com > -- Regards, Neville Grech -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-3000/attachments/20070525/8c88dbf1/attachment.html From python at zesty.ca Fri May 25 11:36:28 2007 From: python at zesty.ca (Ka-Ping Yee) Date: Fri, 25 May 2007 04:36:28 -0500 (CDT) Subject: [Python-3000] Support for PEP 3131 In-Reply-To: References: <465615C9.4080505@v.loewis.de> <320102.38046.qm@web33515.mail.mud.yahoo.com> <19dd68ba0705241805y52ba93fdt284a2c696b004989@mail.gmail.com> Message-ID: On Thu, 24 May 2007, Guido van Rossum wrote: > If there's a security argument to be made for restricting the alphabet > used by code contributions (even by co-workers at the same company), I > don't see why ASCII-only projects should have it easier than projects > in other cultures. This keeps getting characterized as only a security argument, but it's much deeper; it's a basic code comprehension issue. It's all five of the issues I mentioned at http://mail.python.org/pipermail/python-3000/2007-May/007855.html and the additional point about Unicode standards raised by Jim at http://mail.python.org/pipermail/python-3000/2007-May/007863.html I still believe all of these should at least be acknowledged in the PEP. ---- If you like, you could look at this as trying to serve two different communities, the "ASCII folks" and the "non-ASCII folks", as has been said in other messages here. 
(IMHO, it would be better to think of many different communities of non-ASCII folks rather than just one, which is why the choose-your-own-table solution makes the most sense.) But suppose we just look at the simpler question of "what should the default be?" -- there are two possible behaviours; which should the default favour? All these decision criteria agree: - Explicit or implicit? Better to explicitly enable the new feature. - Simple or complex? ASCII is the simpler character set. - Majority or minority? By far the majority will use only ASCII. - Status quo or new behaviour? ASCII is established and familiar. The safer choice is to stick to ASCII by default. There's nothing to lose by doing so. Why rush to change the lexical syntax? Why is it *necessary* to do it right now, and all at once, and by default? ---- > A more useful approach would seem to be a set of auditing tools that > can be applied routinely to all new contributions (e.g. as a > pre-commit hook when using a source control system), or to all code in > a given directory, download, etc. I don't see this as all that > different from using e.g. PyChecker or PyLint. [...] > Scanning for stray non-ASCII characters is best > left to automated tools. ...like the Python interpreter. Having the Python interpreter do this is a good idea for all the same reasons that the Python interpreter checks for tab/space inconsistency. Imagine a parallel universe in which Python has always forbidden tabs and only allowed spaces for indentation. In Python 3.0, it is proposed to introduce tabs. Alter-Guido announces he will accept the proposal. Some folks are opposed to adding tabs, saying it could be confusing, but he disagrees. Some folks suggest that this feature could at least be made optional, but he disagrees. Some folks suggest that the Python interpreter should at least warn when this happens, but he disagrees. "But," they say, "mixing tabs and spaces can yield programs that have invisibly different meanings."
"No matter," says alter-Guido, "you just shouldn't do that." Or "You should use an editor that takes care of this for you." Or "You need to write your own checking tools and scan all your code before you check it in." "But what about all the users who aren't aware of this change?" they ask. Wouldn't it just be so much easier if the Python interpreter did the checking? In our universe, it does, and this is a very good thing. Why did we decide to do that? I would say, becuase it makes our programs more reliable, and it means we have less to worry about when we're coding. Is it a "security issue"? You could call it that, but really it's just a sanity issue. -- ?!ng From python at zesty.ca Fri May 25 11:50:07 2007 From: python at zesty.ca (Ka-Ping Yee) Date: Fri, 25 May 2007 04:50:07 -0500 (CDT) Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <46568116.202@v.loewis.de> References: <465667AE.2090000@v.loewis.de> <20070524215742.864E.JCARLSON@uci.edu> <46568116.202@v.loewis.de> Message-ID: On Fri, 25 May 2007, [ISO-8859-1] "Martin v. L?wis" wrote: > I don't think there is precedence in Python for such an informational > error message. SyntaxError: Non-ASCII character '\xd1' in file foo.py on line 2, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details > It is not pythonic to give an error in the case > "I know what you want, and I could easily do it, but I don't feel > like doing it, read these ten pages of text to learn more about the > problem". Python is not a DWIM language. That is one of its strengths. It is Pythonic to give an error in the case "I could guess what this means, but it might be a mistake. Please be clear about what you want." 
-- ?!ng From python at zesty.ca Fri May 25 11:51:18 2007 From: python at zesty.ca (Ka-Ping Yee) Date: Fri, 25 May 2007 04:51:18 -0500 (CDT) Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <465667AE.2090000@v.loewis.de> References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <464FFD04.90602@v.loewis.de> <46521CD7.9030004@v.loewis.de> <46527904.1000202@v.loewis.de> <87sl9o5dvi.fsf@uwakimon.sk.tsukuba.ac.jp> <87646i5td6.fsf@uwakimon.sk.tsukuba.ac.jp> <465615C9.4080505@v.loewis.de> <465667AE.2090000@v.loewis.de> Message-ID: On Fri, 25 May 2007, [UTF-8] "Martin v. Löwis" wrote: > Ka-Ping Yee schrieb: > > On Fri, 25 May 2007, [ISO-8859-1] "Martin v. Löwis" wrote: > > People who want to use the feature can turn it on. I don't see what's > > so unreasonable about that. > > People who want to use the feature would have to know that it is only > present if you turn it on. It's like saying "you can use hexadecimal > integer literals, but you have to turn them on". This wouldn't work: > people try to use them, find out that it won't work, and assume > that it's not supported. This argument is absurd. If you know that you want Unicode literals (a NEW FEATURE that has never existed in Python before), you know enough to learn how to use the feature. To show you just how absurd that argument is, realize that it is also an argument for ignoring the entire standard library. Since people have to "import re" before using regular expressions, they'll assume there's no regex support in Python? Of course not -- part of learning how to use regexes is that you "import re"; it's in the documentation, it's in tutorials about regexes, it's how you teach beginners to use regexes, etc.
-- ?!ng From python at zesty.ca Fri May 25 11:53:09 2007 From: python at zesty.ca (Ka-Ping Yee) Date: Fri, 25 May 2007 04:53:09 -0500 (CDT) Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <46568BBC.9060801@v.loewis.de> References: <320102.38046.qm@web33515.mail.mud.yahoo.com> <46568BBC.9060801@v.loewis.de> Message-ID: On Fri, 25 May 2007, [ISO-8859-1] "Martin v. Löwis" wrote: > > I think there are things that can be done here, even > > if we make Python's default mode to be ascii-pure. > > Regional distros can set the environment > > appropriately. Python error messages about non-ascii > > characters can suggest how to enable the -U flag. The > > Tokyo Python User's Group can educate programmers, > > etc. > > Yes, but these are all work-arounds for an avoidable ugliness. You've got the defaults backwards. If "anything goes" is the default, failures are silent as well as invisible, and you have no help in recovering from them. If "ASCII only" is the default, failures produce an error message, and that error message can guide you to the solution. -- ?!ng From bjourne at gmail.com Fri May 25 11:55:58 2007 From: bjourne at gmail.com (=?ISO-8859-1?Q?BJ=F6rn_Lindqvist?=) Date: Fri, 25 May 2007 11:55:58 +0200 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <20070524215742.864E.JCARLSON@uci.edu> References: <465667AE.2090000@v.loewis.de> <20070524215742.864E.JCARLSON@uci.edu> Message-ID: <740c3aec0705250255k642d6637re46e3929212f1369@mail.gmail.com> On 5/25/07, Josiah Carlson wrote: > > "Martin v. Löwis" wrote: > > Ka-Ping Yee schrieb: > > > On Fri, 25 May 2007, [ISO-8859-1] "Martin v. Löwis" wrote: > > >> Please *do* consider the needs of the people who want to actively > > >> use the feature as well. Otherwise, you have no chance of understanding > > >> what will make everyone happy. > > > > > > People who want to use the feature can turn it on. I don't see what's > > > so unreasonable about that.
> > > > People who want to use the feature would have to know that it is only > > present if you turn it on. It's like saying "you can use hexadecimal > > integer literals, but you have to turn them on". This wouldn't work: > > people try to use them, find out that it won't work, and assume > > that it's not supported. > > Are we going to stop offering informational error messages to people? > Because an informational error message could go a long way towards > helping people to understand what is going on. I think you are forgetting who this feature is intended for. I can't for my life imagine that any free software project would start using non-ASCII identifiers, nor any professional software development company either. Decent programmers learn and use English because that is the lingua franca of the computer world. Newbies, on the other hand, would maybe appreciate being able to write: Örjan = 42 Åsa = 12 Pär = 12 genomsnittsålder = (Örjan + Åsa + Pär) / 3 print genomsnittsålder instead of using the (in Swedish) less readable identifiers Orjan, Asa, Par and genomsnittsAlder. If Python required a switch for such a program to run, then this feature would be totally wasted on them. They might use an IDE, program in notepad.exe and drag the file to the python.exe icon, or not even know about cmd.exe or what a command line switch is. An error message, even an informational one, isn't easy to understand if you don't know English. -- mvh Björn From python at zesty.ca Fri May 25 12:00:12 2007 From: python at zesty.ca (Ka-Ping Yee) Date: Fri, 25 May 2007 05:00:12 -0500 (CDT) Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <46568116.202@v.loewis.de> References: <465667AE.2090000@v.loewis.de> <20070524215742.864E.JCARLSON@uci.edu> <46568116.202@v.loewis.de> Message-ID: On Fri, 25 May 2007, [ISO-8859-1] "Martin v.
Löwis" wrote: > People should not have to read long system configuration pages > just to run the program that they intuitively wrote correctly > right from the start. It is not intuitive. One thing I learned from the discussion here about Unicode identifiers in other languages is that, though this support exists in several other languages, it is *different* in each of them. And PEP 3131 is different still. They allow different sets of characters, and even worse, use different normalization rules. Can you keep straight which letters are allowed in Java, Javascript, C#, Python? What about two identifiers which refer to the same variable in some languages but refer to different variables in others? How do we know that PEP 3131's answer is the right answer and all these other languages chose the wrong answer? This is far from simple. -- ?!ng From jan.grant at bristol.ac.uk Fri May 25 12:25:37 2007 From: jan.grant at bristol.ac.uk (Jan Grant) Date: Fri, 25 May 2007 11:25:37 +0100 (BST) Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <19dd68ba0705241805y52ba93fdt284a2c696b004989@mail.gmail.com> References: <465615C9.4080505@v.loewis.de> <320102.38046.qm@web33515.mail.mud.yahoo.com> <19dd68ba0705241805y52ba93fdt284a2c696b004989@mail.gmail.com> Message-ID: <20070525112422.K79178@tribble.ilrt.bris.ac.uk> On Fri, 25 May 2007, Guillaume Proux wrote: > Hello, > > There has been many proposals of flags around. > I don't even understand anymore which -U you are talking about now. > > But let me add my own proposal for a flag. (just to confuse everybody > else a little more) If there must be a flag, +1* to the addition of an "ascii only" flag, and whilst we're at it, let's call it "-parochial". Cheers, jan * although I do not get to vote. -- jan grant, ISYS, University of Bristol. http://www.bris.ac.uk/ Tel +44 (0)117 3317661 http://ioctl.org/jan/ Unfortunately, I have a very good idea how fast my keys are moving.
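Ka-Ping's normalization point is concrete: PEP 3131 specifies NFKC normalization, under which visually distinct spellings can collapse into one identifier, while confusable cross-script spellings stay distinct — and other languages make different choices. The standard library's unicodedata module shows both effects:

```python
import unicodedata

# Fullwidth Latin letters (as typed in East Asian input modes) NFKC-normalize
# to plain ASCII, so both spellings would name the same identifier.
assert unicodedata.normalize("NFKC", "\uff50\uff52\uff49\uff4e\uff54") == "print"

# ANGSTROM SIGN and LATIN CAPITAL LETTER A WITH RING ABOVE also collapse.
assert unicodedata.normalize("NFKC", "\u212b") == "\u00c5"

# But Cyrillic 'a' stays distinct from Latin 'a' under any normalization,
# so confusable-but-different identifiers remain possible.
assert unicodedata.normalize("NFKC", "\u0430") != "a"
```

A language that normalizes with NFC instead of NFKC, or not at all, will disagree with Python on which of these spellings are "the same" identifier.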
From stephen at xemacs.org Fri May 25 12:45:39 2007 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 25 May 2007 19:45:39 +0900 Subject: [Python-3000] PEP 3131 accepted In-Reply-To: <4655DD4E.3050809@v.loewis.de> References: <20070523111704.85FC.JCARLSON@uci.edu> <87abvu5yfu.fsf@uwakimon.sk.tsukuba.ac.jp> <20070524082737.862E.JCARLSON@uci.edu> <4655DD4E.3050809@v.loewis.de> Message-ID: <87sl9l41ws.fsf@uwakimon.sk.tsukuba.ac.jp> "Martin v. L?wis" writes: > > If people can agree on a method for specifying, 'ascii only', 'ascii + > > character sets X, Y, Z', and it actually becomes an accepted part of the > > proposal, gets implemented, etc., I will grumble to myself at home, but > > I will stop trying to raise a stink here. > > I think you can stop now - this is supported as a side effect of > PEP 263, and implemented for years. -1 That seems not to be the case. PEP 263 allows you to specify a coding system, not a character set. Whether that will restrict the character set depends on how the coding system is implemented. For example, ISO-2022-JP is implicitly a (near) UCS since it does not forbid designations, so you don't know (XEmacs implements it as a UCS, I'm not sure what GNU does), while ISO-2022-JP-2 is explicitly a UCS because it explicitly permits designations. And how about C1 code points in ISO 2022-conformant 8-bit coding systems (including all ISO 8859 systems)? Do they pass, or not? Any restriction is simply a side effect of the codec throwing an exception because it doesn't recognize the input. So this requires that users know how the relevant codec is implemented. Second, this also removes your ability to use literal strings and comments outside that coding system. (Of course Unicode escapes will still be available, but hardly acceptable for string literals, and completely out of the question for comments.) Third, it also has the defect of requiring you to use a legacy coding system, does it not? 
Ie, if I want to restrict to ASCII + Cyrillic, I can use ISO-8859-5 or KOI8-R but *not* UTF-8. Finally it does not make it easy to create unions or subsets. One has to write a codec to do that. From stephen at xemacs.org Fri May 25 13:13:03 2007 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 25 May 2007 20:13:03 +0900 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <464FFD04.90602@v.loewis.de> <46521CD7.9030004@v.loewis.de> <46527904.1000202@v.loewis.de> <87sl9o5dvi.fsf@uwakimon.sk.tsukuba.ac.jp> <87646i5td6.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87r6p540n4.fsf@uwakimon.sk.tsukuba.ac.jp> Jim Jewett writes: > Definition; I don't care whether it is a different argument to import > or a flag or an environment variable or a command-line option, or ... > I just want the decision to accept non-ASCII characters to be > explicit. Ka-Ping's tricky.py shows that reliance on magic directives a la PEP 263 loses. I agree with Martin that in practice most such hacks will get caught in the ordinary process of editing, applying patches, sending email, and the like, but if the compiler is going to do the checking on behalf of the *user*, it should not rely on anything the files say. > Ideally, it would even be explicit per extra character allowed, though > there should obviously be shortcuts to accept entire scripts. How about a regexp character class as starting point? > So how about > > (1) By default, python allows only ASCII. +1 But neither Martin nor Guido likes it, so I'm continuing to think about it. Martin's objection that people will try it and assume that it's unimplemented smells like FUD to me, though. > (2) Additional characters are permitted if they appear in a table > named on the command line. +1 > These additional characters should be restricted to code points larger > than ASCII (so you can't easily turn "!" 
into an ID char) +1 You can specify any character you want, but if it's ASCII, or not in the classes PEP 3131 ends up using to define the maximal set, it gets deleted from the extension table (ASCII has its own table, conceptually). This permits whole scripts, blocks, or ranges to be included. Optionally warn on such deletions at load of the table (that would be better a separate tool), but preferably when parsing the identifier throw a SyntaxError """This character is in the table of extension characters for identifiers, but is of class Cf, which is forbidden in identifiers.""" > If you want to include punctuation or -1 Why waste the effort of the Unicode technical committees? > undefined characters, so be it. -1 Assuming undefined == reserved for future standardization that violates the Unicode standard. -1 on private space characters You *could* argue that a private space character could be valid within a module, or an application of cooperating modules, but I don't think it's worth trying to deal with it. "I'm from Kansas, show me" (a use case). From stephen at xemacs.org Fri May 25 13:33:38 2007 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 25 May 2007 20:33:38 +0900 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <781A2C3C-011E-4048-A72A-BE631C0C5127@fuhm.net> References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <464FFD04.90602@v.loewis.de> <46521CD7.9030004@v.loewis.de> <46527904.1000202@v.loewis.de> <87sl9o5dvi.fsf@uwakimon.sk.tsukuba.ac.jp> <87646i5td6.fsf@uwakimon.sk.tsukuba.ac.jp> <781A2C3C-011E-4048-A72A-BE631C0C5127@fuhm.net> Message-ID: <87ps4p3zot.fsf@uwakimon.sk.tsukuba.ac.jp> James Y Knight writes: > > - The identifier character set won't spontaneously change when > > one upgrades to a new version of Python, even for users of > > non-ASCII identifiers. > > FUD. Already won't, unicode explicitly makes that promise. They can > add characters, but not remove them. 
Addition is a change, in fact it's the change Ka-Ping dislikes most. > > - Having to specify the table of acceptable characters > > demonstrates at least some knowledge of the character set > > one is using. > > This is a negative. Why should I have to show knowledge of the > character set I'm using to type the characters? You don't. Jim's proposal doesn't specify it, but there should be at least two built-in tables, ascii (for the stdlib) and unicode (everything Pythonic in the Identifier classes defined by Unicode). If you don't want to know, just specify -U unicode. And if there isn't one, just grab the list off Martin's "non-normative" table and there you go. > > - It provides the flexibility for different communities to > > to adopt identifier conventions that suit their preferred > > tradeoff of risk vs. expressiveness. > > Also a negative. Now, if I want to run the modules from multiple > communities I need to figure out how to merge the tables they have to > separately distribute with their modules. No, you just use -U unicode. > a) you trust that the author of the file has authored it correctly, > in which case it doesn't matter one bit what character set they used. Which is why 9 out of 10 American viruses recommend Internet Explorer 5 or below. Because most users *do* trust authors and other purveyors, including porn sites, etc. This may be *much less* true of Python users, but I think most domestic offices of most American corporations would be quite happy to disable Unicode identifier support at compile time. > Restricting the charset at import time is just something to get in > your way with no actual value. So don't do it; use -U unicode. I bet Jim J and Josiah and Ka-Ping will all explicitly use -U ascii, just to make sure. What's wrong with that, if that's what they want? > Adding baroque command line options for users of other languages to > do some useless verification at import time is not an acceptable > answer. 
It'd be better to just reject the PEP entirely. Speaking of exaggeration .... From stephen at xemacs.org Fri May 25 14:53:10 2007 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 25 May 2007 21:53:10 +0900 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: References: <465615C9.4080505@v.loewis.de> <320102.38046.qm@web33515.mail.mud.yahoo.com> <19dd68ba0705241805y52ba93fdt284a2c696b004989@mail.gmail.com> Message-ID: <87odk93w09.fsf@uwakimon.sk.tsukuba.ac.jp> Guido van Rossum writes: > If there's a security argument to be made for restricting the alphabet > used by code contributions (even by co-workers at the same company), I > don't see why ASCII-only projects should have it easier than projects > in other cultures. (1) Because all projects are currently ASCII-only. I don't hear any complaints from projects currently using non-ASCII identifiers, and there will be few for many months. The scaling argument gets a similar response. I.e., "it won't hurt (not much nor soon)". N.B. Consistent with my Emacs Lisp experience. What is Common Lisp and/or Java experience? I recall Alex Martelli's discussion of even allowing non-English comments during PEP 263. Many shops will resist non-ASCII identifiers in published or purchased modules, even in the European community, I would think. Jamie Zawinski has an amusing anecdote about the great profanity purge at Netscape; I bet that kind of boss would not be at all happy about the idea of swear words he can't read. The only thing that really worries me here is Martin's "people will try it and think it's unimplemented" argument (avoidably delaying diffusion of -U unicode), but I think a SyntaxError: 'non-ASCII identifier: invalid unless enabled with the -U option' would alleviate that. (2) Because due to the scaling argument and reduction of fear of the unknown, as well as development of the collateral tools, changing the default from 'ascii' to 'unicode' will be very natural within a few years. 
I'm sympathetic to the argument that it's even more natural to make the default unicode _now_ (ie, for the release of Python 3 which is still well in the future) and let the conservatives use '-U ascii', but (a) we have no experience with such a Python, and (b) we don't have any of the tools yet, and I don't see why we would trust them to do a good job without the experience. At least for the "lookalike glyphs" issue the devil is very much in the details. Trial and error stuff, to some extent. From showell30 at yahoo.com Fri May 25 14:49:12 2007 From: showell30 at yahoo.com (Steve Howell) Date: Fri, 25 May 2007 05:49:12 -0700 (PDT) Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <87veeh4bw4.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <105266.7737.qm@web33504.mail.mud.yahoo.com> --- "Stephen J. Turnbull" wrote: > Steve Howell writes: > > > respect to Kanji, and switches over to Python, > and > > changes his little wrapper shell script to say > "python > > -U" instead of "ruby -Kkcode"? He could then > start to > > use non-Japanese Python modules while still > writing > > his own Python code in Japanese. > > But that's not enough. The problem is that the > reason for -Kkcode is > that kcode != Unicode. Japanese use several > mutually incompatible > encodings, and they mix anarchically over the > Internet. What -K does > is allow you to specify which one you're giving to > the interpreter at > runtime. > > The analogy to -K would be if you get a > English-language Python source > file from somewhere, look into it, realize it's from > IBM, and run it > with "python -K ebcdic whizbang.py". Same > characters, only the bytes > are changed to confuse the innocent. That's what > -Kkcode is for. > I think you misintrepeted my post a bit. I wasn't suggesting that Python implement a flag that was exactly equivalent to the -K flag in Ruby. I understand the arguments that such a flag might be either unnecessary in Python, or unsatisfactory. 
What I was trying to say here is that there might be precedent for non-ascii users already tolerating command line arguments. From showell30 at yahoo.com Fri May 25 15:03:01 2007 From: showell30 at yahoo.com (Steve Howell) Date: Fri, 25 May 2007 06:03:01 -0700 (PDT) Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <46568BBC.9060801@v.loewis.de> Message-ID: <788141.82125.qm@web33507.mail.mud.yahoo.com> --- "Martin v. Löwis" wrote: > > In almost every programming situation I've been > in, > > I've had to deal with environmental issues, even > > though my character set of choice has never been > the > > primary issue. > > People can certainly adjust to whatever challenges > technology confronts them with (some people can do > that easier, some have more difficulties). Still, > beautiful is better than ugly. > Remember, you and I have no disagreement whatsoever about what the Python code looks like. I look forward to seeing beautiful code written in French, Korean, etc. under PEP 3131, and I have not opposed anything in the proposal that affects the code itself. We're just disagreeing about whether the Dutch tax law programmer has to uglify his environment with an alias of Python to "python3.0 -liberal_unicode," or whether the American programmer in an enterprisy environment has to uglify his environment with an alias of Python to "python3.0 -parochial" to mollify his security auditors. I guess you could argue that the American programmer in an enterprisy environment already is dealing with so much ugliness, it wouldn't matter. ;) 
From showell30 at yahoo.com Fri May 25 15:17:18 2007 From: showell30 at yahoo.com (Steve Howell) Date: Fri, 25 May 2007 06:17:18 -0700 (PDT) Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <19dd68ba0705242231j4f391f00n79112a01c0f339bc@mail.gmail.com> Message-ID: <857617.19874.qm@web33514.mail.mud.yahoo.com> --- Guillaume Proux wrote: > If you look at the typical use case for programs > written in python > (usually also in rough order of experience) > A) directly in interpreter (i love that) > B) small-ish one-off scripts > C) middle size scripts > D) multi-module programs made by a single person > E) large-ish programs made by a group of people > I have a funny dilemma as an ASCII user. When I write small-ish one-off scripts (category B), I often start typing rapid fire, and there's a feature in vim that if I hit just the wrong combination of keys, I get an accented e, even though I intend to write unaccented English. This happens to me about once a month, and I forget exactly what Python does when I try to run the program where one identifier has the accented e, and a later identifier doesn't. I'm not drawing any specific conclusion from this anecdote about what to do in Py3k; I'm just pointing out that ascii users can get flustered by non-ascii characters, and sometimes it's purely accidental that we introduce them to our code. 
From gproux+py3000 at gmail.com Fri May 25 15:53:00 2007 From: gproux+py3000 at gmail.com (Guillaume Proux) Date: Fri, 25 May 2007 22:53:00 +0900 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <19dd68ba0705250641j348a42adu974fe4969897761e@mail.gmail.com> References: <465615C9.4080505@v.loewis.de> <320102.38046.qm@web33515.mail.mud.yahoo.com> <19dd68ba0705241805y52ba93fdt284a2c696b004989@mail.gmail.com> <87odk93w09.fsf@uwakimon.sk.tsukuba.ac.jp> <19dd68ba0705250641j348a42adu974fe4969897761e@mail.gmail.com> Message-ID: <19dd68ba0705250653v2c2a8188jac8c4ccc722fb747@mail.gmail.com> One issue with the command line argument (and that unfortunately applies ONLY to the -U case) that I haven't seen properly answered is.. On 5/25/07, Stephen J. Turnbull wrote: > SyntaxError: 'non-ASCII identifier: invalid unless enabled with the -U option' Am I the only person on Earth to routinely start my python programs by double clicking on them?? I don't think my daughter would be able to understand what happens if the program does not start. In another similar universe, the mutant python little brother Boo is not seeing many flames erupt from the uncontroversial proposal to enable unicode for identifiers... http://jira.codehaus.org/browse/BOO-633 Cheers, Guillaume From ncoghlan at gmail.com Fri May 25 15:55:01 2007 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 25 May 2007 23:55:01 +1000 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <46568CEF.2030900@v.loewis.de> References: <465615C9.4080505@v.loewis.de> <320102.38046.qm@web33515.mail.mud.yahoo.com> <19dd68ba0705241805y52ba93fdt284a2c696b004989@mail.gmail.com> <46568CEF.2030900@v.loewis.de> Message-ID: <4656EAB5.6080405@gmail.com> Martin v. Löwis wrote: >> I think that's a pretty strong reason for making the new, more complex >> behaviour optional. > > Thus making it simpler????? 
The more complex behavior still remains, > to fully understand the language, you have to understand that behavior, > *plus* you need to understand that it may sometimes not be present. It's simpler because any existing automated unit tests will flag non-ascii identifiers without modification. Not only does it prevent surreptitious insertion of malicious code, but existing projects don't have to even waste any brainpower worrying about the implications of Unicode identifiers (because library code typically doesn't care about client code's identifiers, only about the objects the library is asked to deal with). However, what the option *does* enable is for a class of users/developers to employ a broader range of characters if they *or their teacher or employer* choose to do so. A free-for-all wasn't even proposed for strings and comments in PEP 263 - why shouldn't we be equally conservative when it comes to progressively enabling Unicode identifiers? Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From guido at python.org Fri May 25 16:32:57 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 25 May 2007 07:32:57 -0700 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <20070524213605.864B.JCARLSON@uci.edu> References: <20070524213605.864B.JCARLSON@uci.edu> Message-ID: On 5/24/07, Josiah Carlson wrote: > Where else in Python have we made the default > behavior only desired or useful to 5% of our users? Where are you getting that statistic? This seems an extremely backwards, US-centric worldview. 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From ronaldoussoren at mac.com Fri May 25 16:24:43 2007 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Fri, 25 May 2007 07:24:43 -0700 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <788141.82125.qm@web33507.mail.mud.yahoo.com> References: <788141.82125.qm@web33507.mail.mud.yahoo.com> Message-ID: <2CB3D8B9-0112-1000-A9BC-2D28C5213B40-Webmail-10016@mac.com> On Friday, May 25, 2007, at 03:03PM, "Steve Howell" wrote: > > >Remember, you and I have no disagreement whatsoever >about what the Python code looks like. I look forward >to seeing beautiful code written in French, Korean, >etc. under PEP 3131, and I have not opposed anything >in the proposal that affects the code itself. > >We're just disagreeing about whether the Dutch tax law >programmer has to uglify his environment with an alias >of Python to "python3.0 -liberal_unicode," or whether >the American programmer in an enterprisy environment >has to uglify his environment with an alias of Python >to "python3.0 -parochial" to mollify his security >auditors. > >I guess you could argue that the American programmer >in an enterprisy environment already is dealing with >so much ugliness, it wouldn't matter. ;) This could easily be solved by tool support instead of yet another switch (and in effect language variant). That is, pylint, pychecker or even a svn pre-commit hook could report on code that doesn't use the character range that is valid according to the coding conventions for the project. I'm +0.5 on adding Unicode identifier support because it would allow me to use accented characters in localized code whenever appropriate. Ronald 
From stephen at xemacs.org Fri May 25 17:07:42 2007 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 26 May 2007 00:07:42 +0900 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <105266.7737.qm@web33504.mail.mud.yahoo.com> References: <87veeh4bw4.fsf@uwakimon.sk.tsukuba.ac.jp> <105266.7737.qm@web33504.mail.mud.yahoo.com> Message-ID: <87k5ux3ps1.fsf@uwakimon.sk.tsukuba.ac.jp> Steve Howell writes: > What I was trying to say here is that there might be > precedent for non-ascii users already tolerating > command line arguments. It's an idea, but it turns out not to correspond to reality. It only shows there's a precedent for Japanese tolerating command line arguments. The Japanese encoding mess is unique, and shameful. From jimjjewett at gmail.com Fri May 25 16:56:52 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Fri, 25 May 2007 10:56:52 -0400 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <19dd68ba0705241901h23468237md8e81aaa65f9b7a6@mail.gmail.com> References: <465615C9.4080505@v.loewis.de> <320102.38046.qm@web33515.mail.mud.yahoo.com> <19dd68ba0705241805y52ba93fdt284a2c696b004989@mail.gmail.com> <19dd68ba0705241901h23468237md8e81aaa65f9b7a6@mail.gmail.com> Message-ID: On 5/24/07, Guillaume Proux wrote: > Hi Jim, > On 5/25/07, Jim Jewett wrote: > > It isn't strictly security; when I've been burned by cut-and-paste > > that turned out to be an unexpected character, it didn't cause damage, > > but it did take me a long time to debug. > Can you give a longer explanation because I don't understand what is > the issue. Is it like the issue with confusing 0 and O ? 
You seemingly > already have an experience with using something that is now not legal > in Python. Was it in Java or .NET world? The really hard-to-debug ones were usually in C. It happened more when I was less experienced, or the available tools were limited. They usually involved something that looked like a quote mark, but wasn't. (I worry about the characters that look like a less-than sign, but I've never had trouble with them in practice. Problems with other punctuation were rare enough that I can't say they were worse than "." vs "," or ":" vs ";".) This would be less of a problem in python because it takes triple-quotes to continue a string across multiple lines -- but it would still be an occasional problem. This would be less of a problem if I had started out smarter, or if I never worked with people who used presentation-focused editors (like MS Word) when discussing code, but those are only theoretical possibilities. > > For most people, the appearance of a Greek or Japanese (let alone > > both) character would be more likely to indicate a typo. If you know > > that your project is using both languages, then just allow both; the > > point is that you have made an explicit decision to do so. > * Python is dynamic (you can have a e.g. pygtk user interface which > enables you to load at runtime a new .py file even to use a text view > to type in a mini-script that will do something specific in your > application domain): you never know what will get loaded next I am not missing that -- that is the situation I worry about *most*. If I'm running something that new, and I've only inspected it visually, I want a great big warning about unexpected characters that merely look like what I thought they were. No, this won't happen often -- but like threading race conditions, that almost makes it worse. Because it is rare, people won't remember to check for it unless the check is an automated default. 
If I were in a Japanese environment, regularly getting code written in Japanese, then Japanese code would be fine, so I would set my environment to accept Japanese -- but I would still get that warning for something that appears Latin but actually contains Cyrillic. > * Python is embeddable: and often it is to bring the power of python > to less sophisticated users. You can imagine having a global system > deployed all around the world by a global company enabling each user > in each subsidiary to create their own extension scripts. If they can supply their own scripts, they can supply their own data files -- including an acceptable characters table. But they wouldn't really need to -- realistically, the acceptable characters would be a corporate (or at least site-wide) policy decision that could be set at install time. > * There is a runtime cost for checking: the speed vs. security > tradeoff True, but if speed is that important, then ASCII-only is better; the initial file reading will happen faster, as will the parsing to characters, and the deciding whether characters can be part of an identifier. Even a blind "Any code point greater than 127 is always allowed" is still slower than not having to consider those code points. Once you start saying "letters and digits only", you need a per-character lookup, and the difference between "in this set of 4000 out of several million" vs "in this set of several million out of several more million" doesn't need to slow things down. > (for a security benefit that is still very much hypothetical > in the face of the experience of Java and .NET people) (a) Aren't those compiled languages, rather than interpreted? So a misleadingly-named identifier doesn't matter as much, because people aren't looking at the source anyhow. (b) How do you know there haven't been problems that just weren't caught? (Perhaps more of the "wonder why that errored out" variety than security breaches.) 
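[Editorial note: Jim's per-character lookup point above can be sketched in Python. Once the allowed characters live in a hash-based set, a membership test costs the same whether the table holds a few thousand entries or a few hundred thousand; the tables below are illustrative only, not any proposed identifier set.]

```python
# Sketch: identifier checking as set membership. Lookup time per character
# is O(1) regardless of table size, so "4000 allowed chars" vs several
# hundred thousand allowed chars makes no real per-lookup difference.
small_table = frozenset(chr(c) for c in range(0x0780, 0x07B2))  # ~50 chars
large_table = frozenset(chr(c) for c in range(0x80, 0x30000))   # ~196k chars

def uses_only(name, table):
    """True if every character is ASCII or in the allowed table."""
    return all(ord(ch) < 128 or ch in table for ch in name)

print(uses_only("abc\u0781", small_table))  # True
print(uses_only("abc\u0781", large_table))  # True, at the same per-char cost
print(uses_only("\u0430bc", small_table))   # False: Cyrillic not in table
```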
> * In real life, you won't see much python programs that are not > written in your script. Exactly. So when you do, they should be flagged. -jJ From jimjjewett at gmail.com Fri May 25 17:04:24 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Fri, 25 May 2007 11:04:24 -0400 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <19dd68ba0705241913j1d2f60e1ndc89e05bfd926c52@mail.gmail.com> References: <19dd68ba0705241805y52ba93fdt284a2c696b004989@mail.gmail.com> <250467.66423.qm@web33502.mail.mud.yahoo.com> <19dd68ba0705241913j1d2f60e1ndc89e05bfd926c52@mail.gmail.com> Message-ID: On 5/24/07, Guillaume Proux wrote: > I have a hard time seeing how you could sniff out the willingness to > accept in a Japanese environment, a piece of code written in Russian > because your buddy from Siberia has written this cool matrix class > that is 30% faster than most but contains a bunch of cyrillic > characters because people are using cyrillic characters for local > variable identifiers (but not module level identifiers). You probably can't sniff that out automatically. What you can do automatically is say "Whoa! unexpected characters! If you're sure that this code is OK, then do XYZ to allow it (and sufficiently similar code) to run from now on." If XYZ is simple enough, that seems a reasonable tradeoff. The matrix class' distribution could even include the sample lines that need to be added to your allowed-chars table, so you can do it automatically at install time, *if* you explicitly indicate that you know this source is using cyrillic, and that it is OK. (In theory, you might want to allow cyrillic only for this file, not for future files; in practice, people that careful can probably be expected to do the extra work of setting up alternate environments.) 
-jJ From jimjjewett at gmail.com Fri May 25 17:32:25 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Fri, 25 May 2007 11:32:25 -0400 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: References: <465615C9.4080505@v.loewis.de> <320102.38046.qm@web33515.mail.mud.yahoo.com> <19dd68ba0705241805y52ba93fdt284a2c696b004989@mail.gmail.com> Message-ID: On 5/24/07, Guido van Rossum wrote: > It doesn't look like any kind of global flag passed to the interpreter > would scale -- once I am using a known trusted contribution that uses > a different character set than mine, I would have to change the global > setting to be more lenient, and the leniency would affect all code I'm > using. Are you still thinking about the single on/off switch? I agree that saying "Japanese identifiers are OK from now on" still shouldn't turn on Cyrillic identifiers. I think the current alternative boils down to some variant of python -idchars allowedchars.txt where allowedchars.txt would look something like 0780..07B1 ; Thaana or 10000..100FA ; Linear_B plus some blanks I was too lazy to exclude (These lines are based on the unicode Scripts.txt, and use character ranges instead of script names so that you can exclude certain symbols if you want to.) 
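[Editorial note: as a sketch of what consuming such a table might look like -- the -idchars flag and the allowedchars.txt format here are Jim's hypothetical proposal, not an implemented Python feature:]

```python
# Sketch: parse a hypothetical allowedchars.txt whose lines mirror
# Unicode's Scripts.txt ("0780..07B1 ; Thaana") and vet identifiers.

def parse_ranges(lines):
    """Yield (lo, hi) code point pairs from table lines."""
    for line in lines:
        line = line.split('#')[0].strip()   # tolerate comments and blanks
        if not line:
            continue
        codes = line.split(';')[0].strip()  # "0780..07B1" or a single point
        lo, _, hi = codes.partition('..')
        yield int(lo, 16), int(hi or lo, 16)

def identifier_allowed(name, ranges):
    """ASCII always passes; other chars must fall in an allowed range."""
    return all(
        ord(ch) < 128 or any(lo <= ord(ch) <= hi for lo, hi in ranges)
        for ch in name
    )

table = list(parse_ranges(["0780..07B1 ; Thaana", "10000..100FA ; Linear_B"]))
print(identifier_allowed("plain_ascii", table))  # True
print(identifier_allowed("\u0781name", table))   # True: U+0781 is Thaana
print(identifier_allowed("\u0430name", table))   # False: Cyrillic not listed
```

[A real implementation would also have to decide how such ranges interact with the identifier character classes PEP 3131 already mandates, e.g. still rejecting punctuation even if a table lists it.]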
-jJ From jimjjewett at gmail.com Fri May 25 17:37:36 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Fri, 25 May 2007 11:37:36 -0400 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <19dd68ba0705242231j4f391f00n79112a01c0f339bc@mail.gmail.com> References: <20070524213605.864B.JCARLSON@uci.edu> <19dd68ba0705242231j4f391f00n79112a01c0f339bc@mail.gmail.com> Message-ID: On 5/25/07, Guillaume Proux wrote: > If you look at the typical use case for programs written in python > (usually also in rough order of experience) > A) directly in interpreter (i love that) > B) small-ish one-off scripts > C) middle size scripts > D) multi-module programs made by a single person > E) large-ish programs made by a group of people You're missing "here is this neat code from sourceforge", or "Here is something I cut-and-pasted from ASPN". If those use something outside of ASCII, that's fine -- so long as they tell you about it. If you didn't realize it was using non-ASCII (or even that it could), and the author didn't warn you -- then that is an appropriate time for the interpreter to warn you that things aren't as you expect. -jJ From pje at telecommunity.com Fri May 25 17:54:50 2007 From: pje at telecommunity.com (Phillip J. Eby) Date: Fri, 25 May 2007 11:54:50 -0400 Subject: [Python-3000] Wither PEP 335 (Overloadable Boolean Operators)? In-Reply-To: References: <4656446F.8030802@canterbury.ac.nz> Message-ID: <20070525155302.917F23A4061@sparrow.telecommunity.com> At 11:25 AM 5/25/2007 +0200, Neville Grech Neville Grech wrote: > >From a user's POV, I'm +1 on having overloadable boolean > functions. In many cases I had to resort to overload add or neg > instead of and & not, I foresee a lot of cases where the and > overload could be used to join objects which represent constraints. > Overloadable boolean operators could also be used to implement > other types of logic (eg: fuzzy logic). 
Constraining them to just > primitive binary operations in my view will be delimiting for a > myriad of use cases. > >Sure, in some cases, one could overload the neg operator instead of >the not but semantically they have different meanings. Actually, I think that most of the use cases for this PEP would be better served by being able to "quote" code, i.e. to create AST objects directly from Python syntax. Then, you can do anything you can do in a Python expression (including conditional expressions, generator expressions, yield expressions, lambdas, etc.) without having to introduce new special methods for any of that stuff. In fact, if new features are added to the language later, they automatically become available in the same way. From foom at fuhm.net Fri May 25 17:54:50 2007 From: foom at fuhm.net (James Y Knight) Date: Fri, 25 May 2007 11:54:50 -0400 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: References: <20070524213605.864B.JCARLSON@uci.edu> <19dd68ba0705242231j4f391f00n79112a01c0f339bc@mail.gmail.com> Message-ID: <2644E17A-207C-4637-AFAC-B5D27063582A@fuhm.net> On May 25, 2007, at 11:37 AM, Jim Jewett wrote: > You're missing "here is this neat code from sourceforge", or "Here is > something I cut-and-pasted from ASPN". If those use something outside > of ASCII, that's fine -- so long as they tell you about it. > > If you didn't realize it was using non-ASCII (or even that it could), > and the author didn't warn you -- then that is an appropriate time for > the interpreter to warn you that things aren't as you expect. Why? If, today, I download a python module (say, from pypi) that does something I need, I don't read the source code, I just import/run it. In the future, why should I even give one whit of concern that a module I download and don't inspect the source code of may use non-ASCII characters internally? The answer, for me, is simple: I shouldn't care, and the python interpreter shouldn't force me to care. 
If I later choose to examine the source code, maybe *then* I care, but that has nothing to do with the python interpreter. James From gproux+py3000 at gmail.com Fri May 25 17:55:23 2007 From: gproux+py3000 at gmail.com (Guillaume Proux) Date: Sat, 26 May 2007 00:55:23 +0900 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <19dd68ba0705250854o40a1025cse3d5f2c38cd76785@mail.gmail.com> References: <20070524213605.864B.JCARLSON@uci.edu> <19dd68ba0705242231j4f391f00n79112a01c0f339bc@mail.gmail.com> <19dd68ba0705250854o40a1025cse3d5f2c38cd76785@mail.gmail.com> Message-ID: <19dd68ba0705250855r6d2676c6r5e9cb7a49b95b6ac@mail.gmail.com> (I mistakenly replied in private. here is a copy for the py3000 mailing list.) Good evening! On 5/26/07, Jim Jewett wrote: > You're missing "here is this neat code from sourceforge", or "Here is > something I cut-and-pasted from ASPN". If those use something outside > of ASCII, that's fine -- so long as they tell you about it. > > If you didn't realize it was using non-ASCII (or even that it could), > and the author didn't warn you -- then that is an appropriate time for > the interpreter to warn you that things aren't as you expect. I fail to see your point. Why should the interpreter warn you? There is nothing wrong to have programs written with identifiers using accented letters, cyrillic alphabet, morse code?! Why should you be warned? If the programmer who wrote the code decided to use its own language to name some of the identifiers ... then.. bygones. If you have an actual requirement that everything should be ascii then do not copy code off ASPN without first sanitizing it and do not copy neat code from sf.net from people you hardly know without doing a full ascii-compliance and security review. but if the code you copy off somewhere else does what you need it to do... 
then why do you want to force the author of this code generously donated to you to downgrade his expressiveness by having to rewrite all his code to reach ascii purity? Guillaume From stephen at xemacs.org Fri May 25 18:08:48 2007 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 26 May 2007 01:08:48 +0900 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <19dd68ba0705250653v2c2a8188jac8c4ccc722fb747@mail.gmail.com> References: <465615C9.4080505@v.loewis.de> <320102.38046.qm@web33515.mail.mud.yahoo.com> <19dd68ba0705241805y52ba93fdt284a2c696b004989@mail.gmail.com> <87odk93w09.fsf@uwakimon.sk.tsukuba.ac.jp> <19dd68ba0705250641j348a42adu974fe4969897761e@mail.gmail.com> <19dd68ba0705250653v2c2a8188jac8c4ccc722fb747@mail.gmail.com> Message-ID: <87irag51in.fsf@uwakimon.sk.tsukuba.ac.jp> Guillaume Proux writes: > Am I the only person on Earth to routinely start my python programs by > double clicking on them?? Surely not. So? If your python programs have non-ASCII identifiers in them, they'll crash when you double-click them. So I suspect you have no programs now where there's a problem. And there will be very few for the near future. For the medium term, there are ways to pass command line arguments to programs invoked by GUI. They're more or less ugly, but your daughter will never see them, only the pretty icons. Please be aware that I'm an advocate of the feature, and I would be a bit happier if it were enabled and there was no way to disable it at all. However, this is a community, and some of the members are quite concerned about possible effects on themselves and on the community. I see little harm in providing the feature, and delaying making it default for a while, deferring to their concerns until there is more experience with the feature, and the offline checking programs that several of us have proposed actually exist and have been field-tested. 
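For the curious, such an offline checker can be tiny. The following is an illustrative sketch only (not the actual tool being discussed, and the function name is made up); it uses nothing but the stdlib tokenize module:

```python
import io
import tokenize

def find_nonascii_identifiers(source):
    """Return (line_number, name) pairs for every identifier in the
    given source string that contains a non-ASCII character."""
    hits = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # NAME tokens are identifiers and keywords; keywords are pure
        # ASCII, so any NAME with a character above U+007F is a hit.
        if tok.type == tokenize.NAME and any(ord(c) > 127 for c in tok.string):
            hits.append((tok.start[0], tok.string))
    return hits
```

Run it over a downloaded module before importing it; anything it reports is an identifier you might not have noticed while skimming.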
From jcarlson at uci.edu Fri May 25 18:05:07 2007 From: jcarlson at uci.edu (Josiah Carlson) Date: Fri, 25 May 2007 09:05:07 -0700 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: References: <20070524213605.864B.JCARLSON@uci.edu> Message-ID: <20070525084117.865D.JCARLSON@uci.edu> "Guido van Rossum" wrote: > > On 5/24/07, Josiah Carlson wrote: > > Where else in Python have we made the default > > behavior only desired or useful to 5% of our users? > > Where are you getting that statistic? This seems an extremely > backwards, US-centric worldview. Stephen Turnbull's rough statistics on multilingual use in Emacs... """ And that's a big "if". Most of your users will not see code in a language the current version of your editor can't deal with in their working lives, and 90% won't in the usable life of your product. That I can tell you from experience. Emacs has all these wonderful multilingual features, but you know what? 95% of our users are monoscript 100% of the time.[1] 90% of the rest use their primary script 95% of the time. Emacs being multilingual only means that the one language might be Japanese or Thai. If 99% of your users currently use only ISO-8859-15, that isn't going to change by much just because Python now allows Thai identifiers. """ http://mail.python.org/pipermail/python-3000/2007-May/007887.html Which I 'poorly extrapolate' to users who write source using non-ascii identifiers... """ Why? Primarily because ascii identifiers are what are allowed today, and have been allowed for 15 years. But there is this secondary data point that Stephen Turnbull brought up; 95% of users (of Emacs) never touch non-ascii code. Poor extrapolation of statistics aside, to make the default be something that does not help 95% of users seems a bit... overenthusiastic. Where else in Python have we made the default behavior only desired or useful to 5% of our users?
""" http://mail.python.org/pipermail/python-3000/2007-May/007927.html Apples and oranges to be sure, but there are no other statistics that anyone else is able to offer about use of non-ascii identifiers in Java, Javascript, C#, etc. - Josiah From jimjjewett at gmail.com Fri May 25 18:03:59 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Fri, 25 May 2007 12:03:59 -0400 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <464FFD04.90602@v.loewis.de> <46521CD7.9030004@v.loewis.de> <46527904.1000202@v.loewis.de> <87sl9o5dvi.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 5/25/07, Adam Olsen wrote: > On 5/23/07, Jim Jewett wrote: > > > ... range of characters and languages allowed ... > > Fair enough -- but the problem is that this isn't a solved issue > > yet; the unicode group themselves make several contradictory > > recommendations. > > I can come up with rules that are probably just about right, but I > > will make mistakes (just as the unicode consortium itself did, > > which is why they have both ID and XID, and why both have > > stability characters). Even having read their reports, my initial > > rules would still have banned mixed-script, which would have > > prevented your edict-example. > If we allowed an underscore as a mixed-script separator > (allowing "def get_??(self):"), does this let us get away > with otherwise banning mixed-scripts? I wondered that, until seeing that it wouldn't really solve the problem anyhow. It is possible to write entire words (such as "allow" or "scope") in multiple scripts. (Unicode calls these "whole script confusables".) You can't stop that without banning one of the scripts entirely, which would disenfranche users of some languages. So I think the least-bad solution is to say "OK, we won't allow these potentially confusable characters unless you were expecting them." 
And once we have a way to say "I'm expecting Cyrillic", we might as well let the user specify exactly what they're expecting, and make their own decisions on what is likely to be needed vs likely to be confused. For more information, see section 4 of http://www.unicode.org/reports/tr39/ and current likely problem characters at http://www.unicode.org/reports/tr39/data/confusables.txt http://www.unicode.org/reports/tr39/data/confusablesWholeScript.txt -jJ From gproux+py3000 at gmail.com Fri May 25 18:10:02 2007 From: gproux+py3000 at gmail.com (Guillaume Proux) Date: Sat, 26 May 2007 01:10:02 +0900 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <87irag51in.fsf@uwakimon.sk.tsukuba.ac.jp> References: <465615C9.4080505@v.loewis.de> <320102.38046.qm@web33515.mail.mud.yahoo.com> <19dd68ba0705241805y52ba93fdt284a2c696b004989@mail.gmail.com> <87odk93w09.fsf@uwakimon.sk.tsukuba.ac.jp> <19dd68ba0705250641j348a42adu974fe4969897761e@mail.gmail.com> <19dd68ba0705250653v2c2a8188jac8c4ccc722fb747@mail.gmail.com> <87irag51in.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <19dd68ba0705250910o5b56b4f9i9fccd450e37f48fe@mail.gmail.com> On 5/26/07, Stephen J. Turnbull wrote: > For the medium term, there are ways to pass command line arguments to > programs invoked by GUI. They're more or less ugly, but your daughter > will never see them, only the pretty icons. Is there right now in Windows? There is none that I know today at least. All I know is that specific extensions are called automatically using a given interpreter because of bindings defined in the registry. There is no simple way to add per-file info afaik.
Regards, Guillaume From guido at python.org Fri May 25 18:31:13 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 25 May 2007 09:31:13 -0700 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: References: <465615C9.4080505@v.loewis.de> <320102.38046.qm@web33515.mail.mud.yahoo.com> <19dd68ba0705241805y52ba93fdt284a2c696b004989@mail.gmail.com> Message-ID: On 5/25/07, Jim Jewett wrote: > On 5/24/07, Guido van Rossum wrote: > > > It doesn't look like any kind of global flag passed to the interpreter > > would scale -- once I am using a known trusted contribution that uses > > a different character set than mine, I would have to change the global > > setting to be more lenient, and the leniency would affect all code I'm > > using. > > Are you still thinking about the single on/off switch? > > I agree that saying "Japanese identifiers are OK from now on" still > shouldn't turn on Cyrillic identifiers. I think the current > alternative boils down to some variant of > > python -idchars allowedchars.txt > > where allowedchars.txt would look something like > > > 0780..07B1 ; Thaana > > or > > 10000..100FA ; Linear_B plus some blanks I was too lazy to exclude > > (These lines are based on the unicode Scripts.txt, and use character > ranges instead of script names so that you can exclude certain symbols > if you want to.) I still think such a command-line switch (or switches) is the wrong approach. What if I have *one* module that uses Cyrillic legitimately. A command-line switch would enable Cyrillic in *all* modules. Auditing code using a separate tool can be much more flexible. Organizations can establish their own conventions for flagging exceptions on a per-module basis. 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From jcarlson at uci.edu Fri May 25 18:37:50 2007 From: jcarlson at uci.edu (Josiah Carlson) Date: Fri, 25 May 2007 09:37:50 -0700 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <19dd68ba0705250910o5b56b4f9i9fccd450e37f48fe@mail.gmail.com> References: <87irag51in.fsf@uwakimon.sk.tsukuba.ac.jp> <19dd68ba0705250910o5b56b4f9i9fccd450e37f48fe@mail.gmail.com> Message-ID: <20070525092604.8666.JCARLSON@uci.edu> "Guillaume Proux" wrote: > On 5/26/07, Stephen J. Turnbull wrote: > > For the medium term, there are ways to pass command line arguments to > > programs invoked by GUI. They're more or less ugly, but your daughter > > will never see them, only the pretty icons. > > Is there right now in Windows? There is none that I know today at > least. All I know is that specific extensions are called automatically > using a given interpreter because of bindings defined in the registry. > There is no simple way to add per-file info afaik. I thought you didn't care what identifiers were in your source? Wouldn't you have already changed your environment to automatically include all of unicode in the allowable identifiers? But if you really want to muck about with the command line to each script individually, you can create a shortcut and add 'python ' to the beginning of the command line. Or, if you want a semi-automatic solution, you can change the command line for Python to a batch file that automatically generates either a shortcut or a batch file for each .py file that is run, which can then be edited either using the properties dialog (for shortcuts) or any text editor (for batch files) to change the command line options to Python. You may be able to use the shortcuts automatically generated and placed into your 'Documents and Settings\\Recent' path, but I haven't tested this.
- Josiah From jcarlson at uci.edu Fri May 25 18:41:53 2007 From: jcarlson at uci.edu (Josiah Carlson) Date: Fri, 25 May 2007 09:41:53 -0700 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <4656920F.9040001@v.loewis.de> References: <20070524234516.8654.JCARLSON@uci.edu> <4656920F.9040001@v.loewis.de> Message-ID: <20070525091105.8663.JCARLSON@uci.edu> "Martin v. Löwis" wrote: > > >> People should not have to read long system configuration pages >> just to run the program that they intuitively wrote correctly >> right from the start. > > > > You mean that 5% of users who run into code written using non-ascii > > identifiers will find this sufficiently burdensome to force the 95% of > > ascii users to use additional verification and checking tools to make > > sure that they are not confronted with non-ascii identifiers? I don't > > find that a reasonable tradeoff for the majority of (non-unicode) users. > > I think I lost track of what problem you are trying to solve: is it > the security issue, or is it the problem Ping stated ("you cannot > know the full lexical rules by heart anymore"). > > If it is the latter, I don't understand why the 95% ascii users need > to run additional verification and checking tools. If they don't > know the full language, they won't use it - why should they run > any checking tools? Say that I have an ascii codebase that I've been happily using (and I have been getting warnings/errors/whatever whenever non-ascii code is found during runtime, so I know it is pure). But I want to use a 3rd party package that offers additional functionality*. I drop this package into my tree, add the necessary imports and... ImportError: non-ascii identifier used without -U option Huh, apparently this 3rd party package uses non-ascii identifiers.
If I wanted to keep my codebase ascii-only (a not unlikely case), I can choose to either look for a different package, look for a variant of this package with only ascii identifiers, or attempt to convert the package myself (a tool that does the unicode -> ascii transliteration process would make this smoother). For those who don't care about ascii or non-ascii identifiers, they will likely already have an environment variable or site.py modification that offers all unicode characters that they want, and they will never see this message. > If it is the security issue, I don't see why a warning wouldn't > address the concerns of these users just as well. It's partially a security issue, but that's only 1 of the 5 reasons that Ka-Ping pointed out. But yes, I want to see a message and I want the software to halt and tell me that it found something that may be an issue. And I want this to *automatically* happen every time I run Python - Josiah * Or I copy and paste code from the Python Cookbook, a blog, etc. From gproux+py3000 at gmail.com Fri May 25 18:45:35 2007 From: gproux+py3000 at gmail.com (Guillaume Proux) Date: Sat, 26 May 2007 01:45:35 +0900 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <20070525091105.8663.JCARLSON@uci.edu> References: <20070524234516.8654.JCARLSON@uci.edu> <4656920F.9040001@v.loewis.de> <20070525091105.8663.JCARLSON@uci.edu> Message-ID: <19dd68ba0705250945j3dadcefcu8db91b3d2c055fdf@mail.gmail.com> On 5/26/07, Josiah Carlson wrote: > wanted to keep my codebase ascii-only (a not unlikely case), I can So you have a clear preference for an ascii-only way. *YOU* *really* want to know when a non-ascii identifier crosses your path. > For those who don't care about ascii or non-ascii identifiers, they will > likely already have an environment variable or site.py modification that > offers all unicode characters that they want, and they will never see > this message. I will rephrase your sentence this way. 
"For those who DO care about ascii only identifiers, they will likely have already an environment variable or site.py modifcation that makes sure that all code ever imported is pure ascii and are going to see the message they want to see..." > issue. And I want this to *automatically* happen every time I run > Python "and automatically every time they run Python"... This argument cuts both ways. Guillaume From stephen at xemacs.org Fri May 25 18:59:33 2007 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 26 May 2007 01:59:33 +0900 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <19dd68ba0705250910o5b56b4f9i9fccd450e37f48fe@mail.gmail.com> References: <465615C9.4080505@v.loewis.de> <320102.38046.qm@web33515.mail.mud.yahoo.com> <19dd68ba0705241805y52ba93fdt284a2c696b004989@mail.gmail.com> <87odk93w09.fsf@uwakimon.sk.tsukuba.ac.jp> <19dd68ba0705250641j348a42adu974fe4969897761e@mail.gmail.com> <19dd68ba0705250653v2c2a8188jac8c4ccc722fb747@mail.gmail.com> <87irag51in.fsf@uwakimon.sk.tsukuba.ac.jp> <19dd68ba0705250910o5b56b4f9i9fccd450e37f48fe@mail.gmail.com> Message-ID: <87fy5k4z62.fsf@uwakimon.sk.tsukuba.ac.jp> Guillaume Proux writes: > Is there [a way to pass options to GUI programs] right now in > Windows? There is none that I know today at least. Can't you click on .BAT files? (I did say "ugly"!) From jimjjewett at gmail.com Fri May 25 19:04:31 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Fri, 25 May 2007 13:04:31 -0400 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <740c3aec0705250255k642d6637re46e3929212f1369@mail.gmail.com> References: <465667AE.2090000@v.loewis.de> <20070524215742.864E.JCARLSON@uci.edu> <740c3aec0705250255k642d6637re46e3929212f1369@mail.gmail.com> Message-ID: On 5/25/07, BJ?rn Lindqvist wrote: > I think you are forgetting who this feature is intended for. [I think experienced programmers will in fact use it too, but agree that ...] 
> Newbies, on the other hand, would maybe appreciate being able to write: ... > If Python required a switch for such a program to run, then this > feature would be totally wasted on them. They might use an IDE, > program in notepad.exe and drag the file to the python.exe icon or > not even know about cmd.exe or what a command line switch is. An error > message, even an informal one, isn't easy to understand if you don't > know English. How about a default file, such as "on launch, python looks for pyidchar.txt ... if you want to override this default file do XYZ" The default default file would be empty (except for comments explaining the syntax) and allow only ASCII. A Swedish volunteer could create and distribute a version for Swedish characters. (And since these would be fairly small text files, some could probably be distributed right in the primary distribution.) When the teacher installs Python, she just uses the Swedish distribution, or picks the "also allow Latin1 IDs" option from the custom MSI install. -jJ From tjreedy at udel.edu Fri May 25 19:19:52 2007 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 25 May 2007 13:19:52 -0400 Subject: [Python-3000] Wither PEP 335 (Overloadable Boolean Operators)? References: <4656446F.8030802@canterbury.ac.nz> Message-ID: "Greg Ewing" wrote in message news:4656446F.8030802 at canterbury.ac.nz... | Guido van Rossum wrote: | | > Last call for discussion! I'm tempted to reject this -- the ability to | > generate optimized code based on the shortcut semantics of and/or is | > pretty important to me. | | Please don't be hasty. I've had to think about this issue | a bit. I have not seen any response to my suggestion to simplify the to-me overly baroque semantics. Missed it? Still thinking? Or did I miss something?
Terry Jan Reedy From jimjjewett at gmail.com Fri May 25 19:47:44 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Fri, 25 May 2007 13:47:44 -0400 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <87r6p540n4.fsf@uwakimon.sk.tsukuba.ac.jp> References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <46521CD7.9030004@v.loewis.de> <46527904.1000202@v.loewis.de> <87sl9o5dvi.fsf@uwakimon.sk.tsukuba.ac.jp> <87646i5td6.fsf@uwakimon.sk.tsukuba.ac.jp> <87r6p540n4.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 5/25/07, Stephen J. Turnbull wrote: > Jim Jewett writes: > > Ideally, it would even be explicit per extra character allowed, though > > there should obviously be shortcuts to accept entire scripts. > How about a regexp character class as starting point? I'm not sure I understand. Do you mean that part of localization should be defining what certain regular expressions should match? That sounds great from a consistency standpoint, but it would certainly limit who could create their own reliable tailorings. > > So how about > > [ ASCII, plus chars in a named table] > You can specify any character you want, but if it's ASCII, or not in > the classes PEP 3131 ends up using to define the maximal set, it gets > deleted from the extension table (ASCII has its own table, > conceptually). This permits whole scripts, blocks, or ranges to be > included. So long as we allow tailoring, I think the maximal set should be generous -- and I don't see any reason to pre-exclude anything outside ASCII. There are people who like to use names like "Program Files" or "Summary of Results.Apr-3-2007 version 2.xls"; I expect the same will be true of identifiers. So long as the punctuation is not ASCII, we might as well let them. (Internally, I expect some communities to say "that is a bad idea" about certain characters, but *I* don't want to prejudge which characters those will be.) 
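Concretely, the "ASCII, plus chars in a named table" policy discussed here is very little code. A hypothetical sketch using the "0780..07B1 ; Thaana" range format floated earlier in the thread (the file-format details are invented for illustration):

```python
def parse_ranges(text):
    """Parse lines like '0780..07B1 ; Thaana' into (lo, hi) pairs.

    '#' starts a comment, blank lines are skipped, and a bare
    codepoint stands for itself.  The format is hypothetical.
    """
    ranges = []
    for line in text.splitlines():
        spec = line.split("#", 1)[0].split(";", 1)[0].strip()
        if not spec:
            continue
        lo, _, hi = spec.partition("..")
        ranges.append((int(lo, 16), int(hi or lo, 16)))
    return ranges

def identifier_allowed(name, ranges):
    """ASCII is always allowed; anything else must fall in a range."""
    return all(ord(c) < 128 or any(lo <= ord(c) <= hi for lo, hi in ranges)
               for c in name)
```

With a Thaana table loaded, an identifier mixing ASCII and Thaana passes, while a stray Cyrillic letter is rejected.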
> > If you want to include punctuation or > Why waste the effort of the Unicode technical committees? The other committees say to exclude certain scripts, like Linear B and Ogham. And not to allow mixed scripts, at least if they're confusable. But I really don't want to explain why someone using Cyrillic can't use certain (apparently to him) randomly determined identifiers just because it could be confused with ASCII (or Armenian). The only set the committees always recommend allowing is ASCII; beyond that a nest of decisions (and exceptions) is almost unavoidable, because the committees disagree among themselves. Since we can't be completely safe, I would rather err on the side of leniency towards those concerned enough to make explicit decisions. > > undefined characters, so be it. > -1 > Assuming undefined == reserved for future standardization that > violates the Unicode standard. If unicode comes out with a new revision, the new characters should probably be allowed; I don't want a situation where users of Cham or Lepcha[1] are told they have to wait another year because their scripts weren't formally adopted into unicode until after python 3.4.0 was already released. [1] http://www.unicode.org/onlinedat/languages-scripts.html says that these languages have their own scripts (and no alternate script), and that these scripts have not yet been encoded in unicode. I won't be surprised to see Klingon identifiers before we see either of those, but ... I don't want to contribute to their exclusion. 
-jJ From jimjjewett at gmail.com Fri May 25 19:59:48 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Fri, 25 May 2007 13:59:48 -0400 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <857617.19874.qm@web33514.mail.mud.yahoo.com> References: <19dd68ba0705242231j4f391f00n79112a01c0f339bc@mail.gmail.com> <857617.19874.qm@web33514.mail.mud.yahoo.com> Message-ID: On 5/25/07, Steve Howell wrote: > This happens to me about once a month, and I > forget exactly what Python does when I try to run the > program where one identifier has the accented e, and a > later identifier doesn't. It *should* throw up a syntax error. If both letters were valid, it would silently create a second identifier, and you would have some fun tracking down the bug. I say "*should*" because, at the moment, it seems to accept some additional characters, in at least some environments. In particular, using Idle from 2.5.0, I just noticed that I can apparently use at least some Latin-1 characters. >>> ? = 5 >>> print ? 5 >>> ?=7 SyntaxError: invalid syntax >>> ?=7 >>> ? 7 [And no, this doesn't mean "it's already in use; no big deal", because the Latin-1 characters are not the biggest concern.] -jJ From rhamph at gmail.com Fri May 25 20:16:46 2007 From: rhamph at gmail.com (Adam Olsen) Date: Fri, 25 May 2007 12:16:46 -0600 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <46521CD7.9030004@v.loewis.de> <46527904.1000202@v.loewis.de> <87sl9o5dvi.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 5/25/07, Jim Jewett wrote: > On 5/25/07, Adam Olsen wrote: > > If we allowed an underscore as a mixed-script separator > > (allowing "def get_??(self):"), does this let us get away > > with otherwise banning mixed-scripts? > > I wondered that, until seeing that it wouldn't really solve the > problem anyhow. It is possible to write entire words (such as "allow" > or "scope") in multiple scripts. 
(Unicode calls these "whole script > confusables".) You can't stop that without banning one of the scripts > entirely, which would disenfranchise users of some languages. > > So I think the least-bad solution is to say "OK, we won't allow these > potentially confusable characters unless you were expecting them." > > And once we have a way to say "I'm expecting Cyrillic", we might as > well let the user specify exactly what they're expecting, and make > their own decisions on what is likely to be needed vs likely to be > confused. Indeed, the whole-script confusables do create significant holes, but I think the best solution is still to ban mixed-scripts and accept that it's only a "75% solution". Using an "I'm expecting cyrillic" flag makes it harder for those who need cyrillic AND still leaves them vulnerable to the same problem we're trying to protect ourselves from. A more extreme solution would be to introduce a symbol type that converts whole-script confusables to a canonical form (as well as mixed-script confusables, if we don't ban them). For practicality it would have to coerce any unicode it was compared with for equality.. and probably not support sorting. -- Adam Olsen, aka Rhamphoryncus From jimjjewett at gmail.com Fri May 25 20:20:50 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Fri, 25 May 2007 14:20:50 -0400 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <19dd68ba0705250855r6d2676c6r5e9cb7a49b95b6ac@mail.gmail.com> References: <20070524213605.864B.JCARLSON@uci.edu> <19dd68ba0705242231j4f391f00n79112a01c0f339bc@mail.gmail.com> <19dd68ba0705250854o40a1025cse3d5f2c38cd76785@mail.gmail.com> <19dd68ba0705250855r6d2676c6r5e9cb7a49b95b6ac@mail.gmail.com> Message-ID: On 5/25/07, Guillaume Proux wrote: > On 5/26/07, Jim Jewett wrote: > > You're missing "here is this neat code from sourceforge", or "Here is > > something I cut-and-pasted from ASPN".
If those use something outside > > of ASCII, that's fine -- so long as they tell you about it. > > If you didn't realize it was using non-ASCII (or even that it could), > > and the author didn't warn you -- then that is an appropriate time for > > the interpreter to warn you that things aren't as you expect. > I fail to see your point. Why should the interpreter warn you? I see some of the confusion now; as James Knight pointed out, some people already treat python as binary code, and just run without reading -- but some people don't. I do read (or at least skim) other people's code before running it. If nothing else, I want to see whether it has much chance of solving my actual problem. By the time I've finished reading it, I have a fairly good idea what it is doing. That's less true if I can't read everything, but at least I know which parts to worry about. Arbitrary unicode identifiers open up the possibility of code that *looks* like ASCII, but isn't -- so I don't even realize that I missed something. > but if the code you copy off somewhere else does what you need it to > do... then why do you want to force the author of this code generously > donated to you to downgrade his expressiveness by having to rewrite > all his code to reach ascii purity? I don't mind that he used Sanskrit identifiers; I don't even mind if he uses Cyrillic identifiers that look like ASCII. I'll be less likely to use his code, but that is my own problem. If his code breaks when retyped ... again, that is mostly my own problem. What I do mind is if he used identifier characters that look like > or ', and I didn't notice because the rest of the code was ASCII, and python didn't warn me, because, hey, technically, those lookalikes *are* letters now.
-jJ From jimjjewett at gmail.com Fri May 25 20:38:46 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Fri, 25 May 2007 14:38:46 -0400 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: References: <465615C9.4080505@v.loewis.de> <320102.38046.qm@web33515.mail.mud.yahoo.com> <19dd68ba0705241805y52ba93fdt284a2c696b004989@mail.gmail.com> Message-ID: On 5/25/07, Guido van Rossum wrote: > On 5/25/07, Jim Jewett wrote: > > I agree that saying "Japanese identifiers are OK from now on" still > > shouldn't turn on Cyrillic identifiers. I think the current > > alternative boils down to some variant of > > where allowedchars.txt would look something like > > 0780..07B1 ; Thaana > > or > > 10000..100FA ; Linear_B plus some blanks I was too lazy to exclude > I still think such a command-line switch (or switches) is the wrong > approach. What if I have *one* module that uses Cyrillic legitimately. > A command-line switch would enable Cyrillic in *all* modules. Yes. And that is the desired outcome for a student situation. > ... Auditing code using a separate tool can ... Large organizations can do whatever they need to, including an automated transliteration before import. The concern is for relatively small groups, who don't have huge processes in place. (1) A new student shouldn't need to learn about import flags just to use native characters. Giving such fine-grained control as an advanced option is OK, but it shouldn't be the *only* way to say "ASCII + characters I use when reading or writing." (2) Someone downloading source code (not binary, source code) shouldn't have to remember to run that code through an external tool just to see if it uses unexpected characters (and might be saying something very different from what she expected). Note that this applies even to people who do want the extended identifiers; wanting to write Han Chinese characters does not imply wanting to accept Greek Coptic characters. 
-jJ From jimjjewett at gmail.com Fri May 25 20:49:55 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Fri, 25 May 2007 14:49:55 -0400 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <46521CD7.9030004@v.loewis.de> <46527904.1000202@v.loewis.de> <87sl9o5dvi.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 5/25/07, Adam Olsen wrote: > On 5/25/07, Jim Jewett wrote: > > On 5/25/07, Adam Olsen wrote: > > > If we allowed an underscore as a mixed-script separator > > > (allowing "def get_??(self):"), does this let us get away > > > with otherwise banning mixed-scripts? ... > Indeed, the whole-script confusables do create significant > holes, but I think the best solution is still to ban mixed-scripts > and accept that it's only a "75% solution". Using an "I'm > expecting cyrillic" flag makes it harder for those who need > cyrillic AND still leaves them vulnerable to the same problem > we're trying to protect ourselves from. hmm... I had thought they should either not include the confusable letters, or use different fonts -- whatever they normally do. But I suppose using an _ separator could still be a useful crutch. Whether it is useful enough ... I'll let others chime in. > A more extreme solution would be to introduce a symbol type that > converts whole-script confusables to a canonical > form The unicode consortium recommends against this. I'm not sure if it is just a presentation issue, or concerns about compatibility; the "confusables" lists are explicitly allowed to change.
-jJ From python at zesty.ca Fri May 25 21:29:50 2007 From: python at zesty.ca (Ka-Ping Yee) Date: Fri, 25 May 2007 14:29:50 -0500 (CDT) Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <20070525084117.865D.JCARLSON@uci.edu> References: <20070524213605.864B.JCARLSON@uci.edu> <20070525084117.865D.JCARLSON@uci.edu> Message-ID: On Fri, 25 May 2007, Josiah Carlson wrote: > Apples and oranges to be sure, but there are no other statistics that > anyone else is able to offer about use of non-ascii identifiers in Java, > Javascript, C#, etc. Let's see what we can find. I made several attempts to search for non-ASCII identifiers using google.com/codesearch and here's what I got.

Java or JavaScript (total: about 1480000 files found with "lang:java .")
------------------------------------------------------------------------

1. lang:java ^[^"]*[^\s!-~].*=  (assignment to non-ASCII name)
   2 files with a UTF-8 BOM at the beginning;
   1 file with non-ASCII in comments;
   5 files with non-ASCII in strings;
   2 files with non-ASCII elsewhere in source code:
     1. moin-1.5.8/wiki/htdocs/applets/moinFCKplugins/.../lang/en.js
        UTF-8 BOM in middle of file.
     2. SMSkyline.wdgt/fr.lproj/localizedStrings.js
        UTF-16 BOM beginning of a UTF-8 file. (!)

2. lang:java ^[^"]*[^\s!-~]\w*\.  (method call on non-ASCII name)
   2 files with a UTF-8 BOM at the beginning;
   13 files with non-ASCII in comments;
   5 files with non-ASCII in strings;
   5 files with non-ASCII elsewhere in source code:
     1. struts-2.0.6/src/core/src/.../Editor2Plugin/FindReplaceDialog.js
        UTF-8 BOM in middle of file.
     2. moin-1.5.8/wiki/htdocs/applets/moinFCKplugins/.../lang/en.js
        UTF-8 BOM in middle of file.
     3. chickenfoot/chickenscratch/tests/findTest.js
        Non-breaking spaces embedded in indentation.

3. lang:java ^\s*class.*[^\s!-~]  (class declaration)
   2 files with non-ASCII in strings; no other hits.

4. lang:javascript ^\s*function.*[^\s!-~]  (function declaration)
   1 non-JavaScript file;
   9 files with non-ASCII in comments;
   1 file with non-ASCII in strings;
   1 file with non-ASCII elsewhere in source code:
     1. google_hacks_3E_code/hack_61/zoom-google.user.js
        Thin spaces (U+2009) embedded in code.

C# (total: about 266000 files found with "lang:c# .")
-----------------------------------------------------

5. lang:c# ^[^"]*[^\s!-~].*=  (assignment to non-ASCII name)
   5 non-C# files;
   6 files with a UTF-8 BOM at the beginning;
   9 files with non-ASCII in comments;
   7 files with non-ASCII elsewhere in source code:
     1. blam-1.8.4pre2/src/PreferencesDialog.cs
        Non-breaking spaces in the middle of the line.
     2. BildschirmTennis2/BildschirmTennis2/Program1.cs
        Identifier containing non-ASCII.
     3. Ukazkova reseni CS - Prakticke priklady/.../Exp_2_03/Class2.cs
        Identifier containing non-ASCII.
     4. Rule.cs
        Identifier containing non-ASCII.
     5. SharpIntroduction/ComplexExample/Zv?????tko.cs
        Identifier containing non-ASCII.
     6. WitherwynWebDist/Witherwyn/Map.cs
        "Times" character in expression, probably a typo.
     7. PDFsharp/XGraphicsLab/MainForm.cs
        Identifier containing non-ASCII.

6. lang:c# ^[^"]*[^\s!-~]\w*\(  (function call on non-ASCII name)
   4 files with non-ASCII in comments;
   6 files with non-ASCII elsewhere in source code:
     1. BildschirmTennis2/BildschirmTennis2/Program1.cs
        Identifier containing non-ASCII.
     2. SharpIntroduction/ComplexExample/Program.cs
        Identifier containing non-ASCII.
     3. Ukazkova reseni CS - Prakticke priklady/.../Exp_2_03/Class1.cs
        Identifier containing non-ASCII.
     4. ActiveRecord/Generator/.../RelationshipBuilderTestCase.cs
        Identifier containing non-ASCII, almost certainly a typo.
     5. Sample1/Sample1/Program.cs
        Identifier containing non-ASCII.
     6. Kap11/03/TEXT.CS
        Identifier containing non-ASCII.

7. lang:c# ^\s*class.*[^\s!-~]  (class declaration)
   1 hit:
     1. Kap06/03/Kalen.cs
        Identifier containing non-ASCII.
In summary, that means out of around 5.7 million Java, JavaScript, and C# files that are indexed by Google Code Search, the only use of non-ASCII identifiers I could find was in 12 C# files, and one of those 12 occurrences is almost certainly a mistake. -- ?!ng From mike.klaas at gmail.com Fri May 25 22:16:58 2007 From: mike.klaas at gmail.com (Mike Klaas) Date: Fri, 25 May 2007 13:16:58 -0700 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <788141.82125.qm@web33507.mail.mud.yahoo.com> References: <788141.82125.qm@web33507.mail.mud.yahoo.com> Message-ID: On 25-May-07, at 6:03 AM, Steve Howell wrote: > > We're just disagreeing about whether the Dutch tax law > programmer has to uglify his environment with an alias > of Python to "python3.0 -liberal_unicode," or whether > the American programmer in an enterprisy environment > has to uglify his environment with an alias of Python > to "python3.0 -parochial" to mollify his security > auditors. Surely if such mollification were necessary, -parochial would be routinely used for (most) enterprise-y java? I have never seen any such thing done, though my experience is perhaps not universal. Then again, perhaps the security auditors would object to the use of python in the first place. -Mike From baptiste13 at altern.org Fri May 25 22:42:53 2007 From: baptiste13 at altern.org (Baptiste Carvello) Date: Fri, 25 May 2007 22:42:53 +0200 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <46568116.202@v.loewis.de> References: <465667AE.2090000@v.loewis.de> <20070524215742.864E.JCARLSON@uci.edu> <46568116.202@v.loewis.de> Message-ID: Martin v. Löwis wrote: > > I don't think there is precedence in Python for such an informational > error message. It is not pythonic to give an error in the case > "I know what you want, and I could easily do it, but I don't feel > like doing it, read these ten pages of text to learn more about the problem".
> in one word: exit From baptiste13 at altern.org Sat May 26 00:00:43 2007 From: baptiste13 at altern.org (Baptiste Carvello) Date: Sat, 26 May 2007 00:00:43 +0200 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: References: <465615C9.4080505@v.loewis.de> <320102.38046.qm@web33515.mail.mud.yahoo.com> <19dd68ba0705241805y52ba93fdt284a2c696b004989@mail.gmail.com> Message-ID: Guido van Rossum wrote: > > If there's a security argument to be made for restricting the alphabet > used by code contributions (even by co-workers at the same company), I > don't see why ASCII-only projects should have it easier than projects > in other cultures. > There is only one valid reason: because that's the reasonable choice for open source code, and you make the political choice to favor open-source. An ASCII-only default helps open source projects keep their codebase readable, and also makes it easier to open proprietary codebases after the fact. On the other hand, a non-ASCII default does help novice users. So you will make someone unhappy... My personal data point: in scientific research, where I work, specialized programs are sometimes not organised by projects, but by codes, which are developed in-house and open-sourced *as is* after the fact. For this use case, a non-ASCII default is clearly a nuisance, because non-ASCII identifiers would be used without much thought when the program is a small in-house project, and then make it difficult to debug 5 years down the road when it has become important for the community. In this particular case, non-ASCII identifiers also have less justification, because all researchers understand English well anyway. So, for my personal interests, an ASCII-only default would be better.
just my 2 cents, BC From timothy.c.delaney at gmail.com Sat May 26 00:16:35 2007 From: timothy.c.delaney at gmail.com (Tim Delaney) Date: Sat, 26 May 2007 08:16:35 +1000 Subject: [Python-3000] Fw: [Python-Dev] PEP 367: New Super Message-ID: <002301c79f1a$5dc535c0$0201a8c0@mshome.net> Bah - this should have gone to Python-3000 too, since it's discussing the PEP. Tim Delaney Tim Delaney wrote: > Guido van Rossum wrote: > >> - This seems to be written from the POV of introducing it in 2.6. >> Perhaps the PEP could be slightly simpler if it could focus just on >> Py3k? Then it's up to the 2.6 release managers to decide if and how >> to backport it. > > That was my original intention, but it was assigned a non-Py3k PEP > number, so I presumed I'd missed an email where you'd decided it > should be for 2.6. > We should probably change the PEP number if it's to be targeted at > Py3K only. > >> - Why not make super a keyword, instead of just prohibiting >> assignment to it? (I'm planning to do the same with None BTW in Py3k >> -- I find the "it's a name but you can't assign to it" a rather >> silly business and hardly "the simplest solution".) > > That's currently an open issue - I'm happy to make it a keyword - in > which case I think the title should be changed to "super as a > keyword" or something like that. > >> - "Calling a static method or normal function that accesses the name >> super will raise a TypeError at runtime." This seems too vague. What >> if the function is nested within a method? Taking the specification >> literally, a nested function using super will have its own preamble >> setting super, which would be useless and wrong. > > I'd thought I'd covered that with "This name behaves > identically to a normal local, including use by inner functions via a > cell, with the following exceptions:", but re-reading it it's a bit > clumsy. > The intention is that functions that do not have access to a 'super' > cell variable will raise a TypeError.
Only methods using the keyword > 'super' will have a preamble. > > The preamble will only be added to functions/methods that cause the > 'super' cell to exist i.e. for CPython have 'super' in co.cellvars. > Functions that just have 'super' in co.freevars wouldn't have the > preamble. >> - "For static methods and normal functions, will be None, >> resulting in a TypeError being raised during the preamble." How do >> you know you're in this situation at run time? By the time the >> function body is entered the knowledge about whether this was a >> static or instance method is lost. > > The preamble will not technically be part of the function body - it > occurs after unpacking the parameters, but before entering the > function body, and has access to the C-level variables of the > function/method object. So the exception will be raised before > entering the function body. > The way I see it, during class construction, a C-level variable on the > method object would be bound to the (decorated?) class. This really > needs to be done as the last step in class construction if it's to > bind to the decorated class - otherwise it can be done as the methods > are processed. > I was thinking that by binding that variable to Py_None for static > methods it would allow someone to do the following:
>
> def modulefunc(self):
>     pass
>
> class A(object):
>     def func(self):
>         pass
>
>     @staticmethod
>     def staticfunc():
>         pass
>
> class B(object):
>     func = A.func
>     staticfunc = A.staticfunc
>     outerfunc = modulefunc
>
> class C(object):
>     outerfunc = B.outerfunc
>
> but that's already going to cause problems when you call the methods > - they will be being called with instances of the wrong type (raising > a TypeError). > So now I think both static methods and functions should just have that > variable left as NULL. Trying to get __super__(NULL) will throw a > TypeError. >> - The reference implementation (by virtue of its bytecode hacking) >> only applies to CPython.
(I'll have to study it in more detail >> later.) > > Yep, and it has quite a few limitations. I'd really like to split it > out from the PEP itself, but I'm not sure where I should host it. > >> I'll probably come up with more detailed feedback later. Keep up the >> good work!! > > Now I've got to find the time to try implementing it. Neal has said > he's willing to help, but I want to give it a go myself. > > Tim Delaney From baptiste13 at altern.org Sat May 26 00:14:51 2007 From: baptiste13 at altern.org (Baptiste Carvello) Date: Sat, 26 May 2007 00:14:51 +0200 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <19dd68ba0705242231j4f391f00n79112a01c0f339bc@mail.gmail.com> References: <20070524213605.864B.JCARLSON@uci.edu> <19dd68ba0705242231j4f391f00n79112a01c0f339bc@mail.gmail.com> Message-ID: Guillaume Proux wrote: > I think Martin's and my point is that to get people to level E) there > is no reason to put any charset restriction on level A ->D. And when > you are at level E), it is difficult to argue that making a one-time > test at source code checkin time is a bad practice. > You seem to believe that all useful open source code in the world is written as part of a well organised project that makes use of all known good practices. This is simply not true. In my field (research in physics), open source code sometimes means somebody's in-house tool that he put on the internet at the end of his PhD. This means no support, little documentation, and definitely no "tests at source code checkin time". Still, it can be the best tool in its specialized field. And I want to be able to debug it if needed.
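The "one-time test at source code checkin time" Guillaume refers to above is easy to write even without organized project practices. A sketch, assuming a modern Python 3 where PEP 3131 landed, that lists the non-ASCII identifiers in a piece of source text:

```python
import io
import tokenize

def non_ascii_identifiers(source):
    """Return the sorted set of non-ASCII NAME tokens in Python source."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return sorted({tok.string for tok in tokens
                   if tok.type == tokenize.NAME
                   and any(ord(c) > 127 for c in tok.string)})

print(non_ascii_identifiers("naïve = 1\nplain = 2\n"))
```

Hooked into a pre-commit script, this gives exactly the checkin-time gate being debated, with no interpreter flag involved.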
just my 2 cents, BC From baptiste13 at altern.org Sat May 26 00:22:35 2007 From: baptiste13 at altern.org (Baptiste Carvello) Date: Sat, 26 May 2007 00:22:35 +0200 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <19dd68ba0705250855r6d2676c6r5e9cb7a49b95b6ac@mail.gmail.com> References: <20070524213605.864B.JCARLSON@uci.edu> <19dd68ba0705242231j4f391f00n79112a01c0f339bc@mail.gmail.com> <19dd68ba0705250854o40a1025cse3d5f2c38cd76785@mail.gmail.com> <19dd68ba0705250855r6d2676c6r5e9cb7a49b95b6ac@mail.gmail.com> Message-ID: Guillaume Proux wrote: > (I mistakenly replied in private. here is a copy for the py3000 mailing list.) > > > Good evening! > > On 5/26/07, Jim Jewett wrote: >> You're missing "here is this neat code from sourceforge", or "Here is >> something I cut-and-pasted from ASPN". If those use something outside >> of ASCII, that's fine -- so long as they tell you about it. >> >> If you didn't realize it was using non-ASCII (or even that it could), >> and the author didn't warn you -- then that is an appropriate time for >> the interpreter to warn you that things aren't as you expect. > > I fail to see your point. Why should the interpreter warn you? > > There is nothing wrong to have programs written with identifiers using > accented letters, cyrillic alphabet, morse code?! Why should you be > warned? If the programmer who wrote the code decided to use its own > language to name some of the identifiers ... then.. bygones. > Sure, until you hit some bug and would like to debug it, and you can't even recognise the identifiers from one another... > If you have an actual requirement that everything should be ascii > then do not copy code off ASPN without first sanitizing it and do not > copy neat code from sf.net from people you hardly know without doing a > full ascii-compliance and security review. > > but if the code you copy off somewhere else does what you need it to > do...
then why do you want to force the author of this code generously > donated to you to downgrade his expressiveness by having to rewrite > all his code to reach ascii purity? > don't make it sound so dramatic. Python programmers already accept limits on expressiveness in the name of readability. Heck, otherwise we would all be using Perl. BC From python at zesty.ca Sat May 26 00:45:18 2007 From: python at zesty.ca (Ka-Ping Yee) Date: Fri, 25 May 2007 17:45:18 -0500 (CDT) Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <877iqy5v2y.fsf@uwakimon.sk.tsukuba.ac.jp> References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <19dd68ba0705120827s5415c4dcx12e5862f32cc3e06@mail.gmail.com> <4646A3CA.40705@acm.org> <4646FCAE.7090804@v.loewis.de> <464FFD04.90602@v.loewis.de> <46521CD7.9030004@v.loewis.de> <46527904.1000202@v.loewis.de> <87sl9o5dvi.fsf@uwakimon.sk.tsukuba.ac.jp> <877iqy5v2y.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Thu, 24 May 2007, Stephen J. Turnbull wrote: > > You've got this backwards, and I suspect that's part of the root of > > the disagreement. It's not that "when humans enter the loop they > > cause problems." The purpose of the language is to *serve humans*. [...] > N.B. I take offense at your misquote. *Humans do not cause problems.* > It is *non-ASCII tokens* that *cause* the (putative) problem. However, > the alleged problems only arise when humans are present. Oh, I apologize. I misunderstood the antecedent of "they". > > The grammar has to be something a human can understand. > > There are an infinite number of ASCII-only Python tokens. Whether > those tokens are lexically composed of a small fixed finite alphabet > vs. a large extensible finite alphabet doesn't change anything in > terms of understanding the *grammar*. I understand that you're talking about grammar as distinct from lexical syntax -- I was using the word "grammar" to refer to everything. I probably should have used the word "syntax" instead. 
My point was just that you have to be able to tell what a token is before you can read the syntax. That's hard to do if you don't know what characters are allowed and what characters aren't (and if there isn't even a consensus on what should be allowed). > The question is how expensive will the upgrade be, and what are the > benefits. My experience suggests that the cost is negligible *because > most users won't use non-ASCII identifiers*, and they'll just stick > with their ASCII-only tools. That's exactly the danger. It's a change that makes almost everyone's tools and practices subtly, occasionally, and silently incorrect -- even unconsciously incorrect for many. That's much worse than a change that is obvious enough to force a correction in assumptions. That just means, if we're going to provide this feature, we shouldn't force subtle wrongness upon people by making it the default. The balance you're talking about weighs heavily in favour of ASCII by default because that is what 100% of Python programs use now, it is what the vast majority of Python programs will use in the future, and it is what the vast majority of Python users will assume to be the case for quite some time. > And there are cases (Dutch tax law, Japanese morphology) where having > a judicious selection of non-ASCII identifiers is very convenient. Yes, granted. > > This should be built in to the Python interpreter and on by default, > > unless it is turned off by a command-line switch that says "I want to > > allow the full set of Unicode identifier characters in identifiers." > > I'd make it more tedious and more flexible to relax the restriction, > actually. "python" gives you the stdlib, ASCII-only restriction. > "python -U TABLE" takes a mandatory argument, which is the table of > allowed characters. If you want to rule out "stupid file substitution > tricks", TABLE could take the special arguments "stdlib" and "stduni" > which refer to built-in tables. 
But people really should be able to > restrict to "Japanese joyo kanji, kana, and ASCII only" or "IBM > Japanese only" as local standards demand, so -U should also be able to > take a file name, or a module name, or something like that. I strongly support this idea. It's the best proposal I've heard so far. > > If we are going to allow Unicode identifiers at all, then I would > > recommend only allowing identifiers that are already normalized > > (in NFC). > > Already in the PEP. The PEP says that Python will *convert* the identifiers into NFC. I'd rather there not be lots of different ways to write the same identifier (TOOWTDI), so this particular recommendation is that identifiers in source code have to already be normalized. > > The ideas that I'm in favour of include: > > > > (e) Use a character set that is fixed over time. > > The BASIC that I learned first only had 26 user identifiers. Maybe > that's the way we should go? The solution you propose solves this nicely. -- ?!ng From jcarlson at uci.edu Sat May 26 01:16:28 2007 From: jcarlson at uci.edu (Josiah Carlson) Date: Fri, 25 May 2007 16:16:28 -0700 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <19dd68ba0705250945j3dadcefcu8db91b3d2c055fdf@mail.gmail.com> References: <20070525091105.8663.JCARLSON@uci.edu> <19dd68ba0705250945j3dadcefcu8db91b3d2c055fdf@mail.gmail.com> Message-ID: <20070525095511.866D.JCARLSON@uci.edu> "Guillaume Proux" wrote: > > On 5/26/07, Josiah Carlson wrote: > > wanted to keep my codebase ascii-only (a not unlikely case), I can > > So you have a clear preference for an ascii-only way. *YOU* *really* > want to know when a non-ascii identifier crosses your path. > > > For those who don't care about ascii or non-ascii identifiers, they will > > likely already have an environment variable or site.py modification that > > offers all unicode characters that they want, and they will never see > > this message. > > I will rephrase your sentence this way. 
"For those who DO care about ascii only identifiers, they will likely > have already an > environment variable or site.py modification that makes sure that all code ever > imported is pure ascii and are going to see the message they want to see..." > > > issue. And I want this to *automatically* happen every time I run > > Python > > "and automatically every time they run Python"... > > This argument cuts both ways. It does, but it also refuses the temptation to guess that *everyone* wants to use unicode identifiers by default. Why? As Stephen Turnbull has already stated, the majority of users will have *no use* and *no exposure* to unicode identifiers. Further, unicode identifiers may very well break toolchains, so signaling as soon as possible that "there may be something you didn't expect here" is the right thing to do. Baptiste Carvello, in addition to Jim, Ka-Ping, Stephen, and myself, further discusses why ascii is the only sane default in his most recent 3 posts. - Josiah From guido at python.org Sat May 26 01:13:17 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 25 May 2007 16:13:17 -0700 Subject: [Python-3000] [Python-Dev] PEP 367: New Super In-Reply-To: <001d01c79f15$f0afa140$0201a8c0@mshome.net> References: <001101c79aa7$eb26c130$0201a8c0@mshome.net> <017d01c79e98$c6b84090$0201a8c0@mshome.net> <001d01c79f15$f0afa140$0201a8c0@mshome.net> Message-ID: On 5/25/07, Tim Delaney wrote: > Bah - this should have gone to Python-3000 too, since it's discussing the > PEP. My fault; I started sending you feedback that only went to you, Calvin and the PEP editors. I've added python-3000 at python.org back here. > Guido van Rossum wrote: > > > - This seems to be written from the POV of introducing it in 2.6. > > Perhaps the PEP could be slightly simpler if it could focus just on > > Py3k? Then it's up to the 2.6 release managers to decide if and how to > > backport it.
> > That was my original intention, but it was assigned a non-Py3k PEP number, > so I presumed I'd missed an email where you'd decided it should be for 2.6. > > We should probably change the PEP number if it's to be targeted at Py3K > only. Maybe. There are a bunch of PEPs that were originally proposed before the Py3k work started but that are now slated for inclusion in 3.0. I don't think we should renumber all of those. > > - Why not make super a keyword, instead of just prohibiting assignment > > to it? (I'm planning to do the same with None BTW in Py3k -- I find > > the "it's a name but you can't assign to it" a rather silly business > > and hardly "the simplest solution".) > > That's currently an open issue - I'm happy to make it a keyword - in which > case I think the title should be changed to "super as a keyword" or > something like that. As it was before. :-) What's the argument against? > > - "Calling a static method or normal function that accesses the name > > super will raise a TypeError at runtime." This seems too vague. What > > if the function is nested within a method? Taking the specification > > literally, a nested function using super will have its own preamble > > setting super, which would be useless and wrong. > > I'd thought I'd covered that with "This name behaves > identically to a normal local, including use by inner functions via a cell, > with the following exceptions:", but re-reading it it's a bit clumsy. > > The intention is that functions that do not have access to a 'super' cell > variable will raise a TypeError. Only methods using the keyword 'super' will > have a preamble. > > The preamble will only be added to functions/methods that cause the 'super' > cell to exist i.e. for CPython have 'super' in co.cellvars. Functions that > just have 'super' in co.freevars wouldn't have the preamble. I think it's still too vague.
For example:

    class C:
        def f(s):
            return 1

    class D(C):
        pass

    def f(s):
        return 2*super.f()

    D.f = f
    print(D().f())

Should that work? I would be okay if it didn't, and if the super keyword is only allowed inside a method that is lexically inside a class. Then the second definition of f() should be a (phase 2) SyntaxError. Was it ever decided whether the implicitly bound class should be:

- the class object as produced by the class statement (before applying class decorators);
- whatever is returned by the last class decorator (if any); or
- whatever is bound to the class name at the time the method is invoked?

I've got a hunch that #1 might be more solid; #3 seems asking for trouble. There's also the issue of what to do when the method itself is decorated (the compiler can't know what the decorators mean, even for built-in decorators like classmethod). > > - "For static methods and normal functions, will be None, > > resulting in a TypeError being raised during the preamble." How do you > > know you're in this situation at run time? By the time the function > > body is entered the knowledge about whether this was a static or > > instance method is lost. > > The preamble will not technically be part of the function body - it occurs > after unpacking the parameters, but before entering the function body, and > has access to the C-level variables of the function/method object. So the > exception will be raised before entering the function body. > > The way I see it, during class construction, a C-level variable on the > method object would be bound to the (decorated?) class. This really needs to > be done as the last step in class construction if it's to bind to the > decorated class - otherwise it can be done as the methods are processed. We could make the class in question a fourth attribute of the (poorly named) "bound method" object, e.g. im_class_for_super (im_super would be confusing IMO).
Since this is used both by instance methods and by the @classmethod decorator, it's just about perfect for this purpose. (I would almost propose to reuse im_self for this purpose, but that's probably asking for subtle backwards incompatibilities and not worth it.) Then when we're calling a bound method X (bound either to an instance or to a class, depending on whether it's an instance or class method), *if* the im_class_for_super is set, and *if* the function (im_func) has a "free variable" named 'super', *then* we evaluate __builtin__.__super__(X.im_class_for_super, X.im_self) and bind it to that variable. If there's no such free variable, we skip this step. This step could be inserted in call_function() in Python/ceval.c in the block starting with "if (PyMethod_check(func) && ...)". It also needs to be inserted into method_call() in Objects/classobject.c, in the toplevel "else" block. (The ceval version is a speed hack, it inlines the essence of method_call().) Now we need to modify the compiler, as follows (assume super is a keyword):

- Consider three types of scopes, which may be nested: the outermost (module or exec) scope, class scope, and function scope. The latter two can be nested arbitrarily.
- The super keyword is only usable in an expression (it becomes an alternative for 'atom' in the grammar). It can not be used as an assignment target (this is a phase 2 SyntaxError) nor in a nonlocal statement.
- The super keyword is only allowed in a function that is contained in a class (directly or nested inside another function). It is not allowed directly in a class, nor in the outermost scope.
- If a function contains a valid use of super, add a free variable named 'super' to the function's set of free variables.
- If the function is nested inside another function (not in a class), add the same free variable to that outer function too, and so on, until a function is reached that is nested in a class, not in a function.
- All *uses* of the super keyword are turned into references to this free variable.

I think this should work; it mostly uses existing machinery; it is explainable using existing mechanisms. If a function using super is somehow called without going through the binding of super, it will just get the normal error message when super is used: NameError: free variable 'super' referenced before assignment in enclosing scope IMO that's good enough; it's pretty hard to produce such a call. > I was thinking that by binding that variable to Py_None for static methods > it would allow someone to do the following:
>
> def modulefunc(self):
>     pass
>
> class A(object):
>     def func(self):
>         pass
>
>     @staticmethod
>     def staticfunc():
>         pass
>
> class B(object):
>     func = A.func
>     staticfunc = A.staticfunc
>     outerfunc = modulefunc
>
> class C(object):
>     outerfunc = B.outerfunc
>
> but that's already going to cause problems when you call the methods - they > will be being called with instances of the wrong type (raising a TypeError). I don't see any references to super in that example -- what's the relevance? > So now I think both static methods and functions should just have that > variable left as NULL. Trying to get __super__(NULL) will throw a TypeError. See my proposal above. It differs slightly in that the __super__ call is made only when the class is not NULL. On the expectation that a typical function that references super uses it exactly once per call (that would be by far the most common case I expect) this is just fine. In my proposal the 'super' variable contains whatever __super__(, ) returned, rather than which you seem to be proposing here. > > - The reference implementation (by virtue of its bytecode hacking) >> only applies to CPython.
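For what it's worth, the free-variable scheme Guido sketches above is essentially what PEP 3135 later shipped in Python 3, with the hidden cell named __class__ rather than super. A quick check against a modern CPython, offered as an illustration of the mechanism rather than as part of the proposal being discussed:

```python
class A:
    def greet(self):
        return "A"

class B(A):
    def greet(self):
        # Zero-argument super() works because the compiler gave this
        # function a hidden free variable bound to the defining class.
        return "B+" + super().greet()

print(B().greet())                   # prints B+A
print(B.greet.__code__.co_freevars)  # the hidden cell: ('__class__',)
```

Calling B.greet through another class (the D.f = f trick above) still resolves super() against B, because the cell is fixed at compile time, which is exactly the "class object as produced by the class statement" choice (#1).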
Submit it as a patch to SourceForge and link to it from the PEP (I did this for PEP 3119). If you still care about it -- I'm also okay with just having it in the subversion archives. > > I'll probably come up with more detailed feedback later. Keep up the > > good work!! > > Now I've got to find the time to try implementing it. Neal has said he's > willing to help, but I want to give it a go myself. Great (either way)! PS if you like my proposal, feel free to edit it into shape for the PEP. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From python at zesty.ca Sat May 26 01:20:07 2007 From: python at zesty.ca (Ka-Ping Yee) Date: Fri, 25 May 2007 18:20:07 -0500 (CDT) Subject: [Python-3000] PEP 3131 normalization forms In-Reply-To: <465521B6.1050601@v.loewis.de> References: <465521B6.1050601@v.loewis.de> Message-ID: NFKC might be a better choice than NFC for normalizing identifiers. Do we really want "ﬁnd()" (with the fi-ligature) and "find()" (without the fi-ligature) to be two different functions? Martin, is there a reason to prefer NFC over NFKC? -- ?!ng From rhamph at gmail.com Sat May 26 01:29:34 2007 From: rhamph at gmail.com (Adam Olsen) Date: Fri, 25 May 2007 17:29:34 -0600 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <46527904.1000202@v.loewis.de> <87sl9o5dvi.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 5/25/07, Jim Jewett wrote: > On 5/25/07, Adam Olsen wrote: > > On 5/25/07, Jim Jewett wrote: > > > On 5/25/07, Adam Olsen wrote: > > > > If we allowed an underscore as a mixed-script separator > > > > (allowing "def get_??(self):"), does this let us get away > > > > with otherwise banning mixed-scripts? > > ... > > > Indeed, the whole-script confusables do create significant > > holes, but I think the best solution is still to ban mixed-scripts > > and accept that it's only a "75% solution".
Using an "I'm > > expecting cyrillic" flag makes it harder for those who need > > cyrillic AND still leaves them vulnerable to the same problem > > we're trying to protect ourselves from. > > hmm... I had thought they should either not include the confusable > letters, or use different fonts -- whatever they normally do. I don't understand. Are you suggesting that those typing in russian or ukrainian should switch from cyrillic to latin when typing in 'a'? Surely I misunderstand. But as for how likely accidental confusion is, to provide statistics I installed a ukrainian wordlist and grepped it for words that only contained characters resembling lowercase latin characters (in my font). Of 990736 entries, only 133 matched. Of those, only one of them looked like an english word: a lone 'i'. I'm tempted to suggest special-casing it, but if that's the worst problem in all of this I think it can wait until it's proven to be a problem. > But I suppose using an _ separator could still be a useful crutch. > Whether it is useful enough ... I'll let others chime in. Using _ as a separator is only intended to allow fixed prefixes (or suffixes) for arbitrary names[1]. I don't see how this becomes a crutch. [1] urllib2 uses this style, although it's unlikely to ever have non-ascii names. Still, I don't think we should limit the style. > > A more extreme solution would be to introduce a symbol type that > > converts whole-script confusables to a canonical > > form > > The Unicode consortium recommends against this. I'm not sure if it is > just a presentation issue, or concerns about compatibility; the > "confusables" lists are explicitly allowed to change. Having the equivalences change between python versions (assuming at least this aspect is hardcoded) would be quite troublesome. Perhaps even more so than the confusion it's intended to prevent!
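Adam's wordlist experiment above is easy to reproduce. A rough sketch, where the confusable set is a hand-picked handful of Cyrillic letters that resemble lowercase Latin letters in many fonts, not the official Unicode confusables data:

```python
# Cyrillic letters that commonly render like Latin a, e, i, o, p, c, y, x.
CONFUSABLE_CYRILLIC = set("аеіорсух")

def looks_latin(word):
    """True if every letter of a Cyrillic word resembles a Latin one."""
    return bool(word) and all(ch in CONFUSABLE_CYRILLIC for ch in word)

# A tiny stand-in for the 990736-entry wordlist Adam grepped.
words = ["привіт", "сосо", "і", "мова"]
print([w for w in words if looks_latin(w)])
```

Run over a real Ukrainian dictionary, this is the filter that produced his 133-of-990736 figure, including the lone 'і'.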
-- Adam Olsen, aka Rhamphoryncus From greg.ewing at canterbury.ac.nz Sat May 26 01:50:07 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 26 May 2007 11:50:07 +1200 Subject: [Python-3000] PEP 3131 normalization forms In-Reply-To: References: <465521B6.1050601@v.loewis.de> Message-ID: <4657762F.8070307@canterbury.ac.nz> Ka-Ping Yee wrote: > NFKC might be a better choice than NFC for normalizing identifiers. > Do we really want "find()" (with the fi-ligature) and "find()" > (without the fi-ligature) to be two different functions? Do we really want to allow ligatures at all? -- Greg From python at zesty.ca Sat May 26 02:14:47 2007 From: python at zesty.ca (Ka-Ping Yee) Date: Fri, 25 May 2007 19:14:47 -0500 (CDT) Subject: [Python-3000] PEP 3131 normalization forms In-Reply-To: <4657762F.8070307@canterbury.ac.nz> References: <465521B6.1050601@v.loewis.de> <4657762F.8070307@canterbury.ac.nz> Message-ID: On Sat, 26 May 2007, Greg Ewing wrote: > Ka-Ping Yee wrote: > > NFKC might be a better choice than NFC for normalizing identifiers. > > Do we really want "find()" (with the fi-ligature) and "find()" > > (without the fi-ligature) to be two different functions? > > Do we really want to allow ligatures at all? If we require identifiers in source code to be in NFKC, I believe there won't be any ligatures. The NFKC for the "fi" ligature is the two-letter sequence "fi". 
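[Editor's note: the normalization claim above is mechanically checkable with the stdlib `unicodedata` module. U+FB01 is the fi-ligature, which this plain-text archive has flattened to an ordinary "fi", so the sketch spells it with an escape.]

```python
# NFC leaves the fi-ligature (U+FB01) alone; NFKC folds it to the
# two-letter sequence "fi", so under NFKC both spellings of find()
# denote the same identifier.
import unicodedata

ligature = "\ufb01nd"   # "find" written with the fi-ligature
plain = "find"

assert unicodedata.normalize("NFC", ligature) == ligature   # still distinct
assert unicodedata.normalize("NFKC", ligature) == plain     # folded together
```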
-- ?!ng From showell30 at yahoo.com Sat May 26 02:42:18 2007 From: showell30 at yahoo.com (Steve Howell) Date: Fri, 25 May 2007 17:42:18 -0700 (PDT) Subject: [Python-3000] Support for PEP 3131 In-Reply-To: Message-ID: <28496.42353.qm@web33503.mail.mud.yahoo.com> --- Guido van Rossum wrote: > On 5/25/07, Jim Jewett wrote: > > On 5/24/07, Guido van Rossum > wrote: > > > > > It doesn't look like any kind of global flag > passed to the interpreter > > > would scale -- once I am using a known trusted > contribution that uses > > > a different character set than mine, I would > have to change the global > > > setting to be more lenient, and the leniency > would affect all code I'm > > > using. > > > > Are you still thinking about the single on/off > switch? > > > > I agree that saying "Japanese identifiers are OK > from now on" still > > shouldn't turn on Cyrillic identifiers. I think > the current > > alternative boils down to some variant of > > > > python -idchars allowedchars.txt > > > > where allowedchars.txt would look something like > > > > > > 0780..07B1 ; Thaana > > > > or > > > > 10000..100FA ; Linear_B plus some blanks I was > too lazy to exclude > > > > (These lines are based on the unicode Scripts.txt, > and use character > > ranges instead of script names so that you can > exclude certain symbols > > if you want to.) > > I still think such a command-line switch (or > switches) is the wrong > approach. What if I have *one* module that uses > Cyrillic legitimately. > A command-line switch would enable Cyrillic in *all* > modules. > I agreed with you at first that once you allow Cyrillic code from your good, trusted buddy that codes in Cyrillic, you essentially open the door for all bad people that code in Cyrillic, so enabling/requiring a flag that trusts/distrusts Cyrillic code is basically an exercise in futility. But why couldn't there be a mechanism to accept only individual non-ascii modules as trusted modules? 
From showell30 at yahoo.com Sat May 26 03:01:55 2007 From: showell30 at yahoo.com (Steve Howell) Date: Fri, 25 May 2007 18:01:55 -0700 (PDT) Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <20070525095511.866D.JCARLSON@uci.edu> Message-ID: <236066.59081.qm@web33506.mail.mud.yahoo.com> --- Josiah Carlson wrote: > > Baptiste Carvello, in addition to Jim, Ka-Ping, Stephen, and myself, > further discusses why ascii is the only sane default in his most recent > 3 posts. I will add my much less venerated name to the list of people who think ascii is the sane default in any situation. I think this whole debate could be put to rest by agreeing to err on the side of ascii in 3.0 beta, and if in real world experience, that turns out to be the wrong decision, simply fix it in 3.0 production, 3.1, or 3.2. I like incrementism, despite the lofty agenda of 3.0. 
From showell30 at yahoo.com Sat May 26 03:12:40 2007 From: showell30 at yahoo.com (Steve Howell) Date: Fri, 25 May 2007 18:12:40 -0700 (PDT) Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <28496.42353.qm@web33503.mail.mud.yahoo.com> Message-ID: <874523.63607.qm@web33506.mail.mud.yahoo.com> --- Steve Howell wrote: > --- Guido van Rossum wrote: [SNIP] > > I still think such a command-line switch (or > > switches) is the wrong > > approach. What if I have *one* module that uses > > Cyrillic legitimately. > > A command-line switch would enable Cyrillic in > *all* > > modules. 
> > > > I agreed with you at first that once you allow > Cyrillic code from your good, trusted buddy that > codes > in Cyrillic, you essentially open the door for all > bad > people that code in Cyrillic, so enabling/requiring > a > flag that trusts/distrusts Cyrillic code is > basically > an exercise in futility. > > But why couldn't there be a mechanism to accept only > individual non-ascii modules as trusted modules? > Never mind. I already know the answer to my question. The mechanism to import only "trusted modules" is the import statement itself, backed by unit tests, trust models, etc. I don't think my somewhat fallacious reasoning invalidates the argument for making Python parochial by default, though. From greg.ewing at canterbury.ac.nz Sat May 26 03:15:45 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 26 May 2007 13:15:45 +1200 Subject: [Python-3000] Wither PEP 335 (Overloadable Boolean Operators)? In-Reply-To: References: <4656446F.8030802@canterbury.ac.nz> Message-ID: <46578A41.4090201@canterbury.ac.nz> Terry Reedy wrote: > I have not seen any response to my suggestion to simplify the to-me overly > baroque semantics. Missed it? Still thinking? Or did I miss something? Sorry, I've been meaning to reply, but haven't got around to it. > Delete special casing of NotImplemented. This is the standard way for a binary operator method to indicate that it doesn't know how to handle the types it's been given. It signals to the interpreter machinery to give the other operand a chance to handle the operation. It's a complexity already present in the system for handling binary operators, not something introduced by this proposal. > Delete NeedOtherOperand (where would it even live?) 
The same place as NotImplemented, Ellipsis, etc live already. > The current spelling > is True for and and False for or, as with standard semantics. No, that's not the current spelling. The current 'and' and 'or' know nothing about True and False, only whether their operands are true or false (with a small 't'). It could possibly be *used* as the spelling for this purpose, but my feeling is that it would muddy the distinction between standard boolean semantics and whatever new semantics the overloaded methods are implementing -- which is supposed to be completely independent of the standard semantics. > Delete the reverse methods. They are only needed for mixed-type > operations, like scaler*matrix. But such seems senseless here. In any > case, they are not needed for any of your motivating applications, which > would define both methods without mixing. I don't agree. For example, if you're implementing operations on matrices of booleans, it seems reasonable that things like 'b and m' or 'm and b', where b is a standard boolean, should broadcast the scalar over the matrix, as with all the other binary operations. To make that work at the Python level, you need the reversed methods. As another example, in an SQL expression builder, it doesn't seem unreasonable that mixing ordinary boolean values with SQL boolean expressions should give the expected results. Besides, if the reversed methods weren't there, it would make these operator a special case with respect to all the others, for no apparently good reason. So while it would be a local simplification, I don't think it would simplify things overall. > Delete the 'As a special case' sentence. That would make the spec shorter, but would make the facility more complicated to *use* in many cases. So again, I don't think this would be an overall simplification. > Type Slots: someone else can decide if a new flag and 5 new slots are a > significant price. 
I don't think anyone is worried about the size of type objects -- they're not something you normally create in large quantities. -- Greg From greg.ewing at canterbury.ac.nz Sat May 26 03:41:29 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 26 May 2007 13:41:29 +1200 Subject: [Python-3000] Wither PEP 335 (Overloadable Boolean Operators)? In-Reply-To: References: Message-ID: <46579049.5060206@canterbury.ac.nz> Jim Jewett wrote: > It currently says that __not__ can return NotImplemented, which falls > back to the current semantics. I'm not sure why I put that there. As you observe, it's not necessary, since you can always get the default semantics simply by not defining the method. An experiment suggests that the existing unary operator methods don't special-case NotImplemented, so I'll remove that part. > It does not yet say what will happen for objects that return something > else outside of {True, False}, There's nothing to say -- whatever you return is the result. That's the whole point of making it overloadable. > Is that OK, because "not not X" should now be spelled "bool(x)", and > you haven't allowed the overriding of __bool__? Yes, I would say that 'not not x' should indeed be spelled bool(x), if that's what you intend it to mean. Whether __bool__ should be overloadable is outside the scope of this PEP. But if it is overloadable, I would recommend that it not be allowed to return anything other than a boolean. -- Greg From greg.ewing at canterbury.ac.nz Sat May 26 03:46:13 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 26 May 2007 13:46:13 +1200 Subject: [Python-3000] [Python-Dev] Wither PEP 335 (Overloadable Boolean Operators)? In-Reply-To: <20070525155302.917F23A4061@sparrow.telecommunity.com> References: <4656446F.8030802@canterbury.ac.nz> <20070525155302.917F23A4061@sparrow.telecommunity.com> Message-ID: <46579165.1080408@canterbury.ac.nz> Phillip J. 
Eby wrote: > Actually, I think that most of the use cases for this PEP would be > better served by being able to "quote" code, i.e. to create AST > objects directly from Python syntax. That's been suggested before, but hasn't received a favourable response. One problem is that it would force all alternative implementations to be able to produce an AST with the same structure as CPython's. Also it could be considered dangerously close to "programmable syntax". -- Greg From nnorwitz at gmail.com Sat May 26 04:29:19 2007 From: nnorwitz at gmail.com (Neal Norwitz) Date: Fri, 25 May 2007 19:29:19 -0700 Subject: [Python-3000] Wither PEP 335 (Overloadable Boolean Operators)? In-Reply-To: <46579049.5060206@canterbury.ac.nz> References: <46579049.5060206@canterbury.ac.nz> Message-ID: On 5/25/07, Greg Ewing wrote: > > > Is that OK, because "not not X" should now be spelled "bool(x)", and > > you haven't allowed the overriding of __bool__? > > Yes, I would say that 'not not x' should indeed be spelled > bool(x), if that's what you intend it to mean. > > Whether __bool__ should be overloadable is outside the scope > of this PEP. But if it is overloadable, I would recommend > that it not be allowed to return anything other than a boolean. There is already a __bool__ method in 3k. It's the old __nonzero__ method.

>>> 5 .__bool__()
True
>>> 0 .__bool__()
False
>>> class F:
...     def __bool__(self): return 5
...
>>> if F(): print('is')
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __bool__ should return bool, returned int

n From bwinton at latte.ca Sat May 26 05:00:38 2007 From: bwinton at latte.ca (Blake Winton) Date: Fri, 25 May 2007 23:00:38 -0400 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: References: <20070524213605.864B.JCARLSON@uci.edu> <20070525084117.865D.JCARLSON@uci.edu> Message-ID: <4657A2D6.3050809@latte.ca> Ka-Ping Yee wrote: > On Fri, 25 May 2007, Josiah Carlson wrote: >> Apples and oranges to be sure, but there are no other statistics that >> anyone else is able to offer about use of non-ascii identifiers in Java, >> Javascript, C#, etc. > Let's see what we can find. I made several attempts to search for > non-ASCII identifiers using google.com/codesearch and here's what I got. I think you've got a selection bias here, since google isn't likely to index code not intended for the whole world, and thus the code you'll be searching through is more likely to be in english than code in general. Perhaps searching the entire web for "class ", or " (" or " =" would give more accurate results, if such a thing is even possible. Later, Blake. From bwinton at latte.ca Sat May 26 05:45:22 2007 From: bwinton at latte.ca (Blake Winton) Date: Fri, 25 May 2007 23:45:22 -0400 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: References: <20070524213605.864B.JCARLSON@uci.edu> <19dd68ba0705242231j4f391f00n79112a01c0f339bc@mail.gmail.com> <19dd68ba0705250854o40a1025cse3d5f2c38cd76785@mail.gmail.com> <19dd68ba0705250855r6d2676c6r5e9cb7a49b95b6ac@mail.gmail.com> Message-ID: <4657AD52.5020101@latte.ca> Jim Jewett wrote: >>> If you didn't realize it was using non-ASCII (or even that it >>> could), and the author didn't warn you -- then that is an >>> appropriate time for the interpreter to warn you that things aren't >>> as you expect. >> I fail to see your point. Why should the interpreter warn you? 
> Arbitrary Unicode identifier opens up the possibility of code that > *looks* like ASCII, but isn't -- so I don't even realize that I missed > something. You already have that problem. Right now. And you've had it for at least a year (assuming you installed 2.4.3 when it came out). All screenshots taken on Python 2.4.3, Mac OSX 10.4 Intel. http://bwinton.latte.ca/temp/Python/File.png http://bwinton.latte.ca/temp/Python/Run.png http://bwinton.latte.ca/temp/Python/foo.py So, what are you doing to mitigate this risk now, and why not do the same thing when identifiers are allowed to be arbitrary Unicode? Later, Blake. From ocean at m2.ccsnet.ne.jp Sat May 26 06:30:21 2007 From: ocean at m2.ccsnet.ne.jp (ocean) Date: Sat, 26 May 2007 13:30:21 +0900 Subject: [Python-3000] python/trunk/Lib/test/test_urllib.py (for ftpwrapper) Message-ID: <001101c79f4e$92d35f60$0300a8c0@whiterabc2znlh> http://mail.python.org/pipermail/python-checkins/2007-May/060507.html Hello. I'm using Windows 2000, and I did some investigation of test_ftpwrapper. After I made this change, most errors were gone.

Index: Lib/urllib.py
===================================================================
--- Lib/urllib.py (revision 55584)
+++ Lib/urllib.py (working copy)
@@ -833,7 +833,7 @@
         self.busy = 0
         self.ftp = ftplib.FTP()
         self.ftp.connect(self.host, self.port, self.timeout)
-        self.ftp.login(self.user, self.passwd)
+#        self.ftp.login(self.user, self.passwd)
         for dir in self.dirs:
             self.ftp.cwd(dir)

I don't know why, but 'login' on Win2000 is probably problematic. The remaining error is:

  File "e:\python-dev\trunk\lib\threading.py", line 460, in __bootstrap
    self.run()
  File "e:\python-dev\trunk\lib\threading.py", line 440, in run
    self.__target(*self.__args, **self.__kwargs)
  File "test_urllib.py", line 565, in server
    conn.recv(13)
error: (10035, 'The socket operation could not complete without blocking')

And after commenting out the conn.recv block in test_urllib.py, the test passed fine. 
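[Editor's note: Winsock error 10035 is WSAEWOULDBLOCK -- recv() was called on a socket that had no data ready yet. A sketch of one conventional remedy (not the fix that was actually committed) is to wait for readability with select() before each recv(); the helper name below is invented for illustration.]

```python
import select
import socket

def recv_exactly(conn, nbytes, timeout=3.0):
    """Read exactly nbytes, waiting for readability instead of spinning
    on recv() and hitting EWOULDBLOCK (Winsock error 10035)."""
    data = b""
    while len(data) < nbytes:
        # Block until the socket is readable, or give up after `timeout`.
        ready, _, _ = select.select([conn], [], [], timeout)
        if not ready:
            raise socket.timeout("no data within %s seconds" % timeout)
        chunk = conn.recv(nbytes - len(data))
        if not chunk:
            raise EOFError("peer closed before %d bytes arrived" % nbytes)
        data += chunk
    return data
```

The test's server loop could then read its 13 bytes with `recv_exactly(conn, 13)` instead of a bare `conn.recv(13)`.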
def server(evt):
    serv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    serv.settimeout(3)
    serv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    serv.bind(("", 9093))
    serv.listen(5)
    try:
        conn, addr = serv.accept()
        conn.send("1 Hola mundo\n")
        """
        cantdata = 0
        while cantdata < 13:
            data = conn.recv(13-cantdata)
            cantdata += len(data)
            time.sleep(.3)
        """
        conn.send("2 No more lines\n")
        conn.close()
    except socket.timeout:
        pass
    finally:
        serv.close()
    evt.set()

From stephen at xemacs.org Sat May 26 07:05:36 2007 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 26 May 2007 14:05:36 +0900 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <19dd68ba0705120827s5415c4dcx12e5862f32cc3e06@mail.gmail.com> <4646A3CA.40705@acm.org> <4646FCAE.7090804@v.loewis.de> <464FFD04.90602@v.loewis.de> <46521CD7.9030004@v.loewis.de> <46527904.1000202@v.loewis.de> <87sl9o5dvi.fsf@uwakimon.sk.tsukuba.ac.jp> <877iqy5v2y.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87abvs41jz.fsf@uwakimon.sk.tsukuba.ac.jp> Thank you for the apology. I have cooled off, and I hope you won't hold the "take offense" against me. I was hurt, for sure, but you're right, that's a legitimate reading in colloquial English. Ka-Ping Yee writes: > That just means, if we're going to provide this feature, we shouldn't > force subtle wrongness upon people by making it the default. I agree wholeheartedly! But AFAIK this is the first time you have explicitly limited yourself in principle to discussion of the default. Up to now you've opposed the whole idea. > The PEP says that Python will *convert* the identifiers into NFC. > I'd rather there not be lots of different ways to write the same > identifier (TOOWTDI), so this particular recommendation is that > identifiers in source code have to already be normalized. A Unicode conforming process may not distinguish between different representations of a given character. 
Ie, the NFC conversion is an internal optimization. The characters are the same. I think Unicode conformance is close enough to TOOWDTI, and far more important than the remaining difference. YMMV. Pragmatically, users are likely not to know how to do it. I do it with an explicit call to an external library provided by Mac OS X; I don't know how to do it (ie, what the (de)composition is, and often even how to input the resulting characters) without access to the library canonicalization API. My input methods do not provide such a facility. (And Unicode says that they may refuse to do so.) Finally, this would also be inconsistent with the definition of Python implicit in PEP 263, which clearly envisions a Python program as a sequence of abstract characters which may have an arbitrary ASCII-compatible encoding on disk. From stephen at xemacs.org Sat May 26 08:37:08 2007 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 26 May 2007 15:37:08 +0900 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <20070525095511.866D.JCARLSON@uci.edu> References: <20070525091105.8663.JCARLSON@uci.edu> <19dd68ba0705250945j3dadcefcu8db91b3d2c055fdf@mail.gmail.com> <20070525095511.866D.JCARLSON@uci.edu> Message-ID: <878xbc3xbf.fsf@uwakimon.sk.tsukuba.ac.jp> Josiah Carlson writes: > It does, but it also refuses the temptation to guess that *everyone* > wants to use unicode identifiers by default. Why? As Stephen Turnbull > has already stated, the majority of users will have *no use* and *no > exposure* to unicode identifiers. I'm afraid I conflated two issues in that post. I'm sorry for the confusion. My first claim is that editor (not Python!) users indeed will be overwhelmingly monoscript for the foreseeable future. I'd bet serious money on that (as long as somebody else pays for the survey to make the judgment :-). 
My second claim is that where non-ASCII identifiers are *already* available, their use is extremely restricted, and the overwhelming majority of programmers never encounter them. I predict that once PEP 3131 is implemented, their overall usage in Python programs will increase very slowly for a few years. However, there will be pockets of fast diffusion (CP4E in particular, including programming classes for history majors at university and the like). By the way, this is an example that shows that the recent injection of the word "parochial" is truly pernicious, because it's attached to the wrong set of arguments. Please note, it is those pockets of Unicode adoption that are truly parochial, not the ASCII advocates! Those pockets can be early and deep adopters precisely because they are small, homogeneous groups, unconcerned with the world outside. ASCII advocates are obviously self-interested ("IAGNI, so *you* can't have it, it would cost me extra effort"), but they are *not* parochial: they *know* they're going to exchange code with other cultures, they *welcome* that exchange, and *they do not want it hindered for "frivolous" reasons*. Advocates of Unicode want it for themselves and their buddies, and of course are happy to have it used by other groups---used *independently* by *equally parochial* groups. True, "frivolous" is a parochial evaluation of the cultural exchange that use of Unicode identifiers can foster, but that notion of "parochial" is on a different level. IMHO that "cultural exchange" level is highly relevant to the decision to implement Unicode identifiers in some way, but it's the "code exchange" level that is most relevant to the pace of introduction. And that has to consider the balance between faster growth within Unicode-using groups, versus the facilitation of opportunistic[1] exchange among groups using the (admittedly imperfect) lingua franca of ASCII. Footnotes: [1] Ie, when you look at someone's app and go "I wonder how she does that? 
Can I use her code in my app?" Obviously in a formal exchange, the identifier constituent set can and should be negotiated. From stephen at xemacs.org Sat May 26 08:53:47 2007 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 26 May 2007 15:53:47 +0900 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: References: <465667AE.2090000@v.loewis.de> <20070524215742.864E.JCARLSON@uci.edu> <740c3aec0705250255k642d6637re46e3929212f1369@mail.gmail.com> Message-ID: <877iqw3wjo.fsf@uwakimon.sk.tsukuba.ac.jp> Jim Jewett writes: > On 5/25/07, Björn Lindqvist wrote: > > If Python required a switch for such a program to run, then this > > feature would be totally wasted on them. They might use an IDE, > > program in notepad.exe and dragging the file to the python.exe icon or > > not even know about cmd.exe or what a command line switch is. An error > > message, even an informal one, isn't easy to understand if you don't > > know English. This can be handled with wrappers, at install time. Ugly, but workable. Jim's idea is very suggestive, though: > How about a default file, such as > > "on launch, python looks for pyidchar.txt ... if you want to override > this default file do XYZ" This still doesn't help to address the "fine-grained" (per-module or per-file) control issue, right? Unless you complexified the syntax. You could allow includes (from a site library of character set definitions, not arbitrary files), inline table definitions, and a file or module to table mapping. Since this would be under the control of the site (distributions could supply examples, but not install them where Python would pick them up), maybe such complexity would be OK? I believe most people's file would be

[DEFAULT]
000000-1FFFFF   # intersection of the full Unicode range and PEP 3131-permitted characters

(where DEFAULT is a special table used by default for files not mapped to another table). How about per-user overrides? 
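[Editor's sketch: a minimal parser for such a site-wide table file. The section syntax follows the [DEFAULT] example above and the `0780..07B1 ; Thaana` lines quoted earlier in the thread; the function names are invented, and real handling of includes and per-module mappings is omitted.]

```python
def parse_idchar_tables(text):
    """Parse a pyidchar-style file: [NAME] section headers followed by
    hex codepoint ranges, with '#' or ';' starting a comment."""
    tables = {}
    current = None
    for line in text.splitlines():
        for sep in "#;":                  # both comment styles appear in the thread
            line = line.split(sep, 1)[0]
        line = line.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]          # start a new named table
            tables[current] = []
        else:
            lo, hi = line.replace("..", "-").split("-")
            tables[current].append((int(lo, 16), int(hi, 16)))
    return tables

def allowed(char, ranges):
    """True if char's codepoint falls in any of the (lo, hi) ranges."""
    return any(lo <= ord(char) <= hi for lo, hi in ranges)
```

With such a parser, a per-module check would just look up the table mapped to the module (falling back to DEFAULT) and test each identifier character with `allowed()`.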
From stephen at xemacs.org Sat May 26 09:42:57 2007 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 26 May 2007 16:42:57 +0900 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <46521CD7.9030004@v.loewis.de> <46527904.1000202@v.loewis.de> <87sl9o5dvi.fsf@uwakimon.sk.tsukuba.ac.jp> <87646i5td6.fsf@uwakimon.sk.tsukuba.ac.jp> <87r6p540n4.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87646g3u9q.fsf@uwakimon.sk.tsukuba.ac.jp> Jim Jewett writes: > > How about a regexp character class as starting point? > > I'm not sure I understand. Do you mean that part of localization > should be defining what certain regular expressions should match? No, I meant simply a list of character ranges, as characters. The definition of "safe ASCII" would be something like r"\t\r\n -~" Your table format is better. If people want to put the actual characters in comments (maybe in source files to be preprocessed before installation), let them. > So long as we allow tailoring, I think the maximal set should be > generous -- and I don't see any reason to pre-exclude anything outside > ASCII. Cf characters? Are we admitting "stupid bidi tricks", too? But I'll tell you what my reason is: we want to be in a position to avoid prohibiting previously acceptable characters wherever possible. > There are people who like to use names like "Program Files" or > "Summary of Results.Apr-3-2007 version 2.xls"; I expect the same will > be true of identifiers. So long as the punctuation is not ASCII, we > might as well let them. Why not let them use ASCII punctuation, as long as it's not Python syntax? Ie, for one thing, we might want to do something with that punctuation some day. For example, I could imagine using guillemets to denote rawstrings or to substitute for triple quotes. Local parsing (as done by program editors) would be easier with directed quotes. Etc. 
For reasons of visual distinctiveness, we might choose to use Chinese or Arabic versions. > The other committees say to exclude certain scripts, like Linear B and > Ogham. And not to allow mixed scripts, at least if they're > confusable. But I really don't want to explain why someone using > Cyrillic can't use certain (apparently to him) randomly determined > identifiers just because it could be confused with ASCII (or > Armenian). -1 on restrictions according to confusability or the block. That's a matter for personal judgement, and there are cheap technical solutions for those who want to use confusable Cyrillic or Linear B and still avoid confusion. I think those restrictions are an idea that must be available (perhaps as a table we distribute), but I think they'll turn out to suck pretty badly. > If unicode comes out with a new revision, the new characters should > probably be allowed; I don't want a situation where users of Cham or > Lepcha[1] are told they have to wait another year because their > scripts weren't formally adopted into unicode until after python 3.4.0 > was already released. Tough call. I'd say, let's cross that bridge when we come to it. In any case there will have to be some mechanism to access a Unicode database at either build time or run time. Let them munge that database if they're in a hurry. Maybe the way to handle this is to allow private-space characters in identifiers as an option. That would be doable with your well-known file scheme. But it's very dangerous across modules. By the way, this is what the Japanese call the "gaiji" ("outside character") problem. It's a very tough nut to crack; the Japanese never did. 
From timothy.c.delaney at gmail.com Sat May 26 10:13:51 2007 From: timothy.c.delaney at gmail.com (Tim Delaney) Date: Sat, 26 May 2007 18:13:51 +1000 Subject: [Python-3000] [Python-Dev] PEP 367: New Super References: <001101c79aa7$eb26c130$0201a8c0@mshome.net> <017d01c79e98$c6b84090$0201a8c0@mshome.net> <001d01c79f15$f0afa140$0201a8c0@mshome.net> Message-ID: <002d01c79f6d$ce090de0$0201a8c0@mshome.net> Guido van Rossum wrote: >>> - Why not make super a keyword, instead of just prohibiting >>> assignment to it? (I'm planning to do the same with None BTW in >>> Py3k -- I find the "it's a name but you can't assign to it" a >>> rather silly business and hardly "the simplest solution".) >> >> That's currently an open issue - I'm happy to make it a keyword - in >> which case I think the title should be changed to "super as a >> keyword" or something like that. > > As it was before. :-) > > What's the argument against? I don't see any really, especially if None is to become a true keyword. But some people have raised objections. >> The preamble will only be added to functions/methods that cause the >> 'super' cell to exist i.e. for CPython have 'super' in co.cellvars. >> Functions that just have 'super' in co.freevars wouldn't have the >> preamble. > > I think it's still too vague. For example:
>
> class C:
>     def f(s):
>         return 1
> class D(C):
>     pass
> def f(s):
>     return 2*super.f()
> D.f = f
> print(D().f())
>
> Should that work? I would be okay if it didn't, and if the super > keyword is only allowed inside a method that is lexically inside a > class. Then the second definition of f() should be a (phase 2) > SyntaxError. That would simplify things. I'll update the PEP. > Was it ever decided whether the implicitly bound class should be: > > - the class object as produced by the class statement (before applying > class decorators); > - whatever is returned by the last class decorator (if any); or > - whatever is bound to the class name at the time the method is > invoked? 
> I've got a hunch that #1 might be more solid; #3 seems asking for > trouble. I think #3 is definitely the wrong thing to do, but there have been arguments put forward for both #1 and #2. I think I'll put it as an open issue for now. > There's also the issue of what to do when the method itself is > decorated (the compiler can't know what the decorators mean, even for > built-in decorators like classmethod). I think that may be a different issue. If you do something like:

class A:
    @decorator
    def func(self):
        pass

class B(A):
    @decorator
    def func(self):
        super.func()

then `super.func()` will call whatever `super(B, self).func()` would now, which (I think) would result in calling the decorated function. However, I think the staticmethod decorator would need to be able to modify the class instance that's held by the method. Or see my proposal below ... > We could make the class in question a fourth attribute of the (poorly > named) "bound method" object, e.g. im_class_for_super (im_super would > be confusing IMO). Since this is used both by instance methods and by > the @classmethod decorator, it's just about perfect for this purpose. > (I would almost propose to reuse im_self for this purpose, but that's > probably asking for subtle backwards incompatibilities and not worth > it.) I'm actually thinking instead that an unbound method should reference an unbound super instance for the appropriate class - which we could then call im_super. For a bound instance or class method, im_super would return the appropriate bound super instance. In practice, it would work like your autosuper recipe using __super. e.g.

class A:
    def func(self):
        pass

>>> print A.func.im_super
<super: <class 'A'>, NULL>
>>> print A().func.im_super
<super: <class 'A'>, <A object>>

> See my proposal above. It differs slightly in that the __super__ call > is made only when the class is not NULL. 
On the expectation that a > typical function that references super uses it exactly once per call > (that would be by far the most common case I expect) this is just > fine. In my proposal the 'super' variable contains whatever > __super__(<class>, <instance>) returned, rather than <class> which you > seem to be proposing here. Think I must have been explaining poorly - if you look at the reference implementation in the PEP, you'll see that that's exactly what's held in the 'super' free variable. I think your proposal is basically what I was trying to convey - I'll look at rewording the PEP so it's less ambiguous. But I'd like your thoughts on the above proposal to keep a reference to the actual super object rather than the class. Cheers, Tim Delaney From python at zesty.ca Sat May 26 12:01:32 2007 From: python at zesty.ca (Ka-Ping Yee) Date: Sat, 26 May 2007 05:01:32 -0500 (CDT) Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <4657A2D6.3050809@latte.ca> References: <20070524213605.864B.JCARLSON@uci.edu> <20070525084117.865D.JCARLSON@uci.edu> <4657A2D6.3050809@latte.ca> Message-ID: On Fri, 25 May 2007, Blake Winton wrote: > Ka-Ping Yee wrote: > > Let's see what we can find. I made several attempts to search for > > non-ASCII identifiers using google.com/codesearch and here's what I got. > I think you've got a selection bias here, since google isn't likely to > index code not intended for the whole world, and thus the code you'll be > searching through is more likely to be in english than code in general. Indeed. I couldn't think of a better way to do a search, but if you come up with any better methods, go for it and let us know what you find.
-- ?!ng From python at zesty.ca Sat May 26 12:33:23 2007 From: python at zesty.ca (Ka-Ping Yee) Date: Sat, 26 May 2007 05:33:23 -0500 (CDT) Subject: [Python-3000] PEP 3131 accepted In-Reply-To: References: <20070523111704.85FC.JCARLSON@uci.edu> <87abvu5yfu.fsf@uwakimon.sk.tsukuba.ac.jp> <20070524082737.862E.JCARLSON@uci.edu> <4655DD4E.3050809@v.loewis.de> <4656129D.5000406@v.loewis.de> Message-ID: Ka-Ping Yee wrote: > Alas, the coding directive is not good enough. Have a look at this: > > http://zesty.ca/python/tricky.png > > That's an image of a text editor containing some Python code. Can you > tell whether running it (post-PEP-3131) will delete your .bashrc file? Martin v. Löwis wrote: > I would think that it doesn't (i.e. allowed should stay at 0). > > Why does os.remove get invoked? Mike Klaas wrote: > Perhaps a letter in the encoding declaration is non-ascii, nullifying > the encoding enforcement and allowing a cyrillic 'a' in allowed = 0? You got it. See the actual source file at http://zesty.ca/python/tricky.py There are three things going on here: 1. All three occurrences of "allowed" look the same. And it seems they are truly the same, because the coding declaration on line 2 says the file is ASCII. But in fact, they aren't the same -- one of them contains a Cyrillic "a", which changes the meaning of the program. 2. But how is that possible when the coding declaration says the file is ASCII? If you believe it, then you also expect the coding declaration itself to be ASCII, i.e., a real coding declaration. But it isn't -- the word "coding" contains a Cyrillic "c". 3. Then why doesn't Python complain about this non-ASCII character on line 2 of the file, since ASCII is supposed to be the default encoding? Because there is a UTF-8 BOM at the beginning of the file. PEP 263 tries to prevent confusion by making Python complain if the coding declaration conflicts with the already-set UTF-8 encoding.
But even though line 2 looks like a coding declaration, Python doesn't notice it, so you get no warning. The conclusion is that one cannot rely on the coding declaration to know what the encoding is, because one cannot know what the coding declaration says. We would be able to rely on it, if only it were encoded in ASCII. But the enabling of UTF-8 by a BOM at the beginning of the file is an invisible override. This invisible override is the source of the danger. If we want to be able to read the coding declaration with any confidence, we should get rid of the invisible override. -- ?!ng From python at zesty.ca Sat May 26 12:37:46 2007 From: python at zesty.ca (Ka-Ping Yee) Date: Sat, 26 May 2007 05:37:46 -0500 (CDT) Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <4657AD52.5020101@latte.ca> References: <20070524213605.864B.JCARLSON@uci.edu> <19dd68ba0705242231j4f391f00n79112a01c0f339bc@mail.gmail.com> <19dd68ba0705250854o40a1025cse3d5f2c38cd76785@mail.gmail.com> <19dd68ba0705250855r6d2676c6r5e9cb7a49b95b6ac@mail.gmail.com> <4657AD52.5020101@latte.ca> Message-ID: On Fri, 25 May 2007, Blake Winton wrote: > Jim Jewett wrote: > > Arbitrary Unicode identifier opens up the possibility of code that > > *looks* like ASCII, but isn't -- so I don't even realize that I missed > > something. > > You already have that problem. Right now. And you've had it for at > least a year (assuming you installed 2.4.3 when it came out). > > All screenshots taken on Python 2.4.3, Mac OSX 10.4 Intel. > > http://bwinton.latte.ca/temp/Python/File.png > http://bwinton.latte.ca/temp/Python/Run.png > http://bwinton.latte.ca/temp/Python/foo.py Yes -- you have demonstrated exactly why the default encoding for Python files should be 7-bit ASCII, and why a coding declaration should be required to switch to other encodings, to let the reader know that the file might not contain what it appears to contain. 
Your file, like tricky.py, relies on the invisible enabling of UTF-8 by a UTF-8-encoded BOM at the beginning of the file. Switching to UTF-8 invisibly (or by default) is dangerous; enabling non-ASCII identifiers by default augments this problem to a whole new level. Neither should be the default. -- ?!ng From bwinton at latte.ca Sat May 26 15:31:42 2007 From: bwinton at latte.ca (Blake Winton) Date: Sat, 26 May 2007 09:31:42 -0400 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: References: <20070524213605.864B.JCARLSON@uci.edu> <20070525084117.865D.JCARLSON@uci.edu> <4657A2D6.3050809@latte.ca> Message-ID: <465836BE.8080900@latte.ca> Ka-Ping Yee wrote: > On Fri, 25 May 2007, Blake Winton wrote: >> Ka-Ping Yee wrote: >>> Let's see what we can find. I made several attempts to search for >>> non-ASCII identifiers using google.com/codesearch and here's what I got. >> I think you've got a selection bias here, since google isn't likely to >> index code not intended for the whole world, and thus the code you'll be >> searching through is more likely to be in english than code in general. > > Indeed. I couldn't think of a better way to do a search, but if you > come up with any better methods, go for it and let us know what you > find. That was what my second [snipped] paragraph was about. If you could find tutorials or sample code in other languages, that might be less biased. Or maybe more biased in the other direction. On the other hand, I suspect you might have to work at Google to be able to run those sorts of queries. It's a hard problem, and while I applaud your effort, I just wanted to make sure that people knew that it wasn't necessarily representative of the real world. Later, Blake. 
From guido at python.org Sat May 26 16:08:47 2007 From: guido at python.org (Guido van Rossum) Date: Sat, 26 May 2007 07:08:47 -0700 Subject: [Python-3000] [Python-Dev] PEP 367: New Super In-Reply-To: <002d01c79f6d$ce090de0$0201a8c0@mshome.net> References: <001101c79aa7$eb26c130$0201a8c0@mshome.net> <017d01c79e98$c6b84090$0201a8c0@mshome.net> <001d01c79f15$f0afa140$0201a8c0@mshome.net> <002d01c79f6d$ce090de0$0201a8c0@mshome.net> Message-ID: Quick, since I'm about to hop on a plane: Thinking about it again, storing the super instance in the bound method object is fine, as long as you only do it when the bound function needs it. Using an unbound super object in an unbound method is also fine. --Guido On 5/26/07, Tim Delaney wrote: > Guido van Rossum wrote: > > >>> - Why not make super a keyword, instead of just prohibiting > >>> assignment to it? (I'm planning to do the same with None BTW in > >>> Py3k -- I find the "it's a name but you can't assign to it" a > >>> rather silly business and hardly "the simplest solution".) > >> > >> That's currently an open issue - I'm happy to make it a keyword - in > >> which case I think the title should be changed to "super as a > >> keyword" or something like that. > > > > As it was before. :-) > > > > What's the argument against? > > I don't see any really, especially if None is to become a true keyword. But > some people have raised objections. > > >> Th preamble will only be added to functions/methods that cause the > >> 'super' cell to exist i.e. for CPython have 'super' in co.cellvars. > >> Functions that just have 'super' in co.freevars wouldn't have the > >> preamble. > > > > I think it's still too vague. For example: > > > > class C: > > def f(s): > > return 1 > > class D(C): > > pass > > def f(s): > > return 2*super.f() > > D.f = f > > print(D().f()) > > > > Should that work? I would be okay if it didn't, and if the super > > keyword is only allowed inside a method that is lexically inside a > > class. 
Then the second definition of f() should be a (phase 2) > > SyntaxError. > > That would simplify things. I'll update the PEP. > > > Was it ever decided whether the implicitly bound class should be: > > > > - the class object as produced by the class statement (before applying > > class decorators); > > - whatever is returned by the last class decorator (if any); or > > - whatever is bound to the class name at the time the method is > > invoked? > > I've got a hunch that #1 might be more solid; #3 seems asking for > > trouble. > > I think #3 is definitely the wrong thing to do, but there have been > arguments put forwards for both #1 and #2. > > I think I'll put it as an open issue for now. > > > There's also the issue of what to do when the method itself is > > decorated (the compiler can't know what the decorators mean, even for > > built-in decorators like classmethod). > > I think that may be a different issue. If you do something like: > > class A: > @decorator > def func(self): > pass > > class B(A): > @decorator > def func(self): > super.func() > > then `super.func()` will call whatever `super(B, self).func()` would now, > which (I think) would result in calling the decorated function. > > However, I think the staticmethod decorator would need to be able to modify > the class instance that's held by the method. Or see my proposal below ... > > > We could make the class in question a fourth attribute of the (poorly > > named) "bound method" object, e.g. im_class_for_super (im_super would > > be confusing IMO). Since this is used both by instance methods and by > > the @classmethod decorator, it's just about perfect for this purpose. > > (I would almost propose to reuse im_self for this purpose, but that's > > probably asking for subtle backwards incompatibilities and not worth > > it.) > > I'm actually thinking instead that an unbound method should reference an > unbound super instance for the appropriate class - which we could then call > im_super. 
> > For a bound instance or class method, im_super would return the appropriate > bound super instance. In practice, it would work like your autosuper recipe > using __super. > > e.g. > > class A: > def func(self): > pass > > >>> print A.func.im_super > <super: <class 'A'>, NULL> > > >>> print A().func.im_super > <super: <class 'A'>, <A object>> > > > See my proposal above. It differs slightly in that the __super__ call > > is made only when the class is not NULL. On the expectation that a > > typical function that references super uses it exactly once per call > > (that would be by far the most common case I expect) this is just > > fine. In my proposal the 'super' variable contains whatever > > __super__(<class>, <instance>) returned, rather than <class> which you > > seem to be proposing here. > > Think I must have been explaining poorly - if you look at the reference > implementation in the PEP, you'll see that that's exactly what's held in the > 'super' free variable. > > I think your proposal is basically what I was trying to convey - I'll look > at rewording the PEP so it's less ambiguous. But I'd like your thoughts on > the above proposal to keep a reference to the actual super object rather > than the class. > > Cheers, > > Tim Delaney > > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From showell30 at yahoo.com Sat May 26 16:32:56 2007 From: showell30 at yahoo.com (Steve Howell) Date: Sat, 26 May 2007 07:32:56 -0700 (PDT) Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <878xbc3xbf.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <19685.97380.qm@web33511.mail.mud.yahoo.com> --- "Stephen J. Turnbull" wrote: > > By the way, this is an example that shows that the > recent injection of > the word "parochial" is truly pernicious, because > it's attached to the > wrong set of arguments. > Sorry. I'm one of the folks who has propagated that term, and I didn't mean for the use of the term to have any pernicious side effect.
I have used the word in a context that basically has me labelling myself as "parochial," so obviously I don't want the word to carry any baggage. > Please note, it is those pockets of Unicode adoption > that are truly > parochial, not the ASCII advocates! Those pockets > can be early and > deep adopters precisely because they are small, > homogeneous groups, > unconcerned with the world outside. That's how I see it too. And again, I don't attach any baggage to the term "parochial." I accept, and embrace, the possibility that you could have thriving small communities of Python somewhere on the other side of the globe from me, and even though they're writing code with identifiers that I can't read, they may indirectly benefit me to the extent that they eventually contribute back to the community. Or maybe they never benefit me at all, but the world is a better place. > ASCII advocates > are obviously > self-interested ("IAGNI, so *you* can't have it, it > would cost me > extra effort"), but they are *not* parochial: they > *know* they're > going to exchange code with other cultures, they > *welcome* that > exchange, and *they do not want it hindered for > "frivolous" reasons*. > That describes me perfectly. I am self-interested to the extent that my employers just pay me to write working Python code, so I want the simplicity of ASCII only. My whole team is parochial in regards to the content of the code itself, even though culturally we are very diverse (American-born programmers are the minority). In the open source world, I have in fact exchanged code with other cultures, I have welcomed the exchange, and I wouldn't want it hindered for frivolous reasons. > [...] > True, "frivolous" is a parochial evaluation of the > cultural exchange > that use of Unicode identifiers can foster, but that > notion of > "parochial" is on a different level.
IMHO that > "cultural exchange" > level is highly relevant to the decision to > implement Unicode > identifiers in some way, but it's the "code > exchange" level that is > most relevant to the pace of introduction. Well said. > And that > has to consider > the balance between faster growth within > Unicode-using groups, versus > the facilitation of opportunistic[1] exchange among > groups using the > (admittedly imperfect) lingua franca of ASCII. > > Yep. ____________________________________________________________________________________Sick sense of humor? Visit Yahoo! TV's Comedy with an Edge to see what's on, when. http://tv.yahoo.com/collections/222 From murman at gmail.com Sat May 26 17:21:04 2007 From: murman at gmail.com (Michael Urman) Date: Sat, 26 May 2007 10:21:04 -0500 Subject: [Python-3000] PEP 3131 accepted In-Reply-To: References: <20070523111704.85FC.JCARLSON@uci.edu> <87abvu5yfu.fsf@uwakimon.sk.tsukuba.ac.jp> <20070524082737.862E.JCARLSON@uci.edu> <4655DD4E.3050809@v.loewis.de> <4656129D.5000406@v.loewis.de> Message-ID: On 5/26/07, Ka-Ping Yee wrote: > But the enabling of UTF-8 by a BOM at the > beginning of the file is an invisible override. This invisible > override is the source of the danger. If we want to be able to > read the coding declaration with any confidence, we should get rid > of the invisible override. Do we need to reconsider PEP 3120 "Using UTF-8 as the default source encoding"? I don't see much difference between not knowing on visual inspection whether: allowed is allowed or "allowed" == "allowed" I hope that's not your stance, because I still don't expect either to cause problems in the real world. Of course since it's currently not possible, it's hard to go trolling for existing use cases of confusing identifiers in python code. 
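The "allowed is allowed" spoof is easy to demonstrate concretely. A minimal sketch, in modern Python where PEP 3131 identifiers are accepted (the variable names here are purely illustrative):

```python
import unicodedata

# Two names that render identically in most fonts but are distinct:
# the second begins with U+0430, not ASCII U+0061.
ascii_name = "allowed"
spoofed_name = "\u0430llowed"

print(ascii_name == spoofed_name)          # False
print(unicodedata.name(ascii_name[0]))     # LATIN SMALL LETTER A
print(unicodedata.name(spoofed_name[0]))   # CYRILLIC SMALL LETTER A
```

An interpreter that accepts both spellings as identifiers treats them as two unrelated bindings, which is exactly why visual inspection cannot settle the question.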
-- Michael Urman From jimjjewett at gmail.com Sat May 26 17:47:47 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Sat, 26 May 2007 11:47:47 -0400 Subject: [Python-3000] [Python-Dev] PEP 367: New Super In-Reply-To: References: <001101c79aa7$eb26c130$0201a8c0@mshome.net> <017d01c79e98$c6b84090$0201a8c0@mshome.net> <001d01c79f15$f0afa140$0201a8c0@mshome.net> Message-ID: On 5/25/07, Guido van Rossum wrote: > We could make the class in question a fourth attribute of the (poorly > named) "bound method" object, e.g. im_class_for_super (im_super would > be confusing IMO). In the past, you have referred to this as the static class. I think it has other uses as well, such as a class-wide registry (whose location shouldn't be redirected without overriding the whole method). I realize this is the rejected __this_class__ proposal, but I can't help feeling that if we're going to create the magic attribute anyhow, it makes sense to have it be generally usable, instead of only as a token to create a super. > In my proposal the 'super' variable contains whatever > __super__(, ) returned, rather than which you > seem to be proposing here. That's fine, but the still has to be stored with the method to generate that super -- so why not expose it too? -jJ From jimjjewett at gmail.com Sat May 26 18:00:02 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Sat, 26 May 2007 12:00:02 -0400 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <4657AD52.5020101@latte.ca> References: <20070524213605.864B.JCARLSON@uci.edu> <19dd68ba0705242231j4f391f00n79112a01c0f339bc@mail.gmail.com> <19dd68ba0705250854o40a1025cse3d5f2c38cd76785@mail.gmail.com> <19dd68ba0705250855r6d2676c6r5e9cb7a49b95b6ac@mail.gmail.com> <4657AD52.5020101@latte.ca> Message-ID: On 5/25/07, Blake Winton wrote: > Jim Jewett wrote: > > Arbitrary Unicode identifier opens up the possibility of code that > > *looks* like ASCII, but isn't -- so I don't even realize that I missed > > something. 
> You already have that problem. > All screenshots taken on Python 2.4.3, Mac OSX 10.4 Intel. > http://bwinton.latte.ca/temp/Python/File.png > http://bwinton.latte.ca/temp/Python/Run.png > http://bwinton.latte.ca/temp/Python/foo.py > So, what are you doing to mitigate this risk now, and why not do the > same thing when identifiers are allowed to be arbitrary Unicode? Looking at foo.py, I didn't even realize at first that it was supposed to be a lookalike for triple-quotes; in my font, it is different enough to draw the eye and tell me that something is wrong. I don't like counting on that, but it does work -- for ASCII. It stops working with unicode, because the glyphs are even closer (or identical). This is partly a historical accident -- ASCII has been used long enough that there are widespread fonts (including the most common monospaced fonts) which make the distinctions fairly clear, and I'm already trained to look for the edge cases. Neither of these safeguards is true for unicode, nor will they become true in the foreseeable future. Given the sheer size of unicode, these safeguards may never become available in the general case -- but we already have them for ASCII. -jJ From ncoghlan at gmail.com Sat May 26 18:31:42 2007 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 27 May 2007 02:31:42 +1000 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <19dd68ba0705250910o5b56b4f9i9fccd450e37f48fe@mail.gmail.com> References: <465615C9.4080505@v.loewis.de> <320102.38046.qm@web33515.mail.mud.yahoo.com> <19dd68ba0705241805y52ba93fdt284a2c696b004989@mail.gmail.com> <87odk93w09.fsf@uwakimon.sk.tsukuba.ac.jp> <19dd68ba0705250641j348a42adu974fe4969897761e@mail.gmail.com> <19dd68ba0705250653v2c2a8188jac8c4ccc722fb747@mail.gmail.com> <87irag51in.fsf@uwakimon.sk.tsukuba.ac.jp> <19dd68ba0705250910o5b56b4f9i9fccd450e37f48fe@mail.gmail.com> Message-ID: <465860EE.8050005@gmail.com> Guillaume Proux wrote: > On 5/26/07, Stephen J.
Turnbull wrote: >> For the medium term, there are ways to pass command line arguments to >> programs invoked by GUI. They're more or less ugly, but your daughter >> will never see them, only the pretty icons. > > Is there right now in Windows? There is none that I know today at > least. All I know is that specific extensions are called automatically > using a given interpreter because of bindings defined in the registry. > There is no simple way to add per-file info afaik. You can edit the action used to launch .py files on double click by going into View->Options->File Types in Windows Explorer (that location may not be exactly correct - my Windows box isn't switched on at the moment). Or, assuming an environment variable is supported (ala PYTHONINSPECT vs the -i switch), you could just set that environment variable to allow any character. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From jimjjewett at gmail.com Sat May 26 18:39:57 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Sat, 26 May 2007 12:39:57 -0400 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <87646g3u9q.fsf@uwakimon.sk.tsukuba.ac.jp> References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <46527904.1000202@v.loewis.de> <87sl9o5dvi.fsf@uwakimon.sk.tsukuba.ac.jp> <87646i5td6.fsf@uwakimon.sk.tsukuba.ac.jp> <87r6p540n4.fsf@uwakimon.sk.tsukuba.ac.jp> <87646g3u9q.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 5/26/07, Stephen J. Turnbull wrote: > Jim Jewett writes: > > So long as we allow tailoring, I think the maximal set should be > > generous -- and I don't see any reason to pre-exclude anything > > outside ASCII. > Cf characters? Are we admitting "stupid bidi tricks", too? If Tomer needs them. Seriously, I wouldn't put Cf characters in the default accepted table. (But remember that *I* would limit that default to ASCII.)
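An ASCII default table with local tailoring, as described here, could be sketched roughly as follows. This is a hypothetical gatekeeper (`identifier_allowed` and `extra_ranges` are invented names), and it leans on `str.isidentifier`, the stock Unicode-identifier check available in modern Python, as a stand-in for the full PEP 3131 rules:

```python
def identifier_allowed(name, extra_ranges=()):
    """Hypothetical check: the default table is ASCII-only, and a
    site tailors in extra (lo, hi) code-point ranges explicitly."""
    if not name.isidentifier():        # basic syntactic validity
        return False
    for ch in name:
        cp = ord(ch)
        if cp < 128:
            continue                   # ASCII is always in the table
        if not any(lo <= cp <= hi for lo, hi in extra_ranges):
            return False
    return True

print(identifier_allowed("summary"))                           # True
print(identifier_allowed("r\u00e9sum\u00e9"))                  # False
# A site that tailors in the Latin-1 letters:
print(identifier_allowed("r\u00e9sum\u00e9", [(0xC0, 0xFF)]))  # True
```

The point of the shape is that rescinding a locally tailored range stays a local decision, while the default ASCII table never needs to retract anything.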
Tomer suggested that bidi characters might be needed to get Hebrew and Arabic working correctly. Given that someone has already decided to use Arabic (or even Arabic presentational forms), he or she is better placed to decide whether Cf characters are needed too. > But I'll tell you what my reason is: we want to be in a position to > avoid prohibiting previously acceptable characters wherever possible. Agreed; but in my opinion, the decision to allow those characters is local; the decision to rescind them would therefore also be local. We do want to avoid retracting characters from the default set. (And again, if we restrict that default set to ASCII, we'll be fine.) > > There are people who like to use names like "Program Files" or > > "Summary of Results.Apr-3-2007 version 2.xls"; I expect the same will > > be true of identifiers. So long as the punctuation is not ASCII, we > > might as well let them. > Why not let them use ASCII punctuation, as long as it's not Python > syntax? Because there really isn't any unreserved ASCII punctuation. One issue with @decorators was that it caused some hassle for (reasonably well-known) third-party tools which had been using the "@" character. It would make perfect sense to me if the consensus French table excluded guillemets. But I figure that should be their decision. > > The other committees say to exclude certain scripts, like > > Linear B and Ogham. (I should probably have noted that Linear B and Ogham are not used by any modern language; I *think* the excluded scripts were all for things that would not represent anyone's primary script or mother tongue.) > > If unicode comes out with a new revision, the new characters should > > probably be allowed; I don't want a situation where users of Cham or > > Lepcha[1] are told they have to wait another year because their > > scripts weren't formally adopted into unicode until after python 3.4.0 > > was already released. > Tough call.
I'd say, let's cross that bridge when we come to it. > In any case there will have to be some mechanism to access a Unicode > database at either build time or run time. Let them munge that > database if they're in a hurry. I had been thinking of the unicode version as a feature that didn't change within a python release. Perhaps that is negotiable? > Maybe the way to handle this is to allow private-space characters in > identifiers as an option. That would be doable with your well-known > file scheme. But it's very dangerous across modules. It turns out that page was out of date; Lepcha and Cham now have code points which haven't been formally approved, but aren't likely to change. Officially, they're still undefined, but using private-space probably isn't the right answer. So either we allow these particular "undefined" characters, or we (for now) disallow Lepcha and Cham. -jJ From foom at fuhm.net Sat May 26 21:49:30 2007 From: foom at fuhm.net (James Y Knight) Date: Sat, 26 May 2007 15:49:30 -0400 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <87ps4p3zot.fsf@uwakimon.sk.tsukuba.ac.jp> References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <464FFD04.90602@v.loewis.de> <46521CD7.9030004@v.loewis.de> <46527904.1000202@v.loewis.de> <87sl9o5dvi.fsf@uwakimon.sk.tsukuba.ac.jp> <87646i5td6.fsf@uwakimon.sk.tsukuba.ac.jp> <781A2C3C-011E-4048-A72A-BE631C0C5127@fuhm.net> <87ps4p3zot.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <9D017904-5A64-40EC-8A5C-23502FB1E314@fuhm.net> On May 25, 2007, at 7:33 AM, Stephen J. Turnbull wrote: >> Adding baroque command line options for users of other languages to >> do some useless verification at import time is not an acceptable >> answer. It'd be better to just reject the PEP entirely. > > Speaking of exaggeration .... I am serious. I fully support python having unicode identifier support. 
But I believe it would be far worse for Python to have complicated identifier syntax configuration via command line options or auxiliary files than to stay restricted to ASCII. If the identifier syntax is changed to include unicode, all python modules are still usable everywhere. Once you start going down the road of configurable syntax (worse: globally configurable syntax), there will be a "second class" of python modules that won't work on some systems without extra pain. I'm listening to all these proposals for options, and it's just getting *worse and worse*. It started with a simple "-U", grew into a "-U <charset>", grew into a 'pyidchar.txt' file with a list of character ranges, and now that pyidchar.txt file is going to have separate sections based on module name? Sorry, but are you !@# kidding me?!? James From timothy.c.delaney at gmail.com Sat May 26 23:04:02 2007 From: timothy.c.delaney at gmail.com (Tim Delaney) Date: Sun, 27 May 2007 07:04:02 +1000 Subject: [Python-3000] [Python-Dev] PEP 367: New Super References: <001101c79aa7$eb26c130$0201a8c0@mshome.net> <017d01c79e98$c6b84090$0201a8c0@mshome.net> <001d01c79f15$f0afa140$0201a8c0@mshome.net> <002d01c79f6d$ce090de0$0201a8c0@mshome.net> Message-ID: <003f01c79fd9$66948ec0$0201a8c0@mshome.net> Guido van Rossum wrote: > Quick, since I'm about to hop on a plane: Thinking about it again, > storing the super instance in the bound method object is fine, as long > as you only do it when the bound function needs it. Using an unbound > super object in an unbound method is also fine. OTOH, I've got a counter argument to storing the super object - we don't want to create a permanent cycle. If we store the class, we can store it as a weakref - then when the super object is created, a strong reference to the class exists. We can't store a weakref to the super instance though, as there won't be any other reference to it.
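The no-cycle scheme can be sketched with today's two-argument super. `MethodWrapper`, `im_type` and `im_super` below are illustrative stand-ins for the proposed bound-method attributes, not real CPython internals:

```python
import weakref

class MethodWrapper:
    """Toy stand-in for the bound-method object under discussion:
    it holds only a weak reference to the defining class, so it
    cannot create a permanent class <-> method reference cycle."""
    def __init__(self, defining_class, func, instance):
        self._cls_ref = weakref.ref(defining_class)
        self.func = func
        self.instance = instance

    @property
    def im_type(self):
        # Strong reference materialized only on demand.
        return self._cls_ref()

    @property
    def im_super(self):
        # Bound super object, built lazily from the weakly held class.
        return super(self._cls_ref(), self.instance)

class A:
    def name(self):
        return "A"

class B(A):
    def name(self):
        return "B"

m = MethodWrapper(B, B.name, B())
print(m.im_type.__name__)   # B
print(m.im_super.name())    # A -- dispatches past B in the MRO
```

Because `im_super` is a property rather than a stored reference, the super object exists only while it is being used, which is exactly the lifetime argument made above.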
I still quite like the idea of im_super though, but it would need to be a property instead of just a reference. I also agree with Jim that exposing the class object is useful e.g. for introspection. So I propose the following: 1. Internal weakref to class object. 2. im_type - property that returns a strong ref to the class object. I went through several names before coming up with im_type (im_staticclass, im_classobj, im_classobject, im_boundclass, im_bindingclass). I think im_type conveys exactly what we want this attribute to represent - the class/type that this method was defined in. im_class would have also been suitable, but has had another, different meaning since 2.2. 3. im_super - property that returns the unbound super object (for an unbound method) and bound super object (for a bound method). Tim Delaney > On 5/26/07, Tim Delaney wrote: >> Guido van Rossum wrote: >> >>>>> - Why not make super a keyword, instead of just prohibiting >>>>> assignment to it? (I'm planning to do the same with None BTW in >>>>> Py3k -- I find the "it's a name but you can't assign to it" a >>>>> rather silly business and hardly "the simplest solution".) >>>> >>>> That's currently an open issue - I'm happy to make it a keyword - >>>> in which case I think the title should be changed to "super as a >>>> keyword" or something like that. >>> >>> As it was before. :-) >>> >>> What's the argument against? >> >> I don't see any really, especially if None is to become a true >> keyword. But some people have raised objections. >> >>>> Th preamble will only be added to functions/methods that cause the >>>> 'super' cell to exist i.e. for CPython have 'super' in co.cellvars. >>>> Functions that just have 'super' in co.freevars wouldn't have the >>>> preamble. >>> >>> I think it's still too vague. For example: >>> >>> class C: >>> def f(s): >>> return 1 >>> class D(C): >>> pass >>> def f(s): >>> return 2*super.f() >>> D.f = f >>> print(D().f()) >>> >>> Should that work? 
I would be okay if it didn't, and if the super >>> keyword is only allowed inside a method that is lexically inside a >>> class. Then the second definition of f() should be a (phase 2) >>> SyntaxError. >> >> That would simplify things. I'll update the PEP. >> >>> Was it ever decided whether the implicitly bound class should be: >>> >>> - the class object as produced by the class statement (before >>> applying class decorators); >>> - whatever is returned by the last class decorator (if any); or >>> - whatever is bound to the class name at the time the method is >>> invoked? >>> I've got a hunch that #1 might be more solid; #3 seems asking for >>> trouble. >> >> I think #3 is definitely the wrong thing to do, but there have been >> arguments put forwards for both #1 and #2. >> >> I think I'll put it as an open issue for now. >> >>> There's also the issue of what to do when the method itself is >>> decorated (the compiler can't know what the decorators mean, even >>> for built-in decorators like classmethod). >> >> I think that may be a different issue. If you do something like: >> >> class A: >> @decorator >> def func(self): >> pass >> >> class B(A): >> @decorator >> def func(self): >> super.func() >> >> then `super.func()` will call whatever `super(B, self).func()` would >> now, which (I think) would result in calling the decorated function. >> >> However, I think the staticmethod decorator would need to be able to >> modify the class instance that's held by the method. Or see my >> proposal below ... >>> We could make the class in question a fourth attribute of the >>> (poorly named) "bound method" object, e.g. im_class_for_super >>> (im_super would be confusing IMO). Since this is used both by >>> instance methods and by the @classmethod decorator, it's just about >>> perfect for this purpose. (I would almost propose to reuse im_self >>> for this purpose, but that's probably asking for subtle backwards >>> incompatibilities and not worth it.) 
>> >> I'm actually thinking instead that an unbound method should >> reference an unbound super instance for the appropriate class - >> which we could then call im_super. >> >> For a bound instance or class method, im_super would return the >> appropriate bound super instance. In practice, it would work like >> your autosuper recipe using __super. >> >> e.g.
>>
>> class A:
>>     def func(self):
>>         pass
>>
>> >>> print A.func.im_super
>> <super: <class 'A'>, NULL>
>>
>> >>> print A().func.im_super
>> <super: <class 'A'>, <A object>>
>>
>>> See my proposal above. It differs slightly in that the __super__ >>> call is made only when the class is not NULL. On the expectation >>> that a typical function that references super uses it exactly once >>> per call (that would be by far the most common case I expect) this >>> is just fine. In my proposal the 'super' variable contains whatever >>> __super__(<class>, <instance>) returned, rather than which you >>> seem to be proposing here. >> >> Think I must have been explaining poorly - if you look at the >> reference implementation in the PEP, you'll see that that's exactly >> what's held in the 'super' free variable. >> >> I think your proposal is basically what I was trying to convey - >> I'll look at rewording the PEP so it's less ambiguous. But I'd like >> your thoughts on the above proposal to keep a reference to the >> actual super object rather than the class.
>> >> Cheers, >> >> Tim Delaney From baptiste13 at altern.org Sun May 27 00:29:26 2007 From: baptiste13 at altern.org (Baptiste Carvello) Date: Sun, 27 May 2007 00:29:26 +0200 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <9D017904-5A64-40EC-8A5C-23502FB1E314@fuhm.net> References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <464FFD04.90602@v.loewis.de> <46521CD7.9030004@v.loewis.de> <46527904.1000202@v.loewis.de> <87sl9o5dvi.fsf@uwakimon.sk.tsukuba.ac.jp> <87646i5td6.fsf@uwakimon.sk.tsukuba.ac.jp> <781A2C3C-011E-4048-A72A-BE631C0C5127@fuhm.net> <87ps4p3zot.fsf@uwakimon.sk.tsukuba.ac.jp> <9D017904-5A64-40EC-8A5C-23502FB1E314@fuhm.net> Message-ID: James Y Knight a écrit : > there will be a "second class" of python modules that won't work on > some systems without extra pain. Modules using unicode identifiers *will be* second class anyway, because most people won't be able to debug them in case of need. However, this does not matter for teaching and for in-house code, which are the most compelling use cases of the new feature. BC From showell30 at yahoo.com Sun May 27 00:53:59 2007 From: showell30 at yahoo.com (Steve Howell) Date: Sat, 26 May 2007 15:53:59 -0700 (PDT) Subject: [Python-3000] Support for PEP 3131 In-Reply-To: Message-ID: <73543.28835.qm@web33502.mail.mud.yahoo.com> --- Baptiste Carvello wrote: > However, this does not > matter for teaching and for in-house code, which are > the most compelling use > cases of the new feature. > For the teaching use case, I'm wondering if the English keywords would already present too high a barrier for students who don't have first-semester familiarity with English. In this example below, altered from Chapter 4 of the tutorial, I have tried to make the keywords appear foreign to an English user, so that an English-speaking person could imagine the opposite scenario.
fed ask_ok(prompt, retries=4, complaint='Yes or no, please!'):
    elihw Eurt:
        ok = tupni_war(prompt)
        fi ok ni ('y', 'ye', 'yes'): nruter Eurt
        fi ok ni ('n', 'no', 'nop', 'nope'): nruter Eslaf
        retries = retries - 1
        fi retries < 0: esiar ROrreoi, 'refusenik user'
        tnirp complaint

To truly enable Python in a non-English teaching environment, I think you'd actually want to go a step further and just internationalize the whole program. ____________________________________________________________________________________Take the Internet to Go: Yahoo!Go puts the Internet in your pocket: mail, news, photos & more. http://mobile.yahoo.com/go?refer=1GNXIC From showell30 at yahoo.com Sun May 27 01:42:46 2007 From: showell30 at yahoo.com (Steve Howell) Date: Sat, 26 May 2007 16:42:46 -0700 (PDT) Subject: [Python-3000] some stats on identifiers (PEP 3131) Message-ID: <601195.60246.qm@web33514.mail.mud.yahoo.com> Here is a survey of some Python code to see how often tokens typically get used in Python 2.
Here is the program I used to count the tokens, if you want to try it out on your own in-house codebase:

import tokenize
import sys

fn = sys.argv[1]
g = tokenize.generate_tokens(open(fn).readline)
dct = {}
for tup in g:
    if tup[0] == 1:  # token type 1 is NAME (identifiers and keywords)
        identifier = tup[1]
        dct[identifier] = dct.get(identifier, 0) + 1
identifiers = dct.keys()
identifiers.sort()
for identifier in identifiers:
    print '%4d' % dct[identifier], identifier

The top 15 in gettext.py:

ssslily> python2.5 count.py /usr/local/lib/python2.5/gettext.py | sort -rn | head -15
  98 self
  73 if
  69 return
  39 def
  35 msgid1
  34 tmsg
  33 n
  33 None
  32 domain
  31 message
  29 msgid2
  28 _fallback
  21 else
  20 locale
  20 in

The top 15 in an in-house program that deals with an American-based format for sending financial transactions (closest thing I could find to Dutch tax law):

  23 trackData
  19 ErrorMessages
  18 rest
  16 cuts
  12 encryptedPin
  11 return
  10 request
  10 p2
  10 p1
  10 maskedMessage
  10 j
  10 in
  10 i
   9 len
   9 ccNum

From python at zesty.ca Sun May 27 03:19:50 2007 From: python at zesty.ca (Ka-Ping Yee) Date: Sat, 26 May 2007 20:19:50 -0500 (CDT) Subject: [Python-3000] PEP 3131 accepted In-Reply-To: References: <20070523111704.85FC.JCARLSON@uci.edu> <87abvu5yfu.fsf@uwakimon.sk.tsukuba.ac.jp> <20070524082737.862E.JCARLSON@uci.edu> <4655DD4E.3050809@v.loewis.de> <4656129D.5000406@v.loewis.de> Message-ID: On Sat, 26 May 2007, Michael Urman wrote: > On 5/26/07, Ka-Ping Yee wrote: > > But the enabling of UTF-8 by a BOM at the > > beginning of the file is an invisible override. This invisible > > override is the source of the danger. If we want to be able to > > read the coding declaration with any confidence, we should get rid > > of the invisible override. > > Do we need to reconsider PEP 3120 "Using UTF-8 as the default source > encoding"?
I don't see much difference between not knowing on visual
> inspection whether:
>
>     allowed is allowed
>
> or
>
>     "allowed" == "allowed"

The concern is similar in nature, but there is a difference. It is more feasible to tell programmers not to trust the visual appearance of strings than to tell them not to trust the visual appearance of identifiers. Strings are data, which makes them separable from the structure and logic of a program, whereas identifiers are fundamental to all programs. Programmers are already trained to understand that string literals in source code are non-verbatim representations (e.g. "it's" == 'it\'s' == 'it' "'s" == "\x69t's"), whereas they have a well established expectation that identifiers are written verbatim. As long as you have a way of distinguishing strings reliably from the rest of the source code, you can know whether your confidence is well placed. Blake's example illustrates that ambiguity in strings is especially dangerous because it can obscure where strings begin and end. PEP 3120 is problematic. At the very least, it is definitely missing a section addressing objections (the problem of not being able to understand an expression like "allowed" == "allowed") and a section on security considerations (like those raised by Blake's example). Since the default encoding is currently ASCII, almost all Python programmers are unlikely to be prepared for ambiguity in strings; thus the best thing to do would be to keep the default as ASCII and require a visible declaration to activate such ambiguity (enable UTF-8). Failing that, the next best thing to do would be to forbid all confusable characters without an explicit declaration to permit them. And the next best thing after that would be to forbid just the characters that are confusable with the delimiters that fence off ambiguous text (' " #) without an explicit declaration to permit them.
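The confusable-character concern above is easy to demonstrate concretely. The snippet below is an illustrative sketch (using modern Python string escapes, not code from the thread): two names that render almost identically in many fonts but differ in their first code point.

```python
# Illustrative sketch of the confusability problem: the two strings below
# look the same in most fonts, but the second begins with a Cyrillic letter.
latin = "allowed"        # all Latin letters
mixed = "\u0430llowed"   # first letter is U+0430, CYRILLIC SMALL LETTER A

print(latin == mixed)          # -> False
print(len(latin), len(mixed))  # -> 7 7
```

Used as identifiers rather than as string data, the same pair would give a program two visually indistinguishable variables.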
-- ?!ng From ncoghlan at gmail.com Sun May 27 05:28:59 2007 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 27 May 2007 13:28:59 +1000 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: References: <788141.82125.qm@web33507.mail.mud.yahoo.com> Message-ID: <4658FAFB.7010204@gmail.com> Mike Klaas wrote: > On 25-May-07, at 6:03 AM, Steve Howell wrote: > >> We're just disagreeing about whether the Dutch tax law >> programmer has to uglify his environment with an alias >> of Python to "python3.0 -liberal_unicode," or whether >> the American programmer in an enterprisy environment >> has to uglify his environment with an alias of Python >> to "python3.0 -parochial" to mollify his security >> auditors. > > Surely if such mollification were necessary, -parochial would be > routinely used for (most much enterprise-y) java? I have never seen > any such thing done, though my experience is perhaps not universal. Java (and C#) are statically typed - a simple assignment statement can't introduce a new variable, so the issue of deceptive assignment provides far less opportunity for mischief. A Java or C# equivalent of KPY's deceptive code would either fail to compile with an unrecognised identifier error when it encountered the undeclared 'allow-with-Cyrillic-a' identifier (or else it would have an extra, apparently redundant identifier declaration). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From ryan.freckleton at gmail.com Sun May 27 09:19:20 2007 From: ryan.freckleton at gmail.com (Ryan Freckleton) Date: Sun, 27 May 2007 01:19:20 -0600 Subject: [Python-3000] Composable abstract base class? Message-ID: <318072440705270019u5c66ff5u54732c429d4beca8@mail.gmail.com> I've been following the python-dev and python 3000 lists for over a year, but this is my first posting. I think I've found an additional abstract base class to add to PEP 3119.
An ABC for composable data (e.g. list, tuple, set, and perhaps dict) to inherit from. A composable object can contain instances of other composable objects. In other words, a composable object can be used as the outer container in a nested data structure. The motivating example is when you want to recurse through a nested list of strings, e.g.

>>> seq = ['my', 'hovercraft', ['is', 'full', 'of', ['eels']]]
>>> def recurse(sequence):
        if isinstance(sequence, list):
            for child in sequence:
                recurse(child)
        else:
            print sequence

>>> recurse(seq)
my
hovercraft
is
full
of
eels

You could solve this by the composite pattern, but I think that using an ABC may be simpler. If we had a Composable ABC that set, list and tuple inherit, the above code could be written as:

def recurse(sequence):
    if isinstance(sequence, Composable):
        for child in sequence:
            recurse(child)
    else:
        print sequence

which is much more general. This could be easily introduced by a third party developer, using the mechanisms outlined in the PEP; the question is: would it be worthwhile to add this ABC to PEP 3119? If it was added to PEP 3119, I believe that it should be a subtype of Container. I do not think it should inherit from Iterable, since it is possible for a container type to not support the iterator protocol, but still support composition. Sincerely, -- ===== --Ryan E. Freckleton From python at zesty.ca Sun May 27 10:18:49 2007 From: python at zesty.ca (Ka-Ping Yee) Date: Sun, 27 May 2007 03:18:49 -0500 (CDT) Subject: [Python-3000] Composable abstract base class? In-Reply-To: <318072440705270019u5c66ff5u54732c429d4beca8@mail.gmail.com> References: <318072440705270019u5c66ff5u54732c429d4beca8@mail.gmail.com> Message-ID: On Sun, 27 May 2007, Ryan Freckleton wrote: > I've been following the python-dev and python 3000 lists for over a > year, but this is my first posting. Hello!
list, tuple, set, and perhaps dict) > to inherit from. A composable object can contain instances of other > composable objects. In other words, a composable object can be used as > the outer container in a nested data structure. [...]
> def recurse(sequence):
>     if isinstance(sequence, Composable):
>         for child in sequence:
>             recurse(child)
>     else:
>         print sequence

I think I understand your example, but I don't understand what makes it necessary to introduce an ABC for Composable as separate from Iterable. What is intended to be different about Composable? Can you provide a usage example for Composable where Iterable would not be sufficient? -- ?!ng From guido at python.org Sun May 27 11:29:08 2007 From: guido at python.org (Guido van Rossum) Date: Sun, 27 May 2007 02:29:08 -0700 Subject: [Python-3000] Composable abstract base class? In-Reply-To: References: <318072440705270019u5c66ff5u54732c429d4beca8@mail.gmail.com> Message-ID: On 5/27/07, Ka-Ping Yee wrote: > On Sun, 27 May 2007, Ryan Freckleton wrote: > > I've been following the python-dev and python 3000 lists for over a > > year, but this is my first posting. > > Hello! Hello too! > > I think I've found an additional abstract base class to add to PEP 3119. > > An ABC for composable data (e.g. list, tuple, set, and perhaps dict) > > to inherit from. A composable object can contain instances of other > > composable objects. In other words, a composable object can be used as > > the outer container in a nested data structure. > [...]
> > def recurse(sequence):
> >     if isinstance(sequence, Composable):
> >         for child in sequence:
> >             recurse(child)
> >     else:
> >         print sequence

> I think I understand your example, but I don't understand what makes > it necessary to introduce an ABC for Composable as separate from > Iterable. What is intended to be different about Composable? Can > you provide a usage example for Composable where Iterable would not > be sufficient?
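The usage example being asked for here is the classic "flatten" problem: strings are themselves iterable, so a test against Iterable alone recurses forever ('eels' yields 'e', which yields 'e', and so on). A minimal sketch, with a plain tuple of types standing in for the proposed Composable ABC (hypothetical illustration, not code from the thread):

```python
def flatten(obj, composable=(list, tuple, set)):
    # The 'composable' tuple is a stand-in for an isinstance(obj, Composable)
    # test; note that str is deliberately absent from it.
    if isinstance(obj, composable):
        result = []
        for child in obj:
            result.extend(flatten(child, composable))
        return result
    return [obj]

print(flatten(['my', 'hovercraft', ['is', 'full', 'of', ['eels']]]))
# -> ['my', 'hovercraft', 'is', 'full', 'of', 'eels']
```

With Iterable in place of the composable check, the 'eels' leaf would never terminate; that is the distinction Composable is meant to capture.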
Ryan is repeating the classic flatten example: strings are iterables but shouldn't be iterated over in this example. This is more the domain of Generic Functions, PEP 3124. Anyway, the beauty of PEP 3119 is that even if PEP 3124 were somehow rejected, you could add Composable yourself, and there is no requirement to add it (or any other category you might want to define) to the "standard" set of ABCs. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Sun May 27 11:59:45 2007 From: guido at python.org (Guido van Rossum) Date: Sun, 27 May 2007 02:59:45 -0700 Subject: [Python-3000] [Python-Dev] PEP 367: New Super In-Reply-To: <003f01c79fd9$66948ec0$0201a8c0@mshome.net> References: <001101c79aa7$eb26c130$0201a8c0@mshome.net> <017d01c79e98$c6b84090$0201a8c0@mshome.net> <001d01c79f15$f0afa140$0201a8c0@mshome.net> <002d01c79f6d$ce090de0$0201a8c0@mshome.net> <003f01c79fd9$66948ec0$0201a8c0@mshome.net> Message-ID: On 5/26/07, Tim Delaney wrote: > Guido van Rossum wrote: > > Quick, since I'm about to hop on a plane: Thinking about it again, > > storing the super instance in the bound method object is fine, as long > > as you only do it when the bound function needs it. Using an unbound > > super object in an unbound method is also fine. > > OTOH, I've got a counter argument to storing the super object - we don't > want to create a permanent cycle. The bound method object isn't stored in the class -- it's created by the "C.method" or "inst.method" getattr operation. I don't see how this would introduce a cycle. > If we store the class, we can store it as a weakref - then when the super > object is created, a strong reference to the class exists.
for > introspection. > > So I propose the following: > > 1. Internal weakref to class object. > > 2. im_type - property that returns a strong ref to the class object. > > I went through several names before coming up with im_type (im_staticclass, > im_classobj, im_classobject, im_boundclass, im_bindingclass). I think > im_type conveys exactly what we want this attribute to represent - the > class/type that this method was defined in. > > im_class would have also been suitable, but has had another, different > meaning since 2.2. Since class and type are synonym (as you say) having both im_class and im_type would be a bad idea. > 3. im_super - property that returns the unbound super object (for an unbound > method) and bound super object (for a bound method). > > Tim Delaney > > > On 5/26/07, Tim Delaney wrote: > >> Guido van Rossum wrote: > >> > >>>>> - Why not make super a keyword, instead of just prohibiting > >>>>> assignment to it? (I'm planning to do the same with None BTW in > >>>>> Py3k -- I find the "it's a name but you can't assign to it" a > >>>>> rather silly business and hardly "the simplest solution".) > >>>> > >>>> That's currently an open issue - I'm happy to make it a keyword - > >>>> in which case I think the title should be changed to "super as a > >>>> keyword" or something like that. > >>> > >>> As it was before. :-) > >>> > >>> What's the argument against? > >> > >> I don't see any really, especially if None is to become a true > >> keyword. But some people have raised objections. > >> > >>>> The preamble will only be added to functions/methods that cause the > >>>> 'super' cell to exist i.e. for CPython have 'super' in co.cellvars. > >>>> Functions that just have 'super' in co.freevars wouldn't have the > >>>> preamble. > >>> > >>> I think it's still too vague.
For example: > >>> > >>> class C: > >>> def f(s): > >>> return 1 > >>> class D(C): > >>> pass > >>> def f(s): > >>> return 2*super.f() > >>> D.f = f > >>> print(D().f()) > >>> > >>> Should that work? I would be okay if it didn't, and if the super > >>> keyword is only allowed inside a method that is lexically inside a > >>> class. Then the second definition of f() should be a (phase 2) > >>> SyntaxError. > >> > >> That would simplify things. I'll update the PEP. > >> > >>> Was it ever decided whether the implicitly bound class should be: > >>> > >>> - the class object as produced by the class statement (before > >>> applying class decorators); > >>> - whatever is returned by the last class decorator (if any); or > >>> - whatever is bound to the class name at the time the method is > >>> invoked? > >>> I've got a hunch that #1 might be more solid; #3 seems asking for > >>> trouble. > >> > >> I think #3 is definitely the wrong thing to do, but there have been > >> arguments put forwards for both #1 and #2. > >> > >> I think I'll put it as an open issue for now. > >> > >>> There's also the issue of what to do when the method itself is > >>> decorated (the compiler can't know what the decorators mean, even > >>> for built-in decorators like classmethod). > >> > >> I think that may be a different issue. If you do something like: > >> > >> class A: > >> @decorator > >> def func(self): > >> pass > >> > >> class B(A): > >> @decorator > >> def func(self): > >> super.func() > >> > >> then `super.func()` will call whatever `super(B, self).func()` would > >> now, which (I think) would result in calling the decorated function. > >> > >> However, I think the staticmethod decorator would need to be able to > >> modify the class instance that's held by the method. Or see my > >> proposal below ... > >>> We could make the class in question a fourth attribute of the > >>> (poorly named) "bound method" object, e.g. im_class_for_super > >>> (im_super would be confusing IMO). 
Since this is used both by > >>> instance methods and by the @classmethod decorator, it's just about > >>> perfect for this purpose. (I would almost propose to reuse im_self > >>> for this purpose, but that's probably asking for subtle backwards > >>> incompatibilities and not worth it.) > >> > >> I'm actually thinking instead that an unbound method should > >> reference an unbound super instance for the appropriate class - > >> which we could then call im_super. > >> > >> For a bound instance or class method, im_super would return the > >> appropriate bound super instance. In practice, it would work like > >> your autosuper recipe using __super. > >> > >> e.g. > >> > >> class A: > >> def func(self): > >> pass > >> > >>>>> print A.func.im_super > >> , NULL> > >> > >>>>> print A().func.im_super > >> , > > >> > >>> See my proposal above. It differs slightly in that the __super__ > >>> call is made only when the class is not NULL. On the expectation > >>> that a typical function that references super uses it exactly once > >>> per call (that would be by far the most common case I expect) this > >>> is just fine. In my proposal the 'super' variable contains whatever > >>> __super__(, ) returned, rather than which you > >>> seem to be proposing here. > >> > >> Think I must have been explaining poorly - if you look at the > >> reference implementation in the PEP, you'll see that that's exactly > >> what's held in the 'super' free variable. > >> > >> I think your proposal is basically what I was trying to convey - > >> I'll look at rewording the PEP so it's less ambiguous. But I'd like > >> your thoughts on the above proposal to keep a reference to the > >> actual super object rather than the class. 
> >> > >> Cheers, > >> > >> Tim Delaney > > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From ncoghlan at gmail.com Sun May 27 12:18:45 2007 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 27 May 2007 20:18:45 +1000 Subject: [Python-3000] Composable abstract base class? In-Reply-To: References: <318072440705270019u5c66ff5u54732c429d4beca8@mail.gmail.com> Message-ID: <46595B05.8080301@gmail.com> Guido van Rossum wrote: > Ryan is repeating the classic flatten example: strings are iterables > but shouldn't be iterated over in this example. This is more the > domain of Generic Functions, PEP 3124. Anyway, the beauty of PEP 3119 > is that even if PEP 3124 were somehow rejected, you could add > Composable yourself, and there is no requirement to add it (or any > other category you might want to define) to the "standard" set of > ABCs. I think this is an interesting example to flesh out though - how would I express that most instances of Iterable should be iterated over when being flattened, but that certain instances of Iterable (i.e. strings) should be ignored? For example, it would be nice to be able to write:

from abc import Iterable

class Flattenable(Iterable):
    pass

Flattenable.deregister(basestring)

Reading the PEP as it stands, I believe carving out exceptions like this would require either subclassing ABCMeta to change the behaviour, or else relying on PEP 3124 or some other generic function mechanism. Cheers, Nick.
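The carve-out Nick describes can be approximated without a deregister() call by supplying a subclass hook. The sketch below is hypothetical: it uses the __subclasshook__ name from the ABC machinery as it was later finalized, which may differ from the PEP 3119 draft under discussion here.

```python
from abc import ABCMeta

class Flattenable(metaclass=ABCMeta):
    """Anything iterable counts as flattenable, except text types."""
    @classmethod
    def __subclasshook__(cls, C):
        if issubclass(C, (str, bytes)):
            return False  # iterable, but explicitly carved out of the category
        return hasattr(C, '__iter__')

print(issubclass(list, Flattenable))  # -> True
print(issubclass(str, Flattenable))   # -> False
```

Because the hook is consulted by isinstance() and issubclass(), no class has to register or inherit explicitly, which is the structural-typing side of the ABC proposal.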
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From timothy.c.delaney at gmail.com Sun May 27 13:09:22 2007 From: timothy.c.delaney at gmail.com (Tim Delaney) Date: Sun, 27 May 2007 21:09:22 +1000 Subject: [Python-3000] [Python-Dev] PEP 367: New Super References: <001101c79aa7$eb26c130$0201a8c0@mshome.net> <017d01c79e98$c6b84090$0201a8c0@mshome.net> <001d01c79f15$f0afa140$0201a8c0@mshome.net> <002d01c79f6d$ce090de0$0201a8c0@mshome.net> <003f01c79fd9$66948ec0$0201a8c0@mshome.net> Message-ID: <009c01c7a04f$7e348460$0201a8c0@mshome.net> Guido van Rossum wrote: > The bound method object isn't stored in the class -- it's created by > the "C.method" or "inst.method" getattr operation. I don't see how > this would introduce a cycle. > >> If we store the class, we can store it as a weakref - then when the >> super object is created, a strong reference to the class exists. We need to create some relationship between the unbound method and the class. So the class has a reference to the unbound method, and the unbound method has a reference to the class, thus creating a cycle. Bound methods don't come into it - it's the unbound method that's the problem. > Since class and type are synonym (as you say) having both im_class and > im_type would be a bad idea. I'm struggling to think of another, not too complicated name that conveys the same information. Tim Delaney From stephen at xemacs.org Sun May 27 14:59:08 2007 From: stephen at xemacs.org (Stephen J.
Turnbull) Date: Sun, 27 May 2007 21:59:08 +0900 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <9D017904-5A64-40EC-8A5C-23502FB1E314@fuhm.net> References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <464FFD04.90602@v.loewis.de> <46521CD7.9030004@v.loewis.de> <46527904.1000202@v.loewis.de> <87sl9o5dvi.fsf@uwakimon.sk.tsukuba.ac.jp> <87646i5td6.fsf@uwakimon.sk.tsukuba.ac.jp> <781A2C3C-011E-4048-A72A-BE631C0C5127@fuhm.net> <87ps4p3zot.fsf@uwakimon.sk.tsukuba.ac.jp> <9D017904-5A64-40EC-8A5C-23502FB1E314@fuhm.net> Message-ID: <87wsyu2zj7.fsf@uwakimon.sk.tsukuba.ac.jp> James Y Knight writes: > If the identifier syntax is changed to include unicode, all python > modules are still usable everywhere. Once you start going down the > road of configurable syntax (worse: globally configurable syntax), The syntax is not "configured", it is "audited". Just like Unix passwords, which can be anything in principle, but most distros audit them (unless assigned by root). Now, Ka-Ping Yee and Josiah Carlson clearly would like to see the restriction in the language. That's not where I'm going. I see PEP 3131 as defining the language. However, I do think that a limited amount of *optional* auditing *in the Python compiler* would be a good idea to have, especially for Americans who (along with everybody else) have *no* need for Unicode identifier support now, and are not going to have a need for a long time on average. Better they should get a heads-up when the Klingons arrive. > there will be a "second class" of python modules that won't work on > some systems without extra pain. That's right. It's all modules that contain non-ASCII identifiers, because by PEP 3131 they cannot be distributed with Python as part of the standard library. 
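The optional auditing Stephen describes can be prototyped in a few lines with the tokenize module. This is a hypothetical sketch of such a tool (in modern Python), not one of the auditors mentioned in the thread:

```python
import io
import tokenize

def audit_identifiers(source):
    """Return (line, name) pairs for identifiers containing non-ASCII characters."""
    flagged = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and not all(ord(ch) < 128 for ch in tok.string):
            flagged.append((tok.start[0], tok.string))
    return flagged

# The second identifier below starts with U+0441, CYRILLIC SMALL LETTER ES.
print(audit_identifiers("x = 1\n\u0441chedule = 2\n"))
```

A stricter auditor would check code points against an allowed-ranges table rather than plain ASCII, which is exactly the role the pyidchar.txt file discussed below would play.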
The question is how much extra pain, and will it actually hinder u

> It started with a simple "-U", grew into "-U ", grew into

Actually, it started with plugging into the codec interface, with "ASCII-only" and "PEP 3131" auditors available by default.

> a 'pyidchar.txt' file with a list of character ranges, and now that
> pyidchar.txt file is going to have separate sections based on module
> name? Sorry, but are you !@# kidding me?!?

The scalability issue was raised by Guido, not the ASCII advocates. To answer how I view this, no, I'm not kidding. Until the vaporware auditing programs get fieldtested, and we've actually seen a couple of exploits of unwary sites and discover that they're the ones the auditing programs already catch, not something unexpected. In any case, I expect that the most commonly used version of that file will look like

[DEFAULT]
000000-1FFFFF # all of Unicode as restricted by PEP 3131
# pyidchar.txt ends here

Anything more complicated than that is a convenient standardized format for filters that can be shared among the seriously paranoid. From guido at python.org Sun May 27 14:50:47 2007 From: guido at python.org (Guido van Rossum) Date: Sun, 27 May 2007 05:50:47 -0700 Subject: [Python-3000] [Python-Dev] PEP 367: New Super In-Reply-To: <009c01c7a04f$7e348460$0201a8c0@mshome.net> References: <001101c79aa7$eb26c130$0201a8c0@mshome.net> <017d01c79e98$c6b84090$0201a8c0@mshome.net> <001d01c79f15$f0afa140$0201a8c0@mshome.net> <002d01c79f6d$ce090de0$0201a8c0@mshome.net> <003f01c79fd9$66948ec0$0201a8c0@mshome.net> <009c01c7a04f$7e348460$0201a8c0@mshome.net> Message-ID: On 5/27/07, Tim Delaney wrote: > Guido van Rossum wrote: > > > The bound method object isn't stored in the class -- it's created by > > the "C.method" or "inst.method" getattr operation. I don't see how > > this would introduce a cycle. > > > >> If we store the class, we can store it as a weakref - then when the > >> super object is created, a strong reference to the class exists.
> > We need to create some relationship between the unbound method and the > class. So the class has a reference to the unbound method, and the unbound > method has a reference to the class, thus creating a cycle. Bound methods > don't come into it - it's the unbound method that's the problem. Still wrong, I think. The unbound method object *also* isn't stored in the class. It's returned by the C.method operation. Compare C.method (which returns an unbound method) to C.__dict__['method'] (which returns the actual function object stored in the class). > > Since class and type are synonym (as you say) having both im_class and > > im_type would be a bad idea. > > I'm struggling to think of another, not too complicated name that conveys > the same information. Keep trying. im_type is not acceptable. :-) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Sun May 27 14:57:15 2007 From: guido at python.org (Guido van Rossum) Date: Sun, 27 May 2007 05:57:15 -0700 Subject: [Python-3000] Composable abstract base class? In-Reply-To: <46595B05.8080301@gmail.com> References: <318072440705270019u5c66ff5u54732c429d4beca8@mail.gmail.com> <46595B05.8080301@gmail.com> Message-ID: On 5/27/07, Nick Coghlan wrote: > Guido van Rossum wrote: > > Ryan is repeating the classic flatten example: strings are iterables > > but shouldn't be iterated over in this example. This is more the > > domain of Generic Functions, PEP 3124. Anyway, the beauty of PEP 3119 > > is that even if PEP 3124 were somehow rejected, you could add > > Composable yourself, and there is no requirement to add it (or any > > other category you might want to define) to the "standard" set of > > ABCs. > > I think this is an interesting example to flesh out though - how would I > express that most instances of Iterable should be iterated over when > being Flattened, but that certain instances of Iterable (i.e. strings) > should be ignored? 
> > For example, it would be nice to be able to write: > > from abc import Iterable > > class Flattenable(Iterable): > pass > > Flattenable.deregister(basestring) > > > Reading the PEP as it stands, I believe carving out exceptions like this > would require either subclassing ABCMeta to change the behaviour, or > else relying on PEP 3124 or some other generic function mechanism. You can't do it with the existing ABC class, but you could do it by overriding __subclasscheck__ in a different way. But it's definitely much easier to do with GFs. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From stephen at xemacs.org Sun May 27 16:03:59 2007 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sun, 27 May 2007 23:03:59 +0900 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <46527904.1000202@v.loewis.de> <87sl9o5dvi.fsf@uwakimon.sk.tsukuba.ac.jp> <87646i5td6.fsf@uwakimon.sk.tsukuba.ac.jp> <87r6p540n4.fsf@uwakimon.sk.tsukuba.ac.jp> <87646g3u9q.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87veee2wj4.fsf@uwakimon.sk.tsukuba.ac.jp> Jim Jewett writes: > > Cf characters? Are we admitting "stupid bidi tricks", too? > > If Tomer needs them. But that's what I mean by respecting the work of the Unicode technical committees. They say he *doesn't* need them, no matter what he thinks. They do make mistakes. But they are far less likely to make mistakes than a non-specialist native speaker. > Seriously, I wouldn't put Cf characters in the default accepted > tabled. (But remember that *I* would limit that default to ASCII.) It's not the default that matters. It's what actually gets used that matters. If we start by saying "you can't have these characters" and the users thumb their noses at us, OK, we made a mistake and we fix it to correspond to what the users actually have shown to be BCP. 
If we start by saying "you can have any characters you want", I'm pretty sure we're making a mistake, and if so, we can't fix it any more than we can get rid of Reply-To munging. > Agreed; but in my opinion, the decision to allow those characters is > local; the decision to rescind them would therefore also be local. It is not a local decision, not in PEP 3131. PEP 3131 clearly intends to conform to UAX #31. (I think it still needs to *explicitly* state that it's defining a profile of UAX #31, since there are restrictions on ASCII identifier characters in Python that are not in the basic definitions of UAX #31.) Your proposal would return PEP 3131 to a blank sheet of paper, and ensure non-conformance with an important normative Annex of Unicode. > I had been thinking of the unicode version as a feature that didn't > change within a python release. Perhaps that is negotiable? I think it's a bad idea to allow it to change within a release. All I meant was that there could be a well-known mechanism for using different tables, either at run-time or at compile-time, so that users could change it if they want to. People who need Lepcha and Cham and want to have a Python that uses unapproved code points for them will have to use a Python which is not conformant. Let them, of course, but I don't see why the 6 billion potential Python users who have never heard of Lepcha, Cham, or the "IBM corporate extension character set for Japanese" should need to forego Unicode conformance as well. > > Maybe the way to handle this is to allow private-space characters in > > identifiers as an option. That would be doable with your well-known > > file scheme. But it's very dangerous across modules. > > It turns out that page was out of date; Lepcha and Cham now have code > points which haven't been formally approved, but aren't likely to > change. Officially, they're still undefined, but using private-space > probably isn't the right answer. 
So either we allow these particular > "undefined" characters, or we (for now) disallow Lepcha and Cham. The law of the excluded middle doesn't apply in that way. It's trivial to "cast" the unofficial code points into "private space" as a block. This technique was used in XEmacs/CHISE (nee XEmacs/UTF-2000) to grandfather the old MULE codes while they filled out the Unicode space, and to map character sets that are not Unicode conformant into Unicode space while preserving collating order and so on. Granted, that's a research extension not a production editor, but the technique seems to work pretty well for the people who need such things. Any Python code that doesn't assume a numerical relationship between the Lepcha block and any other block will work unchanged, and implementing the changeover for old versions of Python that don't know about Lepcha simply requires installing a Lepcha compatibility codec to do the trivial mapping. Is that cool or what? The main problem with this technique is that on some platforms you have to be careful about casting into the BMP, because vendors like Microsoft and Apple have a penchant for using a lot of the BMP private space for corporate logos and the like. And I think Klingon is standard on Linux (or has the Unicode consortium approved a Klingon block since I last looked?) From collinw at gmail.com Mon May 28 02:41:34 2007 From: collinw at gmail.com (Collin Winter) Date: Sun, 27 May 2007 17:41:34 -0700 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <87veee2wj4.fsf@uwakimon.sk.tsukuba.ac.jp> References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <87sl9o5dvi.fsf@uwakimon.sk.tsukuba.ac.jp> <87646i5td6.fsf@uwakimon.sk.tsukuba.ac.jp> <87r6p540n4.fsf@uwakimon.sk.tsukuba.ac.jp> <87646g3u9q.fsf@uwakimon.sk.tsukuba.ac.jp> <87veee2wj4.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <43aa6ff70705271741w2b3eefcbj29921e81822d189@mail.gmail.com> On 5/27/07, Stephen J. 
Turnbull wrote: > Jim Jewett writes: > > > > Cf characters? Are we admitting "stupid bidi tricks", too? > > > > If Tomer needs them. > > But that's what I mean by respecting the work of the Unicode technical > committees. They say he *doesn't* need them, no matter what he thinks. > > They do make mistakes. But they are far less likely to make mistakes > than a non-specialist native speaker. Sincere question: if these characters aren't needed, why are they provided? From what I can tell by googling, they're needed when, e.g., Arabic is embedded in an otherwise left-to-right script. Do I have that right? That sounds pretty close to what you'd get when using Arabic identifiers with the English keywords/stdlib. Collin Winter From stephen at xemacs.org Mon May 28 05:51:46 2007 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 28 May 2007 12:51:46 +0900 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <43aa6ff70705271741w2b3eefcbj29921e81822d189@mail.gmail.com> References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <87sl9o5dvi.fsf@uwakimon.sk.tsukuba.ac.jp> <87646i5td6.fsf@uwakimon.sk.tsukuba.ac.jp> <87r6p540n4.fsf@uwakimon.sk.tsukuba.ac.jp> <87646g3u9q.fsf@uwakimon.sk.tsukuba.ac.jp> <87veee2wj4.fsf@uwakimon.sk.tsukuba.ac.jp> <43aa6ff70705271741w2b3eefcbj29921e81822d189@mail.gmail.com> Message-ID: <87fy5h38rx.fsf@uwakimon.sk.tsukuba.ac.jp> Collin Winter writes: > Sincere question: if these characters aren't needed, why are they > provided? From what I can tell by googling, they're needed when, e.g., > Arabic is embedded in an otherwise left-to-right script. Do I have > that right? That sounds pretty close to what you'd get when using > Arabic identifiers with the English keywords/stdlib. The problem is visual presentation to humans. It's very much like unmarshalling little-endian integers from a byte stream. 
The byte stream by definition is big-endian, so when you simply memcpy into the stream buffer, little-endian integers will come out in reverse byte order. Bidi works a little bit differently; in principle it works both ways (if you start LTR then the RTL is in reverse order in the stream, and vice versa) since both kinds of script are character streams. But in both cases, *inside* the computer, there is a natural "big-endian" order and the computer does not get confused. That is one sense in which format characters are YAGNIs. Now, identifiers are by definition character streams. If an English speaker would pronounce the spelling of an English word "A B C", and an Arabic speaker an Arabic word as "1 2 3", then *as an identifier* the combination English then Arabic is spelled "A B C _ 1 2 3". And that's all the Python compiler needs to know. In fact, on the editor display this would be presented "ABC_321". In data entry, you'd see something like this

key  display
A    A
B    AB
C    ABC
_    ABC_
1    ABC_1
2    ABC_21
3    ABC_321

This can be done algorithmically (this is the "Unicode Technical Annex #9", aka "UAX #9", you may have seen references to), to a very high degree of approximation to what human typesetters do in bidi cultures. Now suppose you want to see on screen the contents of memory cells as characters. Then you would put into memory something like "A B C _ LRO 1 2 3" where LRO is a control character that says "no matter what directional property it has normally, override that with left-to-right until I say otherwise." That logical sequence of characters is indeed displayed "ABC_123". But how about those as identifiers? Note that in memory the sequence of printing characters is "A B C _ 1 2 3" in each case. So it makes sense to think of that as the identifier, *ignoring* the presentation control characters. Suppose we prohibit the directional control characters.
Then a Unicode conforming editor will put the characters in logical order "A B C _ 1 2 3" in the file, and display them naturally (to a speaker of Arabic) as "ABC_321". This is going to be by far the most common case, and the user knows that it works this way. I don't see a problem here. Do you? OK, now let's consider the cases of breakage. Consider a malicious author who uses LRO as "A B C _ 1 2 LRO 3" which displays as "ABC_213" (IIRC, I haven't actually tried to implement bidi in a very long time). Can you think of a genuine use for that? I can't; I think it's a bad idea to allow it. On the other hand, you could have a situation where the printed documentation uses the UAX #9 bidi algorithm, and discusses the meaning of the identifier "ABC_321", while the reviewing programmer is using a broken editor which implements overrides but not the algorithm, and sees "ABC_123". So in the case where LRO is permitted, the author can enforce the visual order that the reviewer will see in the documents on both the documents and the editor display. But since it's the unnatural (to an Arabic reader) "ABC_123", it will be confusing and hard to read. Is this a win? As somebody (I think Jim J) pointed out, bidi is a world of pain unless and until *all* editors and readers implement a common set of display conventions. Python can't do anything that will unambiguously reduce that pain. So IMHO it is best to conform to a standard that can be unambiguously implemented, and is likely to be available to the majority of programmers who need to work with bidi environments. That is UAX #31, which mandates ignoring these format characters (in the default profile), and strongly recommends prohibiting them in all profiles. From stephen at xemacs.org Mon May 28 06:08:21 2007 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Mon, 28 May 2007 13:08:21 +0900 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <43aa6ff70705271741w2b3eefcbj29921e81822d189@mail.gmail.com> References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <87sl9o5dvi.fsf@uwakimon.sk.tsukuba.ac.jp> <87646i5td6.fsf@uwakimon.sk.tsukuba.ac.jp> <87r6p540n4.fsf@uwakimon.sk.tsukuba.ac.jp> <87646g3u9q.fsf@uwakimon.sk.tsukuba.ac.jp> <87veee2wj4.fsf@uwakimon.sk.tsukuba.ac.jp> <43aa6ff70705271741w2b3eefcbj29921e81822d189@mail.gmail.com> Message-ID: <87d50l380a.fsf@uwakimon.sk.tsukuba.ac.jp> Collin Winter writes: > Sincere question: if these characters aren't needed, why are they > provided? I already gave a long jargony answer, but maybe this analogy is better: Most of the time automatic line-wrapping gives excellent results, but sometimes you need the newline character to achieve special effects (eg, poetry). Directional controls are similar: used for "special effects" that are none-the-less an everyday part of the language. > From what I can tell by googling, they're needed when, e.g., > Arabic is embedded in an otherwise left-to-right script. No, they are unnecessary; there are algorithms that do a fine job -- for most purposes, but not all. It's the exceptions where the control characters are needed. The Unicode technical committee does not think identifiers are exceptional, and they are experts (including Hebrew and Arabic native speakers, I am sure). From alexandre at peadrop.com Mon May 28 18:56:00 2007 From: alexandre at peadrop.com (Alexandre Vassalotti) Date: Mon, 28 May 2007 12:56:00 -0400 Subject: [Python-3000] Lines breaking Message-ID: Hi, Just wondering. Would it be a good idea to make the string methods split() and splitlines() break lines as specified by the Unicode Standard (Section 5.8 Newline Guidelines)?
If you don't have a printed copy, you can read the section here: http://www.unicode.org/versions/Unicode4.0.0/ch05.pdf -- Alexandre From daniel at stutzbachenterprises.com Mon May 28 21:41:44 2007 From: daniel at stutzbachenterprises.com (Daniel Stutzbach) Date: Mon, 28 May 2007 14:41:44 -0500 Subject: [Python-3000] BLists (PEP 3128) Message-ID: On 5/11/07, Raymond Hettinger wrote: > > Would it be useful if I created an experimental fork of 2.5 > > that replaces array-based lists with BLists, > > so that the performance penalty (if any) on existing code > > can be measured? > > That would likely be an informative exercise and would assure that your > code is truly interchangable with regular lists. It would also highlight the > under-the-hood difficulties you'll encounter with the C-API. > > That being said, it is a labor intensive exercise and the time might be better > spent on tweaking the third-party module code and building a happy user-base. Just to provide a quick update on my adventures with BLists: I went forward with the exercise of replacing the array-based list with BLists in the Python interpreter. As a first pass, I went with a simple, not-very-efficient redirect of the List API. I had very few problems getting this working well enough to compile. The exercise also had the benefit that I have been able to test BLists against the entire Python test suite. Previously, I had adapted only test_list. test_builtin was particularly useful. I was able to find and fix a couple more bugs in my implementation this way. Almost all of the tests pass now. There are also a handful of test failures where the tests are asserting the CPython implementation details when the intent is really just to assert(don't crash). These tests are related to when references are deleted to evil comparison/lookup functions or what happens when a list changes size during iteration. I'll probably change the BList code to match CPython's behavior. 
I have one genuine bug to fix that inexplicably causes test_shlex to fail. Once that is taken care of, I get back to looking at performance. However, I'm leaving tomorrow for 3 weeks (wedding + honeymoon), so I'm not going to be able to make any further progress until I get back. :-) -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises LLC From guido at python.org Tue May 29 00:44:31 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 29 May 2007 06:44:31 +0800 Subject: [Python-3000] Lines breaking In-Reply-To: References: Message-ID: Can you or someone supply a patch? Put it in the SourceForge patch manager and post here. OTOH I don't believe that's how 2.x implements these methods, and AFAIK nobody's complained. Is it necessary to change? At the very least I'd be opposed if it changed the behavior of splitting ASCII-only text. --Guido On 5/29/07, Alexandre Vassalotti wrote: > Hi, > > Just wondering. Would it be a good idea to make the string methods > split() and splitlines() break lines as specified by the Unicode > Standard (Section 5.8 Newline Guidelines)? > > If you don't have a printed copy, you can read the section here: > http://www.unicode.org/versions/Unicode4.0.0/ch05.pdf > > -- Alexandre > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From alexandre at peadrop.com Tue May 29 01:49:33 2007 From: alexandre at peadrop.com (Alexandre Vassalotti) Date: Mon, 28 May 2007 19:49:33 -0400 Subject: [Python-3000] Lines breaking In-Reply-To: References: Message-ID: On 5/28/07, Guido van Rossum wrote: > Can you or someone supply a patch? Put it in the SourceForge patch > manager and post here.
I can't promise anything, since I am quite busy with my SoC project, but I could try to supply a patch, if you and the other developers are in favor of the change. A few other methods would need to be changed too to conform fully to the standard -- I am thinking especially of the file methods readline/readlines. So, the change should probably be documented in a PEP. > OTOH I don't believe that's how 2.x implements these methods, and > AFAIK nobody's complained. Is it necessary to change? At the very > least I'd be opposed if it changed the behavior of splitting > ASCII-only text. The change would extend the line breaking behavior to three other ASCII characters: NEL "Next Line" 85 VT "Vertical Tab" 0B FF "Form Feed" 0C Of course, it is not really necessary to change, but I think full conformance to the standard [1] could give Python better support of multilingual texts. However, full conformance would require a good amount of work. So, it is true that it is probably better to postpone it until someone complains. -- Alexandre [1] http://www.unicode.org/reports/tr14/tr14-19.html From greg.ewing at canterbury.ac.nz Tue May 29 03:03:48 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 29 May 2007 13:03:48 +1200 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <87fy5h38rx.fsf@uwakimon.sk.tsukuba.ac.jp> References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <87sl9o5dvi.fsf@uwakimon.sk.tsukuba.ac.jp> <87646i5td6.fsf@uwakimon.sk.tsukuba.ac.jp> <87r6p540n4.fsf@uwakimon.sk.tsukuba.ac.jp> <87646g3u9q.fsf@uwakimon.sk.tsukuba.ac.jp> <87veee2wj4.fsf@uwakimon.sk.tsukuba.ac.jp> <43aa6ff70705271741w2b3eefcbj29921e81822d189@mail.gmail.com> <87fy5h38rx.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <465B7BF4.1000400@canterbury.ac.nz> Stephen J.
Turnbull wrote: > If an English > speaker would pronounce the spelling of an English word "A B C", and > an Arabic speaker an Arabic word as "1 2 3", then *as an identifier* > the combination English then Arabic is spelled "A B C _ 1 2 3". But would an Arabic speaker pronounce the identifier as a whole as "A B C 1 2 3" or "1 2 3 A B C"? That's where I find it all gets very confusing. -- Greg From greg.ewing at canterbury.ac.nz Tue May 29 03:26:37 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 29 May 2007 13:26:37 +1200 Subject: [Python-3000] Lines breaking In-Reply-To: References: Message-ID: <465B814D.2060101@canterbury.ac.nz> Alexandre Vassalotti wrote: > The change would extend the line breaking behavior to three other > ASCII characters: > NEL "Next Line" 85 That's not an ASCII character. > VT "Vertical Tab" 0B > FF "Form Feed" 0C -1 on making these line-breaking characters by default. I like my ASCII text file lines broken by newline chars and nothing else. -- Greg From guido at python.org Tue May 29 04:37:47 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 29 May 2007 10:37:47 +0800 Subject: [Python-3000] [Python-Dev] PEP 367: New Super In-Reply-To: References: <001101c79aa7$eb26c130$0201a8c0@mshome.net> <001d01c79f15$f0afa140$0201a8c0@mshome.net> <002d01c79f6d$ce090de0$0201a8c0@mshome.net> <003f01c79fd9$66948ec0$0201a8c0@mshome.net> <009c01c7a04f$7e348460$0201a8c0@mshome.net> Message-ID: Hi Tim, I've gone ahead and cooked up a tiny demo patch that uses im_class to store what you called im_type. Because I don't have the parser changes ready yet, this requires you to declare a keyword-only arg named 'super'; this triggers special code that sets it to super(im_class, im_self). http://python.org/sf/1727209 I haven't tried to discover yet how much breaks due to the change of semantics for im_class.
--Guido On 5/27/07, Guido van Rossum wrote: > On 5/27/07, Tim Delaney wrote: > > Guido van Rossum wrote: > > > > > The bound method object isn't stored in the class -- it's created by > > > the "C.method" or "inst.method" getattr operation. I don't see how > > > this would introduce a cycle. > > > > > >> If we store the class, we can store it as a weakref - the when the > > >> super object is created, a strong reference to the class exists. > > > > We need to create some relationship between the unbound method and the > > class. So the class has a reference to the unbound method, and the unbound > > method has a reference to the class, thus creating a cycle. Bound methods > > don't come into it - it's the unbound method that's the problem. > > Still wrong, I think. The unbound method object *also* isn't stored in > the class. It's returned by the C.method operation. Compare C.method > (which returns an unbound method) to C.__dict__['method'] (which > returns the actual function object stored in the class). > > > > Since class and type are synonym (as you say) having both im_class and > > > im_type would be a bad idea. > > > > I'm struggling to think of another, not too complicated name that conveys > > the same information. > > Keep trying. im_type is not acceptable. :-) > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From turnbull at sk.tsukuba.ac.jp Tue May 29 05:57:23 2007 From: turnbull at sk.tsukuba.ac.jp (Stephen J. 
Turnbull) Date: Tue, 29 May 2007 12:57:23 +0900 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <465B7BF4.1000400@canterbury.ac.nz> References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <87sl9o5dvi.fsf@uwakimon.sk.tsukuba.ac.jp> <87646i5td6.fsf@uwakimon.sk.tsukuba.ac.jp> <87r6p540n4.fsf@uwakimon.sk.tsukuba.ac.jp> <87646g3u9q.fsf@uwakimon.sk.tsukuba.ac.jp> <87veee2wj4.fsf@uwakimon.sk.tsukuba.ac.jp> <43aa6ff70705271741w2b3eefcbj29921e81822d189@mail.gmail.com> <87fy5h38rx.fsf@uwakimon.sk.tsukuba.ac.jp> <465B7BF4.1000400@canterbury.ac.nz> Message-ID: <878xb82sf0.fsf@uwakimon.sk.tsukuba.ac.jp> Greg Ewing writes: > Stephen J. Turnbull wrote: > > If an English speaker would pronounce the spelling of an English > > word "A B C", and an Arabic speaker an Arabic word as "1 2 3", > > then *as an identifier* the combination English then Arabic is > > spelled "A B C _ 1 2 3". > But would an Arabic speaker pronounce the identifier as a whole > as "A B C 1 2 3" or "1 2 3 A B C"? That's where I find it all > gets very confusing. Then "unask the question." Bidi is *not* context-free; that question is not properly formulated. Pragmatically, in a well-formed Python program in the overwhelming majority of cases you or she will be in a LTR context, so it will be read "A B C _ 1 2 3". The ambiguity you have in mind probably is best expressed "what happens with a single line program?" Eg, one that appears on the display like this: ABC_321 True, a native Arabic speaker would surely (absent any context except her early upbringing) read that "1 2 3 _ A B C". And I admit, I'd read it "A B C _ 1 2 3". That looks like an ambiguity requiring use of a direction indicator, but it's not. According to PEP 263 all Python programs implicitly start in ASCII (otherwise the optional coding cookie cannot be parsed, and presumably not the optional shebang, either). 
So since the Python programmer (whether natively English-speaking or Arabic-speaking) starts in state "LTR", she reads the "A" first, not the "1", and there are no problems. Of course, you want to be able to express the identifier that would be spelled out (and represented in memory!) as "1 2 3 _ A B C", and you can: 321_ABC Since I'm not an Arabic-speaker at all, I can only say I suspect that Arabic speakers will learn to do this context initialization very quickly, and to read comments marked at the *end* of the line, rather than the beginning. Ie, to an Arabic speaker an Arabic header comment will feel like this: This is the Foomatic program. # It makes passes at compilers. # It is licentiously speaking a GPL program. # A smart editor should be able to format that: This is the Foomatic program. # It makes passes at compilers. # It is licentiously speaking a GPL program. # It feels weird, but it's not that bad, to me anyway. Once again, speakers of bidi languages are in a world of pain anyway; it's reasonable to suppose that this doesn't really make things worse. There are ambiguities here, and while a naive native speaker might resolve them differently in ad hoc cases from the above, I doubt they'd be lucky enough to come up with a consistent interpretation. On the other hand, humans are *designed* to learn the arbitrary rules of languages, as children, at least. Adults who are fortunate enough to retain enough of that ability to learn to program probably will have little trouble with this particular arbitrary rule. At least, that's my guess, indirectly supported by the Unicode rules for identifiers which suggests that it is reasonable to prohibit direction indicators in identifiers. 
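The UAX #31 recommendation described above — prohibit (or at least ignore) format characters in identifiers — is easy to check mechanically. A minimal sketch using the stdlib `unicodedata` module; this is an illustration of the idea, not anything PEP 3131 itself specifies:

```python
import unicodedata

def has_format_chars(identifier):
    """Return True if the identifier contains any Cf (format) code
    points, e.g. LRO (U+202D), RLO (U+202E), or RLM (U+200F)."""
    return any(unicodedata.category(ch) == "Cf" for ch in identifier)

print(has_format_chars("ABC_123"))         # False
print(has_format_chars("ABC_\u202D123"))   # True: embedded LRO
```

A tokenizer applying such a check would reject the "malicious LRO" identifier discussed above outright, rather than trying to reason about how different editors would display it.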
From python at zesty.ca Tue May 29 05:57:45 2007 From: python at zesty.ca (Ka-Ping Yee) Date: Mon, 28 May 2007 22:57:45 -0500 (CDT) Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <87fy5h38rx.fsf@uwakimon.sk.tsukuba.ac.jp> References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <87sl9o5dvi.fsf@uwakimon.sk.tsukuba.ac.jp> <87646i5td6.fsf@uwakimon.sk.tsukuba.ac.jp> <87r6p540n4.fsf@uwakimon.sk.tsukuba.ac.jp> <87646g3u9q.fsf@uwakimon.sk.tsukuba.ac.jp> <87veee2wj4.fsf@uwakimon.sk.tsukuba.ac.jp> <43aa6ff70705271741w2b3eefcbj29921e81822d189@mail.gmail.com> <87fy5h38rx.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Mon, 28 May 2007, Stephen J. Turnbull wrote: > Now, identifiers are by definition character streams. If an English > speaker would pronounce the spelling of an English word "A B C", and > an Arabic speaker an Arabic word as "1 2 3", then *as an identifier* > the combination English then Arabic is spelled "A B C _ 1 2 3". And > that's all the Python compiler needs to know. In fact, on the editor > display this would be presented "ABC_321". This draft on internationalized URIs: http://www.w3.org/International/iri-edit/draft-duerst-iri.html#anchor5 points out some examples of extremely confusing display orders that can be caused by digits (which require a left-to-right ordering), slashes (which can cause digits to be interpreted as fractions), and other operators near digits. These strike me as rather awful results of the bidi algorithm. Would the display of source code be affected this way as well? -- ?!ng From stephen at xemacs.org Tue May 29 07:30:55 2007 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Tue, 29 May 2007 14:30:55 +0900 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <87sl9o5dvi.fsf@uwakimon.sk.tsukuba.ac.jp> <87646i5td6.fsf@uwakimon.sk.tsukuba.ac.jp> <87r6p540n4.fsf@uwakimon.sk.tsukuba.ac.jp> <87646g3u9q.fsf@uwakimon.sk.tsukuba.ac.jp> <87veee2wj4.fsf@uwakimon.sk.tsukuba.ac.jp> <43aa6ff70705271741w2b3eefcbj29921e81822d189@mail.gmail.com> <87fy5h38rx.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <877iqs2o34.fsf@uwakimon.sk.tsukuba.ac.jp> Ka-Ping Yee writes: > Would the display of source code be affected this way as well? Of course! That's what PEP 3131 proponents *want*. From the draft you cite: "certain phenomena in this relationship may look strange to somebody not familiar with bidirectional behavior, but familiar to users of Arabic and Hebrew." Ie, we proponents want to allow programs that look familiar to native speakers of various languages, but do not look familiar to monolingual speakers of American English. From martin at v.loewis.de Tue May 29 07:24:14 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 29 May 2007 07:24:14 +0200 Subject: [Python-3000] Lines breaking In-Reply-To: References: Message-ID: <465BB8FE.4030604@v.loewis.de> > The change would extend the line breaking behavior to three other > ASCII characters: > NEL "Next Line" 85 > VT "Vertical Tab" 0B > FF "Form Feed" 0C Of these, NEL is not an ASCII character, so Guido's "no change for ASCII-only text" requirement doesn't apply to text containing NEL. > Of course, it is not really necessary to change, but I think full > conformance to the standard [1] could give Python better support of > multilingual texts. However, full conformance would require a good > amount of work. So, it is true that it is probably better to postpone > it until someone complaint. Can you please point to the chapter and verse where it says that VT must be considered? 
I only found mention of FF, in R4. Regards, Martin From martin at v.loewis.de Tue May 29 07:26:44 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 29 May 2007 07:26:44 +0200 Subject: [Python-3000] Lines breaking In-Reply-To: <465B814D.2060101@canterbury.ac.nz> References: <465B814D.2060101@canterbury.ac.nz> Message-ID: <465BB994.9050309@v.loewis.de> >> VT "Vertical Tab" 0B >> FF "Form Feed" 0C > > -1 on making these line-breaking characters by default. > I like my ASCII text file lines broken by newline chars > and nothing else. The question, of course, is what a newline char is; this whole mess originates from disagreement about this issue. For example, .splitlines considers carriage-return (CR) characters as well, and you don't seem to complain about that. Regards, Martin From guido at python.org Tue May 29 08:03:59 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 29 May 2007 14:03:59 +0800 Subject: [Python-3000] Lines breaking In-Reply-To: <465BB994.9050309@v.loewis.de> References: <465B814D.2060101@canterbury.ac.nz> <465BB994.9050309@v.loewis.de> Message-ID: On 5/29/07, "Martin v. Löwis" wrote: > >> VT "Vertical Tab" 0B > >> FF "Form Feed" 0C > > > > -1 on making these line-breaking characters by default. > > I like my ASCII text file lines broken by newline chars > > and nothing else. > > The question, of course, is what a newline char is; this > whole mess originates from disagreement about this issue. > > For example, .splitlines considers carriage-return (CR) > characters as well, and you don't seem to complain about > that. Well, I would have complained about that too, except I was too busy when splitlines() was snuck into the language behind my back. :-) I should add that it has never caused me grief even though it is in flagrant disagreement with Python's general concept of line endings.
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From nnorwitz at gmail.com Tue May 29 08:12:31 2007 From: nnorwitz at gmail.com (Neal Norwitz) Date: Mon, 28 May 2007 23:12:31 -0700 Subject: [Python-3000] Lines breaking In-Reply-To: References: <465B814D.2060101@canterbury.ac.nz> <465BB994.9050309@v.loewis.de> Message-ID: On 5/28/07, Guido van Rossum wrote: > > Well, I would have complained about that too, except I was too busy > when splitlines() was snuck into the language behind my back. :-) I Heh, just today I was wondering if we should kill splitlines: $ grep splitlines `find Lib -name '*.py'` | egrep -v '(difflib|/test/|UserString)' | wc 24 111 1653 $ egrep 'split[^l]' `find Lib -name '*.py'` | egrep -v '(difflib|/test/|UserString)' | wc 916 4943 63104 splitlines() is pretty lightly used. split() has many uses (not surprising). n From g.brandl at gmx.net Tue May 29 08:28:25 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Tue, 29 May 2007 08:28:25 +0200 Subject: [Python-3000] Lines breaking In-Reply-To: References: <465B814D.2060101@canterbury.ac.nz> <465BB994.9050309@v.loewis.de> Message-ID: Neal Norwitz schrieb: > On 5/28/07, Guido van Rossum wrote: >> >> Well, I would have complained about that too, except I was too busy >> when splitlines() was snuck into the language behind my back. :-) I > > Heh, just today I was wondering if we should kill splitlines: And perhaps add tuple parameters to .split()? x.split(("\r", "\n")) Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. 
From martin at v.loewis.de Tue May 29 08:59:48 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 29 May 2007 08:59:48 +0200 Subject: [Python-3000] Lines breaking In-Reply-To: References: <465B814D.2060101@canterbury.ac.nz> <465BB994.9050309@v.loewis.de> Message-ID: <465BCF64.5010400@v.loewis.de> > Heh, just today I was wondering if we should kill splitlines: > > $ grep splitlines `find Lib -name '*.py'` | egrep -v > '(difflib|/test/|UserString)' | wc > 24 111 1653 > $ egrep 'split[^l]' `find Lib -name '*.py'` | egrep -v > '(difflib|/test/|UserString)' | wc > 916 4943 63104 > > splitlines() is pretty lightly used. split() has many uses (not > surprising). However, I think that splitlines should work consistently with readlines (for some definition of "consistent"). Regards, Martin From stephen at xemacs.org Tue May 29 10:17:20 2007 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 29 May 2007 17:17:20 +0900 Subject: [Python-3000] Lines breaking In-Reply-To: <465BB8FE.4030604@v.loewis.de> References: <465BB8FE.4030604@v.loewis.de> Message-ID: <874plw2gdr.fsf@uwakimon.sk.tsukuba.ac.jp> "Martin v. Löwis" writes: > Alexandre Vassalotti writes: > > The change would extend the line breaking behavior to three other > > ASCII characters: > > NEL "Next Line" 85 > > VT "Vertical Tab" 0B > > FF "Form Feed" 0C > > Of course, it is not really necessary to change, but I think full > > conformance to the standard [1] could give Python better support of > > multilingual texts. However, full conformance would require a good > > amount of work. I don't understand why full conformance would require much work, not for the language. Unicode does not propose to place requirements on the syntax of Python *including the repertoire of characters allowed*, only that where a character does occur, it must have the semantics defined in UAX#14. (Of course text processing modules in the stdlib will have some work to do!)
I see no reason in UAX#14 that the Python grammar cannot ignore or prohibit VT and NEL (see below), prohibit use of LINE SEPARATOR and PARAGRAPH SEPARATOR, and restrict FORM FEED to occur immediately after a line break. (All outside of strings, of course, where there would be no restriction. Restrictions *must* apply to comment content, however.) Note that given Python's semantics for lines, the algorithm in Unicode (v4.1, Section 5.8, R1) for remapping to unambiguous use of LS and PS is well-defined and will leave zero residual ambiguity in a legal Python program (and no instances of PS). With the provisions above, you'll get the same display of a legal Python program as ever when you switch to a UAX#14-conforming text editor, except that it may provide a more friendly display for strings containing very long lines. People who wish to edit Python programs in Microsoft Word should preprocess with the R1 algorithm. > Can you please point to the chapter and verse where it says that VT > must be considered? I only found mention of FF, in R4. In UAX#14, revision 19, in the descriptions of classes it says: ------------------------------------------------------------------------ BK: Mandatory Break (A) (Non-tailorable) Explicit breaks act independently of the surrounding characters. No characters can be added to the BK class as part of tailoring, but implementations are not required to support the VT character. 000C FORM FEED (FF) 000B LINE TABULATION (VT) FORM FEED separates pages. The text on the new page starts at the beginning of the line. No paragraph formatting is applied. 2028 LINE SEPARATOR (LS) The text after the Line Separator starts at the beginning of the line. No paragraph formatting is applied. This is similar to HTML
. 2029 PARAGRAPH SEPARATOR (PS) The text of the new paragraph starts at the beginning of the line. Paragraph formatting is applied. Newline Function (NLF) Newline Functions are defined in the Unicode Standard as providing additional explicit breaks. They are not individual characters, but are encoded as sequences of the control characters NEL, LF, and CR. ------------------------------------------------------------------------ In the descriptions of the singleton classes LF, CR, and NL (containing NEL), it is indicated that supporting LF and CR is mandatory, the rules are the ones used by Python's universal newline feature AFAICT. And NL need not be supported: ------------------------------------------------------------------------ NL: Next Line (A) (Non-tailorable) 0085 NEXT LINE (NEL) The NL class acts like BK in all respects (there is a mandatory break after any NEL character). It cannot be tailored, but implementations are not required to support the NEL character; see the discussion under BK. ------------------------------------------------------------------------ From guido at python.org Tue May 29 10:20:08 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 29 May 2007 16:20:08 +0800 Subject: [Python-3000] Lines breaking In-Reply-To: References: <465B814D.2060101@canterbury.ac.nz> <465BB994.9050309@v.loewis.de> Message-ID: What would that do? On 5/29/07, Georg Brandl wrote: > Neal Norwitz schrieb: > > On 5/28/07, Guido van Rossum wrote: > >> > >> Well, I would have complained about that too, except I was too busy > >> when splitlines() was snuck into the language behind my back. :-) I > > > > Heh, just today I was wondering if we should kill splitlines: > > And perhaps add tuple parameters to .split()? > > x.split(("\r", "\n")) > > Georg > > -- > Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. > Four shall be the number of spaces thou shalt indent, and the number of thy > indenting shall be four. 
Eight shalt thou not indent, nor either indent thou > two, excepting that thou then proceed to four. Tabs are right out. > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From krstic at solarsail.hcs.harvard.edu Tue May 29 10:26:15 2007 From: krstic at solarsail.hcs.harvard.edu (=?UTF-8?B?SXZhbiBLcnN0acSH?=) Date: Tue, 29 May 2007 04:26:15 -0400 Subject: [Python-3000] Lines breaking In-Reply-To: References: <465B814D.2060101@canterbury.ac.nz> <465BB994.9050309@v.loewis.de> Message-ID: <465BE3A7.3050507@solarsail.hcs.harvard.edu> Guido van Rossum wrote: > What would that do? It would split on all separators in the tuple, so x.split(("\r", "\n")) would do the same thing that x.splitlines() does now. -- Ivan Krstić | GPG: 0x147C722D From python at zesty.ca Tue May 29 10:36:18 2007 From: python at zesty.ca (Ka-Ping Yee) Date: Tue, 29 May 2007 03:36:18 -0500 (CDT) Subject: [Python-3000] Lines breaking In-Reply-To: <465BE3A7.3050507@solarsail.hcs.harvard.edu> References: <465B814D.2060101@canterbury.ac.nz> <465BB994.9050309@v.loewis.de> <465BE3A7.3050507@solarsail.hcs.harvard.edu> Message-ID: On Tue, 29 May 2007, Ivan Krstić wrote: > Guido van Rossum wrote: > > What would that do? > > It would split on all separators in the tuple, so > > x.split(("\r", "\n")) > > would do the same thing that x.splitlines() does now. Hmm... would it? Or should two split points with nothing between them produce empty strings, i.e. you would have to do x.split(('\r\n', '\r', '\n')) to get the behaviour of x.splitlines()? 
-- ?!ng From krstic at solarsail.hcs.harvard.edu Tue May 29 10:38:55 2007 From: krstic at solarsail.hcs.harvard.edu (=?UTF-8?B?SXZhbiBLcnN0acSH?=) Date: Tue, 29 May 2007 04:38:55 -0400 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <877iqs2o34.fsf@uwakimon.sk.tsukuba.ac.jp> References: <19dd68ba0705120817k61788659n83da8d2c09dba0e1@mail.gmail.com> <87sl9o5dvi.fsf@uwakimon.sk.tsukuba.ac.jp> <87646i5td6.fsf@uwakimon.sk.tsukuba.ac.jp> <87r6p540n4.fsf@uwakimon.sk.tsukuba.ac.jp> <87646g3u9q.fsf@uwakimon.sk.tsukuba.ac.jp> <87veee2wj4.fsf@uwakimon.sk.tsukuba.ac.jp> <43aa6ff70705271741w2b3eefcbj29921e81822d189@mail.gmail.com> <87fy5h38rx.fsf@uwakimon.sk.tsukuba.ac.jp> <877iqs2o34.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <465BE69F.1050804@solarsail.hcs.harvard.edu> Stephen J. Turnbull wrote: > Ie, we proponents want to allow programs > that look familiar to native speakers of various languages, but do not > look familiar to monolingual speakers of American English. That characterization is overly narrow. I speak and write at least three languages including English non-natively, and unexpected bidi behavior still looks unfamiliar and confusing to me. I haven't had time to participate in this discussion though I've been following it; FWIW, I'm a loud -1 on Unicode identifiers by default for just about the exact reasons that Ping enumerated. -- Ivan Krstić | GPG: 0x147C722D From krstic at solarsail.hcs.harvard.edu Tue May 29 11:47:04 2007 From: krstic at solarsail.hcs.harvard.edu (=?UTF-8?B?SXZhbiBLcnN0acSH?=) Date: Tue, 29 May 2007 05:47:04 -0400 Subject: [Python-3000] Lines breaking In-Reply-To: References: <465B814D.2060101@canterbury.ac.nz> <465BB994.9050309@v.loewis.de> <465BE3A7.3050507@solarsail.hcs.harvard.edu> Message-ID: <465BF698.6050703@solarsail.hcs.harvard.edu> Ka-Ping Yee wrote: > Hmm... would it? Or should two split points with nothing between > them produce empty strings, i.e. 
you would have to do > x.split(('\r\n', '\r', '\n')) > to get the behaviour of x.splitlines()? Right, Georg's example would be unintuitive given the current behavior of str.split, which will happily provide zero-width matches when it hits separators in sequence. Perl bypasses the issue by having split (http://perldoc.perl.org/functions/split.html) take a regex; I've only rarely used this for complex matches, though. I tried a Google code search for lang:perl split\(?\s?\/\[ (simple multiple separators) lang:python \.splitlines\s?\( lang:python \.split\s?\( but the number of results seems to oscillate between 300 and 100000, so that didn't help much. -- Ivan Krstić | GPG: 0x147C722D From g.brandl at gmx.net Tue May 29 12:51:59 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Tue, 29 May 2007 12:51:59 +0200 Subject: [Python-3000] Lines breaking In-Reply-To: References: <465B814D.2060101@canterbury.ac.nz> <465BB994.9050309@v.loewis.de> <465BE3A7.3050507@solarsail.hcs.harvard.edu> Message-ID: Ka-Ping Yee schrieb: > On Tue, 29 May 2007, Ivan Krstić wrote: >> Guido van Rossum wrote: >> > What would that do? >> >> It would split on all separators in the tuple, so Exactly, just like .startswith() with a tuple tries all of the elements. >> x.split(("\r", "\n")) >> >> would do the same thing that x.splitlines() does now. > > Hmm... would it? Or should two split points with nothing between > them produce empty strings, i.e. you would have to do > > x.split(('\r\n', '\r', '\n')) > > to get the behaviour of x.splitlines()? Yes, that would be the correct analogue. Sorry, I should have made that clear. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. 
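[Editorial sketch: the distinction Ka-Ping and Georg settle on can be demonstrated with current str methods.]

```python
x = "a\r\nb\n"

# splitlines() treats "\r\n" as a single line break and does not
# produce an empty string for a trailing break:
print(x.splitlines())  # ['a', 'b']

# Splitting on "\r" and "\n" as independent single-character
# separators leaves an empty string between the "\r" and the "\n",
# plus a trailing empty string:
naive = [p for chunk in x.split("\r") for p in chunk.split("\n")]
print(naive)           # ['a', '', 'b', '']
```

This is why the "\r\n" separator must come first in the tuple for the proposed x.split(('\r\n', '\r', '\n')) to match splitlines().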
From alexandre at peadrop.com Tue May 29 17:56:27 2007 From: alexandre at peadrop.com (Alexandre Vassalotti) Date: Tue, 29 May 2007 11:56:27 -0400 Subject: [Python-3000] Lines breaking In-Reply-To: <465BB8FE.4030604@v.loewis.de> References: <465BB8FE.4030604@v.loewis.de> Message-ID: On 5/29/07, "Martin v. Löwis" wrote: > > The change would extend the line breaking behavior to three other > > ASCII characters: > > NEL "Next Line" 85 > > VT "Vertical Tab" 0B > > FF "Form Feed" 0C > > Of these, NEL is not an ASCII character, so Guido's "no change > for ASCII-only text" requirement doesn't apply to text containing > NEL. Right. It is defined in the ISO control function standard (ISO 6429). I have been duped by the format of table 5-1 in the Unicode standard. > > Of course, it is not really necessary to change, but I think full > > conformance to the standard [1] could give Python better support of > > multilingual texts. However, full conformance would require a good > > amount of work. So, it is true that it is probably better to postpone > > it until someone complains. > > Can you please point to the chapter and verse where it says that VT > must be considered? I only found mention of FF, in R4. > Right again. (It is not my day today...) I should have read more thoroughly, instead of relying on the table. Here are the two sections for readline and writeline: R4 A readline function should stop at NLF, LS, FF, or PS. In the typical implementation, it does not include the NLF, LS, PS, or FF that caused it to stop. R4a A writeline (or newline) function should convert NLF, LS, and PS according to the conventions just discussed in "Converting to Other Character Code Sets." 
-- Alexandre From aahz at pythoncraft.com Tue May 29 19:08:44 2007 From: aahz at pythoncraft.com (Aahz) Date: Tue, 29 May 2007 10:08:44 -0700 Subject: [Python-3000] Lines breaking In-Reply-To: <465BF698.6050703@solarsail.hcs.harvard.edu> References: <465B814D.2060101@canterbury.ac.nz> <465BB994.9050309@v.loewis.de> <465BE3A7.3050507@solarsail.hcs.harvard.edu> <465BF698.6050703@solarsail.hcs.harvard.edu> Message-ID: <20070529170843.GA7598@panix.com> On Tue, May 29, 2007, Ivan Krstić wrote: > > Perl bypasses the issue by having split > (http://perldoc.perl.org/functions/split.html) take a regex; I've only > rarely used this for complex matches, though. Then perhaps we should just point people at re.split()... -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "Look, it's your affair if you want to play with five people, but don't go calling it doubles." --John Cleese anticipates Usenet From aahz at pythoncraft.com Tue May 29 19:28:36 2007 From: aahz at pythoncraft.com (Aahz) Date: Tue, 29 May 2007 10:28:36 -0700 Subject: [Python-3000] Support for PEP 3131 In-Reply-To: <465BE69F.1050804@solarsail.hcs.harvard.edu> References: <87r6p540n4.fsf@uwakimon.sk.tsukuba.ac.jp> <87646g3u9q.fsf@uwakimon.sk.tsukuba.ac.jp> <87veee2wj4.fsf@uwakimon.sk.tsukuba.ac.jp> <43aa6ff70705271741w2b3eefcbj29921e81822d189@mail.gmail.com> <87fy5h38rx.fsf@uwakimon.sk.tsukuba.ac.jp> <877iqs2o34.fsf@uwakimon.sk.tsukuba.ac.jp> <465BE69F.1050804@solarsail.hcs.harvard.edu> Message-ID: <20070529172836.GB7598@panix.com> On Tue, May 29, 2007, Ivan Krstić wrote: > > I haven't had time to participate in this discussion though I've been > following it; FWIW, I'm a loud -1 on Unicode identifiers by default for > just about the exact reasons that Ping enumerated. Considering that OLPC is given as an argument in favor of Unicode identifiers, I think Ivan's vote should be given extra weight. 
-- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "Look, it's your affair if you want to play with five people, but don't go calling it doubles." --John Cleese anticipates Usenet From alexandre at peadrop.com Tue May 29 19:29:52 2007 From: alexandre at peadrop.com (Alexandre Vassalotti) Date: Tue, 29 May 2007 13:29:52 -0400 Subject: [Python-3000] Lines breaking In-Reply-To: References: <465BB8FE.4030604@v.loewis.de> Message-ID: I just thought about something. Would making readline(s) not gobble the line breaking character be too radical an idea? I think that is what most people are expecting from a readline function, anyway. I often see things like [line.strip() for line in open(file).readlines()], which is not so elegant IMHO. This should be accompanied by a change to writelines that would make it append to each line the platform-specific line breaking character, as defined by os.linesep. The main objections I would have against the change are obviously breaking backward-compatibility, and losing the closure property of readlines/writelines -- i.e., after g.writelines(f.readlines()), g wouldn't have the guarantee of having the same content as f. On the other hand, this could give Python a neat way to convert line breaking characters. Anyway, that was just a random thought. I don't think the change is worthwhile enough to break backward-compatibility. -- Alexandre From greg.ewing at canterbury.ac.nz Wed May 30 03:15:02 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 30 May 2007 13:15:02 +1200 Subject: [Python-3000] Lines breaking In-Reply-To: <465BB994.9050309@v.loewis.de> References: <465B814D.2060101@canterbury.ac.nz> <465BB994.9050309@v.loewis.de> Message-ID: <465CD016.7050002@canterbury.ac.nz> Martin v. Löwis wrote: > For example, .splitlines considers carriage-return (CR) > characters as well, and you don't seem to complain about > that. 
That doesn't bother me so much because \r as a line boundary is a well-established convention on some platforms. But I've *never* heard of FF or VT being used as line delimiters. If they were, I would regard it as an application-specific convention requiring special coding for that application. -- Greg From greg.ewing at canterbury.ac.nz Wed May 30 03:17:40 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 30 May 2007 13:17:40 +1200 Subject: [Python-3000] Lines breaking In-Reply-To: References: <465B814D.2060101@canterbury.ac.nz> <465BB994.9050309@v.loewis.de> Message-ID: <465CD0B4.9000204@canterbury.ac.nz> Guido van Rossum wrote: > Well, I would have complained about that too, except I was too busy > when splitlines() was snuck into the language behind my back. :-) I > should add that it has never caused me grief even though it is > flagrant disagreement with Python's general concept of line endings. Personally I wouldn't object if you reverted that and only allowed "\n" in splitlines. Having one and only one internal representation for line endings seems like a good thing. -- Greg From greg.ewing at canterbury.ac.nz Wed May 30 03:42:22 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 30 May 2007 13:42:22 +1200 Subject: [Python-3000] Lines breaking In-Reply-To: References: <465BB8FE.4030604@v.loewis.de> Message-ID: <465CD67E.7080902@canterbury.ac.nz> Alexandre Vassalotti wrote: > I often > see things like [line.strip() for line in open(file).readlines()], If readline() stripped newlines, there would be no way to distinguish between an empty line and EOF. -- Greg From stephen at xemacs.org Wed May 30 05:19:29 2007 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Wed, 30 May 2007 12:19:29 +0900 Subject: [Python-3000] Lines breaking In-Reply-To: <465CD016.7050002@canterbury.ac.nz> References: <465B814D.2060101@canterbury.ac.nz> <465BB994.9050309@v.loewis.de> <465CD016.7050002@canterbury.ac.nz> Message-ID: <87ps4j0zi6.fsf@uwakimon.sk.tsukuba.ac.jp> Greg Ewing writes: > That doesn't bother me so much because \r as a line boundary is a > well-established convention on some platforms. But I've *never* > heard of FF or VT being used as line delimiters. The Unicode newline recommendation is all about making the use of characters match their physical presentation. If on a printer, you force a new page with FF, you will see a physical line break at the end of the page containing the FF. Similarly with VT. (It seems that word processors which interpret LF as a paragraph separator often use VT as a hard newline.) The input functions should obey Unicode's recommendations, IMHO. OTOH, AIUI Unicode conformance does not require the Python language (grammar) to allow line breaking characters other than those currently recognized. And the grammar may restrict their use (eg, FF only at the end of an empty line). From greg.ewing at canterbury.ac.nz Thu May 31 04:49:39 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 31 May 2007 14:49:39 +1200 Subject: [Python-3000] Lines breaking In-Reply-To: <87ps4j0zi6.fsf@uwakimon.sk.tsukuba.ac.jp> References: <465B814D.2060101@canterbury.ac.nz> <465BB994.9050309@v.loewis.de> <465CD016.7050002@canterbury.ac.nz> <87ps4j0zi6.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <465E37C3.9070407@canterbury.ac.nz> Stephen J. Turnbull wrote: > The Unicode newline recommendation is all about making the use of > characters match their physical presentation. If on a printer, you > force a new page with FF, you will see a physical line break at the > end of the page containing the FF. Similarly with VT. I'm worried here about loss of information. 
Currently, a Python-recognised line break character signifies a line break and nothing else. You can read a file as lines, strip off the newlines, do some processing, and add the newlines back in when writing out the results, without losing anything essential. But an FF or VT is not *just* a line break; it can have other semantics attached to it as well. So treating it just the same as a \n by default would be wrong, I think. -- Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | Carpe post meridiem! | Christchurch, New Zealand | (I'm not a morning person.) | greg.ewing at canterbury.ac.nz +--------------------------------------+ From talin at acm.org Thu May 31 08:37:07 2007 From: talin at acm.org (Talin) Date: Wed, 30 May 2007 23:37:07 -0700 Subject: [Python-3000] Updating PEP 3101 Message-ID: <465E6D13.2030606@acm.org> I'm in the process of updating PEP 3101 to incorporate all of the various discussions and evolutions that have taken place, and this is turning out to be fairly involved, as there are a lot of ideas scattered all over the place. One thing I'd like to do is simplify the PEP a little bit, but at the same time address some of the requests that folks have asked for. The goal here is to keep the basic "string.format" interface as simple as possible, but at the same time to allow access to more complex formatting for people who need it. My assumption is that people who need that more complex formatting would be willing to give up some of the syntactical convenience of the simple "string.format" style of formatting. So for example, one thing that has been asked for is the ability to pass in a whole dictionary as a single argument, without using **kwds-style keyword parameter expansion (which is inefficient if the dictionary is large and only a few entries are being referred to in the format string.) 
The most recent proposals have this implemented by a special 'namespace' argument to the format function. However, I don't like the idea of having certain arguments with 'special' names. Instead, what I'd like to do is define a "Formatter" class that takes a larger number of options and parameters than the normal string.format method. People who need the extra power can construct an instance of Formatter (or subclass it if needed) and use that. So for example, for people who want to be able to directly access local variables in a format string, you might be able to say something like: a = 1 print(Formatter(locals()).format("The value of a is {a}")) Where the "Formatter" constructor looks like: Formatter(namespace={}, flags=None) In the case where you want direct access to global variables, you can make it even more convenient by caching the Formatter: f = Formatter(globals()).format a = 1 print(f("The value of a is {a}")) (You can't do this with locals() because you can't keep the dict around.) My question to the groupmind out there is: Do you find this extra syntax too inconvenient and wordy, or does it seem acceptable? -- Talin From stephen at xemacs.org Thu May 31 09:22:41 2007 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 31 May 2007 16:22:41 +0900 Subject: [Python-3000] Lines breaking In-Reply-To: <465E37C3.9070407@canterbury.ac.nz> References: <465B814D.2060101@canterbury.ac.nz> <465BB994.9050309@v.loewis.de> <465CD016.7050002@canterbury.ac.nz> <87ps4j0zi6.fsf@uwakimon.sk.tsukuba.ac.jp> <465E37C3.9070407@canterbury.ac.nz> Message-ID: <8764691mpq.fsf@uwakimon.sk.tsukuba.ac.jp> Greg Ewing writes: > But an FF or VT is not *just* a line break, it can > have other semantics attatched to it as well. So > treating it just the same as a \n by default would be > wrong, I think. *Python* does the right thing: it leaves the line break character(s) in place. 
It's not Python's problem if programmers go around stripping characters just because they happen to be at the end of the line. If you do care, you're already in trouble if you strip willy-nilly: >>> len("a\014\n") 3 >>> len("a\014\n".strip()) 1 >>> len("a\014\n".strip() + "\n") 2 >>> "a\r\n"[:-1] "a\r" I think the odds are really good that there are already more people who will expect Python to be Unicode-ly correct than who have already-defined semantics for FF or VT that just happen to work right if you strip the terminating LF but not a terminating FF. The remaining issue, embedding those characters in the interior of lines but considering them not line breaks, is considered by the Unicode technical committee a non-issue. Those characters are mandatory breaks because the expectation is *very* consistent (they say). I gather you think it's reasonable, too, you just worry that the additional semantics may get lost with current newline-stripping heuristics. As far as existing programs that will go postal if you hand them a line that's terminated with FF or VT, I don't see any conceptual problem with a codec (universal newline) that on input of "a\014" returns "a\014\n". Getting the details right (ie, respecting POLA) will require some thought and maybe some fiddly options, but it will work. 
Always-do-right-it-will-gratify-some-people-and-astonish-the-rest-ly y'rs From guido at python.org Thu May 31 13:48:48 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 31 May 2007 19:48:48 +0800 Subject: [Python-3000] [Python-Dev] PEP 367: New Super In-Reply-To: References: <001101c79aa7$eb26c130$0201a8c0@mshome.net> <001d01c79f15$f0afa140$0201a8c0@mshome.net> <002d01c79f6d$ce090de0$0201a8c0@mshome.net> <003f01c79fd9$66948ec0$0201a8c0@mshome.net> <009c01c7a04f$7e348460$0201a8c0@mshome.net> Message-ID: I've updated the patch; the latest version now contains the grammar and compiler changes needed to make super a keyword and to automatically add a required parameter 'super' when super is used. This requires the latest p3yk branch (r55692 or higher). Comments anyone? What do people think of the change of semantics for the im_class field of bound (and unbound) methods? --Guido On 5/29/07, Guido van Rossum wrote: > Hi Tim, > > I've gone ahead and cooked up a tiny demo patch that uses im_class to > store what you called im_type. Because I don't have the parser changes > ready yet, this requires you to declare a keyword-only arg named > 'super'; this triggers special code that set it to super(im_class, > im_self). > > http://python.org/sf/1727209 > > I haven't tried to discover yet how much breaks due to the change of > semantics for im_class. > > --Guido > > On 5/27/07, Guido van Rossum wrote: > > On 5/27/07, Tim Delaney wrote: > > > Guido van Rossum wrote: > > > > > > > The bound method object isn't stored in the class -- it's created by > > > > the "C.method" or "inst.method" getattr operation. I don't see how > > > > this would introduce a cycle. > > > > > > > >> If we store the class, we can store it as a weakref - the when the > > > >> super object is created, a strong reference to the class exists. > > > > > > We need to create some relationship between the unbound method and the > > > class. 
So the class has a reference to the unbound method, and the unbound > > > method has a reference to the class, thus creating a cycle. Bound methods > > > don't come into it - it's the unbound method that's the problem. > > > > Still wrong, I think. The unbound method object *also* isn't stored in > > the class. It's returned by the C.method operation. Compare C.method > > (which returns an unbound method) to C.__dict__['method'] (which > > returns the actual function object stored in the class). > > > > > > Since class and type are synonym (as you say) having both im_class and > > > > im_type would be a bad idea. > > > > > > I'm struggling to think of another, not too complicated name that conveys > > > the same information. > > > > Keep trying. im_type is not acceptable. :-) > > > > -- > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > > > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From ncoghlan at gmail.com Thu May 31 13:52:21 2007 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 31 May 2007 21:52:21 +1000 Subject: [Python-3000] Updating PEP 3101 In-Reply-To: <465E6D13.2030606@acm.org> References: <465E6D13.2030606@acm.org> Message-ID: <465EB6F5.5000308@gmail.com> Talin wrote: > In the case where you want direct access to global variables, you can > make it even more convenient by caching the Formatter: > > f = Formatter(globals()).format > a = 1 > print(f("The value of a is {a}")) > > (You can't do this with locals() because you can't keep the dict around.) > > My question to the groupmind out there is: Do you find this extra syntax > too inconvenient and wordy, or does it seem acceptable? I like it - even with locals, it works well for multi-line output: fmt = Formatter(locals()).format print(fmt('Count: {count}')) print(fmt('Total: {total}')) print(fmt('Average: {avg}')) (Hmm, the extra parentheses on print statements are annoying me already... 
but I imagine I will get over it :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From timothy.c.delaney at gmail.com Thu May 31 14:25:28 2007 From: timothy.c.delaney at gmail.com (Tim Delaney) Date: Thu, 31 May 2007 22:25:28 +1000 Subject: [Python-3000] [Python-Dev] PEP 367: New Super References: <001101c79aa7$eb26c130$0201a8c0@mshome.net> <001d01c79f15$f0afa140$0201a8c0@mshome.net> <002d01c79f6d$ce090de0$0201a8c0@mshome.net> <003f01c79fd9$66948ec0$0201a8c0@mshome.net> <009c01c7a04f$7e348460$0201a8c0@mshome.net> Message-ID: <016201c7a37e$c941adc0$0201a8c0@mshome.net> Guido van Rossum wrote: > I've updated the patch; the latest version now contains the grammar > and compiler changes needed to make super a keyword and to > automatically add a required parameter 'super' when super is used. > This requires the latest p3yk branch (r55692 or higher). > > Comments anyone? What do people think of the change of semantics for > the im_class field of bound (and unbound) methods? I had problems getting the p3yK branch that I only resolved yesterday so I haven't actually applied the patch here yet. Turns out I'd grabbed the wrong URL for the repository at some point, and couldn't work out why I kept getting prop not found errors when trying to check out. If I understand correctly, the patch basically takes im_class back to Python 2.1 semantics, which I always felt were much more useful than the 2.2 semantics. As a bonus, it should mean that the repr of a bound or unbound method should reflect the class it was defined in. Is this correct? The patch notes say that you're actually inserting a keyword-only argument - is this purely meant to be a stopgap measure so that you've got a local (which could be put into a cell)? 
Presumably with this approach you could call the method like: A().func(1, 2, super=object()) The final implementation IMO needs to have super be an implicit local, but not an argument. BTW, what made you change your mind on re-using im_class? Previously you'd said you didn't want to (although now I can't find the email to back that up). I'd written off reusing it for this purpose because of that. I won't be able to update the PEP until Sunday (visiting family) but I'll try to incorporate everything we've discussed. Did we get a decision on whether im_class should return the decorated or undecorated class, or did you want me to leave that as an open issue? I'm starting to feel somewhat embarrassed that I haven't had the time available to work solidly on this, but don't let that stop you from doing it - I'd rather have a good implementation early and not let my ego get in the way . Cheers, Tim Delaney From guido at python.org Thu May 31 15:08:16 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 31 May 2007 21:08:16 +0800 Subject: [Python-3000] [Python-Dev] PEP 367: New Super In-Reply-To: <016201c7a37e$c941adc0$0201a8c0@mshome.net> References: <001101c79aa7$eb26c130$0201a8c0@mshome.net> <002d01c79f6d$ce090de0$0201a8c0@mshome.net> <003f01c79fd9$66948ec0$0201a8c0@mshome.net> <009c01c7a04f$7e348460$0201a8c0@mshome.net> <016201c7a37e$c941adc0$0201a8c0@mshome.net> Message-ID: On 5/31/07, Tim Delaney wrote: > Guido van Rossum wrote: > > I've updated the patch; the latest version now contains the grammar > > and compiler changes needed to make super a keyword and to > > automatically add a required parameter 'super' when super is used. > > This requires the latest p3yk branch (r55692 or higher). > > > > Comments anyone? What do people think of the change of semantics for > > the im_class field of bound (and unbound) methods? > > I had problems getting the p3yK branch that I only resolved yesterday so I > haven't actually applied the patch here yet. 
Turns out I'd grabbed the wrong > URL for the repository at some point, and couldn't work out why I kept > getting prop not found errors when trying to check out. svn definitely has some sharp edges when you specify a bad URL. > If I understand correctly, the patch basically takes im_class back to Python > 2.1 semantics, which I always felt were much more useful than the 2.2 > semantics. As a bonus, it should mean that the repr of a bound or unbound > method should reflect the class it was defined in. Is this correct? Right. (I think that's the main cause of various test failures, which I haven't corrected yet.) > The patch notes say that you're actually inserting a keyword-only argument - > is this purely meant to be a stopgap measure so that you've got a local > (which could be put into a cell)? I'm not using a cell because I'm storing the result of calling super(Class, self) -- that is different for each instance, while a cell would be shared by all invocations of the same function. > Presumably with this approach you could > call the method like: > > A().func(1, 2, super=object()) No, because that would be a syntax error (super as a keyword is only allowed as an atom). You could get the same effect with A().func(1, 2, **{'super': object()}) but that's so obscure I don't mind. Hmm, right now the super=object() syntax *is* accepted, but that's a bug in the code (which I submitted yesterday) that checks for assignments to keywords like None, True, False, and now super. > The final implementation IMO needs to have super be an implicit local, but > not an argument. I thought so to at first, but there are no APIs to pass the value along from the point where the super object is created (in the method_call() function) to the point where the frame exists into which the object needs to be stored (in PyEval_EvalCodeEx). So I think a hidden keyword argument is quite convenient. > BTW, what made you change your mind on re-using im_class? 
> Previously you'd
> said you didn't want to (although now I can't find the email to back that
> up). I'd written off reusing it for this purpose because of that.

I do recall not liking that, but ended up thinking some more about it
after I realized how much work it would be to add another member to the
method struct. When I tried it and saw that only 7 unit test modules had
failures (and mostly only a few out of many tests) I decided it was
worth trying.

> I won't be able to update the PEP until Sunday (visiting family) but I'll
> try to incorporate everything we've discussed. Did we get a decision on
> whether im_class should return the decorated or undecorated class, or did
> you want me to leave that as an open issue?

In my implementation, it will return whatever object is found in the MRO
of the derived class, because that's all that's available -- I suppose
this means in practice it's the decorated class.

BTW I'm open to a different implementation that stores the class in a
cell and moves the computation of super(Class, self) into the function
body -- but that would be completely different from the current version,
as the changes to im_class and method_call would not be useful in that
case. Instead, something would have to be done with that cell at class
definition time.

I fear that it would be much more complicated to produce that version --
I spent a *lot* of time trying to understand how symtable.c and
compile.c work in order to be able to add the implied super argument.
That code is really difficult to follow, it uses a different style than
most of the rest of Python (perhaps because I didn't write it :-), and
it is quite subtle. For example, if a nested function inside a method
uses super, this currently doesn't reference the super of the method --
it adds super to the nested function's parameter lists, and this makes
it effectively uncallable.
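To make the hidden-keyword-argument design concrete, here is a rough
pure-Python sketch. The decorator name (inject_super) and the parameter
name (sup) are invented for illustration only -- the real patch does the
equivalent in C, computing super(Class, self) in method_call() and
storing it into the new frame in PyEval_EvalCodeEx:

```python
import functools

def inject_super(cls):
    # Illustrative only: rebind each method so that every call receives a
    # freshly computed super object as a keyword-only argument, mirroring
    # the per-call super(Class, self) computation described above.
    for name, func in list(vars(cls).items()):
        if callable(func) and not name.startswith('__'):
            def make_wrapper(func):
                @functools.wraps(func)
                def wrapper(self, *args, **kw):
                    return func(self, *args, sup=super(cls, self), **kw)
                return wrapper
            setattr(cls, name, make_wrapper(func))
    return cls

class A:
    def greet(self, *, sup=None):
        return "A"

@inject_super
class B(A):
    def greet(self, *, sup=None):
        return sup.greet() + "B"

print(B().greet())  # prints: AB
```

Note the wrapper recomputes super(cls, self) on every call, which is
exactly why a cell shared by all invocations of the function would not do.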
> I'm starting to feel somewhat embarrassed that I haven't had the time
> available to work solidly on this, but don't let that stop you from doing
> it - I'd rather have a good implementation early and not let my ego get in
> the way.

Thanks. I realize I sort of took over and was hoping you'd respond like
this. I may not have much time over the weekend (recovering from an
exhausting and mind-bending trip to Beijing) so you're welcome to catch
up!

> Cheers,
>
> Tim Delaney

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From timothy.c.delaney at gmail.com Thu May 31 15:25:17 2007
From: timothy.c.delaney at gmail.com (Tim Delaney)
Date: Thu, 31 May 2007 23:25:17 +1000
Subject: [Python-3000] [Python-Dev] PEP 367: New Super
References: <001101c79aa7$eb26c130$0201a8c0@mshome.net>
	<002d01c79f6d$ce090de0$0201a8c0@mshome.net>
	<003f01c79fd9$66948ec0$0201a8c0@mshome.net>
	<009c01c7a04f$7e348460$0201a8c0@mshome.net>
	<016201c7a37e$c941adc0$0201a8c0@mshome.net>
Message-ID: <018101c7a387$23f21a90$0201a8c0@mshome.net>

Guido van Rossum wrote:
>> The patch notes say that you're actually inserting a keyword-only
>> argument - is this purely meant to be a stopgap measure so that
>> you've got a local (which could be put into a cell)?
>
> I'm not using a cell because I'm storing the result of calling
> super(Class, self) -- that is different for each instance, while a
> cell would be shared by all invocations of the same function.

I'm actually investigating another (possibly complementary) option at
the moment - adding an im_super attribute to methods, which would store
either a bound or unbound super instance when the bound or unbound
method object is created.
method_new becomes:

static PyObject *
method_new(PyTypeObject* type, PyObject* args, PyObject *kw)
{
    PyObject *func;
    PyObject *self;
    PyObject *classObj = NULL;

    if (!_PyArg_NoKeywords("instancemethod", kw))
        return NULL;
    if (!PyArg_UnpackTuple(args, "method", 2, 3,
                           &func, &self, &classObj))
        return NULL;
    if (!PyCallable_Check(func)) {
        PyErr_SetString(PyExc_TypeError,
                        "first argument must be callable");
        return NULL;
    }
    if (self == Py_None)
        self = NULL;
    if (self == NULL && classObj == NULL) {
        PyErr_SetString(PyExc_TypeError,
                        "unbound methods must have non-NULL im_class");
        return NULL;
    }
    return PyMethod_New(func, self, classObj);
}

then in method_call we could have:

static PyObject *
method_call(PyObject *func, PyObject *arg, PyObject *kw)
{
    PyObject *self = PyMethod_GET_SELF(func);
    PyObject *klass = PyMethod_GET_CLASS(func);
    PyObject *supervalue = PyMethod_GET_SUPER(func);

and populate the `super` argument from supervalue. I think im_super has
uses on its own (esp. for introspection).

>> Presumably with this approach you could
>> call the method like:
>>
>> A().func(1, 2, super=object())
>
> No, because that would be a syntax error (super as a keyword is only
> allowed as an atom). You could get the same effect with
>
> A().func(1, 2, **{'super': object()})
>
> but that's so obscure I don't mind.

I'd prefer to eliminate it, but that's a detail that can be taken care
of later.

Anyway, need to go to bed - have to be up in 6 hours.
Cheers,

Tim Delaney

From janssen at parc.com Thu May 31 16:49:53 2007
From: janssen at parc.com (Bill Janssen)
Date: Thu, 31 May 2007 07:49:53 PDT
Subject: [Python-3000] Lines breaking
In-Reply-To: <465E37C3.9070407@canterbury.ac.nz>
References: <465B814D.2060101@canterbury.ac.nz>
	<465BB994.9050309@v.loewis.de>
	<465CD016.7050002@canterbury.ac.nz>
	<87ps4j0zi6.fsf@uwakimon.sk.tsukuba.ac.jp>
	<465E37C3.9070407@canterbury.ac.nz>
Message-ID: <07May31.074957pdt."57996"@synergy1.parc.xerox.com>

> But an FF or VT is not *just* a line break, it can
> have other semantics attached to it as well. So
> treating it just the same as a \n by default would be
> wrong, I think.

I agree. I have text files which contain lines of FF NL, which are
supposed to be single lines with a FF as their content (to signify a
page break), not two separate lines.

Bill

From pje at telecommunity.com Thu May 31 19:08:40 2007
From: pje at telecommunity.com (Phillip J. Eby)
Date: Thu, 31 May 2007 13:08:40 -0400
Subject: [Python-3000] [Python-Dev] PEP 367: New Super
In-Reply-To: 
References: <001101c79aa7$eb26c130$0201a8c0@mshome.net>
	<001d01c79f15$f0afa140$0201a8c0@mshome.net>
	<002d01c79f6d$ce090de0$0201a8c0@mshome.net>
	<003f01c79fd9$66948ec0$0201a8c0@mshome.net>
	<009c01c7a04f$7e348460$0201a8c0@mshome.net>
Message-ID: <20070531170734.273393A40AA@sparrow.telecommunity.com>

At 07:48 PM 5/31/2007 +0800, Guido van Rossum wrote:
>I've updated the patch; the latest version now contains the grammar
>and compiler changes needed to make super a keyword and to
>automatically add a required parameter 'super' when super is used.
>This requires the latest p3yk branch (r55692 or higher).
>
>Comments anyone? What do people think of the change of semantics for
>the im_class field of bound (and unbound) methods?
Please correct me if I'm wrong, but just looking at the patch it seems
to me that the descriptor protocol is being changed as well -- i.e., the
'type' argument is now the found-in-type in the case of an instance
__get__ as well as class __get__.

It would seem to me that this change would break classmethods both on
the instance and class level, since the 'cls' argument is supposed to be
the derived class, not the class where the method was defined. There
also don't seem to be any tests for the use of super in classmethods.

This would seem to make the change unworkable, unless we are also
getting rid of classmethods, or further change the descriptor protocol
to add another argument. However, by the time we get to that point, it
seems like making 'super' a cell variable might be a better option.

Here's a strategy that I think could resolve your difficulties with the
cell variable approach:

First, when a class is encountered during the symbol setup pass,
allocate an extra symbol for the class as a cell variable with a
generated name (e.g. $1, $2, etc.), and keep a pointer to this name in
the class state information.

Second, when generating code for 'super', pull out the generated
variable name of the nearest enclosing class, and use it as if it had
been written in the code.

Third, change the MAKE_FUNCTION for the BUILD_CLASS to a MAKE_CLOSURE,
and add code after BUILD_CLASS to also store a super object in the
special variable. Maybe something like:

    ... BUILD_CLASS
    ... apply decorators ...
    DUP_TOP
    STORE_* classname
    ... generate super object ...
    STORE_DEREF $n

Fourth, make sure that the frame initialization code can deal with a
code object that has a locals dictionary *and* cell variables. For
Python 2.5, this constraint is already met as long as CO_OPTIMIZED isn't
set, and that should already be true for the relevant cases
(module-level code and class bodies), so we really just need to ensure
that CO_OPTIMIZED doesn't get set as a side-effect of adding cell
variables.
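The class-in-a-cell part of this strategy can be approximated in today's
Python with an ordinary closure. In the sketch below, the one-element
list stands in for the compiler-generated cell ($n), and the explicit
two-argument super stands in for what the compiler would emit for the
'super' keyword (build_class and the method names are invented for
illustration):

```python
def build_class():
    cell = [None]  # stands in for the hidden cell variable $n

    class C(dict):
        def size(self):
            # the compiled 'super' would expand to roughly this expression
            return super(cell[0], self).__len__()

    cell[0] = C  # the STORE_DEREF $n step, performed once the class exists
    return C

C = build_class()
print(C(a=1, b=2).size())  # prints: 2
```

Because the cell is filled in only after the class object exists, every
method sees the final class, much as the strategy above stores into the
cell after BUILD_CLASS and the decorators have run.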
From g.brandl at gmx.net Thu May 31 19:34:16 2007
From: g.brandl at gmx.net (Georg Brandl)
Date: Thu, 31 May 2007 19:34:16 +0200
Subject: [Python-3000] __debug__
Message-ID: 

Guido just fixed a case in the py3k branch where you could assign to
"None" in a function call.

__debug__ has similar problems: it can't be assigned to normally, but
via keyword arguments it is possible.

This should be fixed; or should __debug__ be thrown out anyway?

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no
less. Four shall be the number of spaces thou shalt indent, and the
number of thy indenting shall be four. Eight shalt thou not indent, nor
either indent thou two, excepting that thou then proceed to four. Tabs
are right out.

From stephen at xemacs.org Thu May 31 19:50:28 2007
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Fri, 01 Jun 2007 02:50:28 +0900
Subject: [Python-3000] Lines breaking
In-Reply-To: <07May31.074957pdt."57996"@synergy1.parc.xerox.com>
References: <465B814D.2060101@canterbury.ac.nz>
	<465BB994.9050309@v.loewis.de>
	<465CD016.7050002@canterbury.ac.nz>
	<87ps4j0zi6.fsf@uwakimon.sk.tsukuba.ac.jp>
	<465E37C3.9070407@canterbury.ac.nz>
	<07May31.074957pdt."57996"@synergy1.parc.xerox.com>
Message-ID: <873b1c287v.fsf@uwakimon.sk.tsukuba.ac.jp>

Bill Janssen writes:

> > But an FF or VT is not *just* a line break, it can
> > have other semantics attached to it as well. So
> > treating it just the same as a \n by default would be
> > wrong, I think.
>
> I agree. I have text files which contain lines of FF NL, which
> are supposed to be single lines with a FF as their content (to signify
> a page break), not two separate lines.

I agree that that looks nice in my editor, but it is not
Unicode-conforming practice, and I suspect that if you experiment with
any printer you'll discover that you get an empty line at the top of
the page.
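For what it's worth, this is already observable in Python: splitlines on
unicode strings (str in py3k) follows the Unicode line-boundary set, so
a line ending in FF NL comes back with a superfluous empty line:

```python
s = "intro\x0c\npage two\n"  # first line ends in FF NL, as in Bill's files
print(s.splitlines())
# FF is a hard break, so the NL after it produces an extra empty line:
# ['intro', '', 'page two']
```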
I also suspect that any program that currently is used to process those
files' content by lines probably simply treats the FF as whitespace, and
throws away empty lines. If so, it will still work with FF treated as a
hard line break in line-processing mode, since the trailing NL will now
generate a (superfluous) empty line.

Given that, is this going to matter to you?

From alexandre at peadrop.com Thu May 31 20:49:25 2007
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Thu, 31 May 2007 14:49:25 -0400
Subject: [Python-3000] Buffer objects and StringIO
Message-ID: 

Hello,

I finished yesterday the implementations of BytesIO and StringIO
objects in C. They are both fully working. (The code is available in my
cpy_merge branch in the svn tree.) There is only one thing that is
bothering me with StringIO: it doesn't accept buffer objects. Should I
care about this?

Thanks,

-- Alexandre

From brett at python.org Thu May 31 20:51:08 2007
From: brett at python.org (Brett Cannon)
Date: Thu, 31 May 2007 11:51:08 -0700
Subject: [Python-3000] __debug__
In-Reply-To: 
References: 
Message-ID: 

On 5/31/07, Georg Brandl wrote:
>
> Guido just fixed a case in the py3k branch where you could assign to
> "None" in a function call.
>
> __debug__ has similar problems: it can't be assigned to normally, but via
> keyword arguments it is possible.
>
> This should be fixed; or should __debug__ be thrown out anyway?

I never use the flag, personally. When I am debugging I have an
app-specific flag I set. I am +1 on ditching it.

-Brett

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20070531/60a1d71c/attachment.html

From theller at ctypes.org Thu May 31 21:59:28 2007
From: theller at ctypes.org (Thomas Heller)
Date: Thu, 31 May 2007 21:59:28 +0200
Subject: [Python-3000] __debug__
In-Reply-To: 
References: 
Message-ID: 

Brett Cannon schrieb:
> On 5/31/07, Georg Brandl wrote:
>>
>> Guido just fixed a case in the py3k branch where you could assign to
>> "None" in a function call.
>>
>> __debug__ has similar problems: it can't be assigned to normally, but via
>> keyword arguments it is possible.
>>
>> This should be fixed; or should __debug__ be thrown out anyway?
>
> I never use the flag, personally. When I am debugging I have an
> app-specific flag I set. I am +1 on ditching it.
>
> -Brett

I would very much wish that __debug__ stays, because I use it in nearly
every larger program that I later wish to freeze and distribute.

"if __debug__: ..." blocks have the advantage that *no* bytecode is
generated when run or frozen with -O or -OO, so the modules imported in
these blocks are not pulled in by modulefinder. You cannot get this
effect (AFAIK) with app-specific flags.

Thanks,
Thomas

From nnorwitz at gmail.com Thu May 31 22:55:39 2007
From: nnorwitz at gmail.com (Neal Norwitz)
Date: Thu, 31 May 2007 13:55:39 -0700
Subject: [Python-3000] Buffer objects and StringIO
In-Reply-To: 
References: 
Message-ID: 

On 5/31/07, Alexandre Vassalotti wrote:
> Hello,
>
> I finished yesterday the implementations of BytesIO and StringIO
> objects in C. They are both fully working. (The code is available in
> my cpy_merge branch in the svn tree.) There is only one thing that is
> bothering me with StringIO: it doesn't accept buffer objects. Should I
> care about this?

Yes, but buffer objects are likely to change in 3.0. See PEP 3118:
http://www.python.org/dev/peps/pep-3118/

It's not accepted because the PEP isn't complete yet AFAIK.

n
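As a footnote on the buffer question: under a PEP 3118-style design the
split falls out naturally -- the binary object accepts anything that
exports the buffer API, while the text object rejects it. A sketch using
the py3k-style io types, with memoryview (the proposed successor to 2.x
buffer objects) standing in purely as an illustration:

```python
import io

mv = memoryview(b"hello")

print(io.BytesIO().write(mv))  # buffer-like objects are fine here: 5

try:
    io.StringIO().write(mv)    # text-only: rejects buffer-like objects
except TypeError as exc:
    print("StringIO refused:", exc)
```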