python-dev Summary for 2006-02-16 through 2006-02-28

[The HTML version of this Summary is available at http://www.python.org/dev/summary/2006-02-16_2006-02-28]

Announcements

Python release schedule

The Python 2.5 release schedule is PEP 356. The first releases are planned for the end of March/beginning of April. Check the PEP for the full plan of features.

Contributing threads:

[SJB]

Buildbot improvements

Thanks to Benji York and Walter Dörwald, the buildbot results page now has a new CSS stylesheet that should make it a little easier to read. (And thanks to Josiah Carlson, we should now have a Windows buildbot slave.)

Contributing threads:

[SJB]

Deprecation of multifile module

The multifile module, which has been supplanted by the email module since Python 2.2, is finally being deprecated. Though the module will not be removed in Python 2.5, its documentation now clearly indicates the deprecation.

Contributing thread:

[SJB]

Win64 AMD64 binaries available

Martin v. Löwis has made AMD64 binaries available for the current trunk's Python. If you're using an AMD64 machine (a.k.a. EM64T or x64), give 'em a whirl and see how they work.

Contributing thread:

[SJB]

Javascript to adopt Python iterators and generators

On a slightly off-topic note, Brendan Eich has blogged that the next version of Javascript will borrow iterators, generators and list comprehensions from Python. Nice to see that the Python plague is even spreading to other programming languages now. ;)

Contributing thread:

[SJB]

Summaries

A dict with a default value

Guido suggested a defaultdict type which would act like a dict, but produce a default value when __getitem__ was called and no key existed. The intent was to simplify code examples like:

# a dict of lists
for x in y:
    d.setdefault(key, []).append(value)

# a dict of counts
for x in y:
    d[key] = d.get(key, 0) + 1

where the user clearly wants to associate a single default with the dict, but has no simple way to spell this. People quickly agreed that the default should be specified as a function so that using list as a default could create a dict of lists, and using int as a default could create a dict of counts.

Then the real thread began. Guido proposed adding an on_missing method to the dict API, which would be called whenever __getitem__ found that the requested key was not present in the dict. The on_missing method would look for a default_factory attribute, and try to call it if it was set, or raise a KeyError if it was not. This would allow e.g. dd.default_factory = list to make a dict object produce empty lists as default values, and del dd.default_factory to revert the dict object to the standard behavior.

However, a number of opponents worried that confusion would arise when basic dict promises (like that x in d implies that x in d.keys() and d[x] doesn't raise a KeyError) could be conditionally overridden by the existence of a default_factory attribute. Others worried about complicating the dict API with yet another method, especially one that was never meant to be called directly (only overridden in subclasses). Eventually, Guido was convinced that instead of modifying the builtin dict type, a new collections.defaultdict should be introduced.

Guido then defended keeping on_missing as a method of the dict type, noting that without on_missing any subclasses (e.g. collections.defaultdict) that wanted to override the behavior for missing keys would have to override __getitem__ and pay the penalty on every call instead of just the ones where the key wasn't present. In the patch committed to the Python trunk, on_missing was renamed to __missing__ and though no __missing__ method is defined for the dict type, if a subclass defines it, it will be called instead of raising the usual KeyError.
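
As a rough illustration of where this landed, the sketch below uses collections.defaultdict for the two motivating cases and a __missing__ hook on a dict subclass. It assumes a Python version that ships both features; the class name Verbose and the sample data are made up for the example.

```python
from collections import defaultdict

# A dict of lists: list() supplies the default for each missing key.
groups = defaultdict(list)
for word in ["apple", "avocado", "banana"]:
    groups[word[0]].append(word)

# A dict of counts: int() returns 0 for each missing key.
counts = defaultdict(int)
for letter in "abracadabra":
    counts[letter] += 1


class Verbose(dict):
    """Hypothetical dict subclass that hooks missing-key lookups."""

    def __missing__(self, key):
        # Called by dict.__getitem__ only when the key is absent.
        return "<no %s>" % key


v = Verbose(a=1)
print(groups["a"], counts["a"], v["a"], v["zzz"])
```

Note that the __missing__ hook in this sketch returns a value without storing it, so "zzz" in v stays False afterwards.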

Contributing threads:

[SJB]

Encode and decode interface in Python 3.0

Jason Orendorff suggested that bytes.encode() and text.decode() (where text is the name of Python 3.0's str/unicode) should be removed in Python 3.0. Guido agreed, suggesting that Python 3.0 should have one of the following APIs for encoding and decoding:

  • bytes.decode(enc) -> text; text.encode(enc) -> bytes
  • text(bytes, enc) -> text; bytes(text, enc) -> bytes

There was a lot of discussion about how hard it was for beginners to figure out the current .encode() and .decode() methods, and Martin v. Löwis suggested that the behavior:

py> "Martin v. Löwis".encode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf6 in position 11: ordinal not in range(128)

would be better if replaced by Guido's suggested behavior:

py> "Martin v. Löwis".encode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'str' object has no attribute 'encode'

since the user would immediately know that they had made a mistake by trying to encode a string. However, some people felt that this problem could be solved by simply changing the UnicodeDecodeError to something more informative like ValueError: utf8 can only encode unicode objects.

M.-A. Lemburg felt strongly that text and bytes objects should keep both .encode() and .decode() methods as simple interfaces to the registered codecs. Since the codecs system handles general encodings (not just text<->bytes encodings) he felt that .encode() and .decode() should be available on both bytes and text objects and should be able to return whatever type the encoding deems appropriate. Guido repeated one of his design guidelines: the return type of a function should not depend on the value of the arguments. Thus he would prefer that bytes.decode() only return text and text.encode() only return bytes, regardless of the encodings passed in. (He didn't seem to be commenting on the architecture of the codecs module, however, just the architecture of the bytes and text types.)
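
For readers who want to see the two candidate APIs side by side, here is a small sketch run against a Python 3 interpreter, where the text type is spelled str. It shows how things look in released Python 3, which is an assumption beyond what the thread had decided at the time.

```python
name = "Martin v. Löwis"

# text.encode(enc) -> bytes
data = name.encode("utf-8")

# bytes.decode(enc) -> text
round_trip = data.decode("utf-8")

# The constructor spellings from the second option also work.
same_data = bytes(name, "utf-8")
same_text = str(data, "utf-8")

# The "wrong direction" methods do not exist, so mistakes fail fast.
print(hasattr(name, "decode"), hasattr(data, "encode"))
```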

Contributing threads:

[SJB]

Writable closures

Almann T. Goo was considering writing a PEP to allow write access to names in nested scopes. Currently, names in nested scopes can only be read, not written, so the following code fails with an UnboundLocalError:

def getinc(start=0):
    def incrementer(inc=1):
        start += inc
        return start
    return incrementer

Almann suggested introducing a new declaration, along the lines of global, to indicate that assignments to a name should be interpreted as assignments to the name in the nearest enclosing scope. Initially, he proposed the term use for this declaration, but most of the thread participants seemed to prefer outer, allowing the function above to be written as:

def getinc(start=0):
    def incrementer(inc=1):
        outer start
        start += inc
        return start
    return incrementer

A variety of syntactic variants achieving similar results were proposed, including a way to name a function's local namespace:

def getinc(start=0):
    namespace xxx
    def incrementer(inc=1):
        xxx.start += inc
        return xxx.start
    return incrementer

a way to indicate when a single use of a name should refer to the outer scope, based on the syntax for relative imports:

def getinc(start=0):
    def incrementer(inc=1):
        .start += inc        # note the "."
        return .start        # note the "." (this one could be optional)
    return incrementer

and the previously suggested rebinding statement, from PEP 227:

def getinc(start=0):
    def incrementer(inc=1):
        start +:= inc        # note the ":=" instead of "="
        return start
    return incrementer

Much like the last time this issue was brought up, "the discussion fizzled out after having failed to reach a consensus on an obviously right way to go about it" (Greg Ewing's quite appropriate wording). No PEP was produced, and it didn't seem like one would soon be forthcoming.
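
For comparison, Python 3 later gained a nonlocal statement (PEP 3104) that fills the role proposed here for outer. The sketch below assumes a Python 3 interpreter and pairs it with the mutable-cell workaround that needs no new syntax; the name getinc_cell is made up for the example.

```python
def getinc(start=0):
    def incrementer(inc=1):
        nonlocal start  # rebind the name in the enclosing scope
        start += inc
        return start
    return incrementer


def getinc_cell(start=0):
    # Workaround without new syntax: mutate a list instead of rebinding.
    cell = [start]

    def incrementer(inc=1):
        cell[0] += inc
        return cell[0]
    return incrementer


inc = getinc(10)
print(inc(), inc(5))
```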

Contributing threads:

[SJB]

PEP 358: The "bytes" Object

This week mostly wrapped up the bytes type discussion from the last fortnight, with the introduction of PEP 358: The "bytes" Object. The PEP proposes a bytes type which:

  • is a sequence of range(0, 256) int objects
  • can be constructed out of lists of range(0, 256) ints
  • can be constructed out of the characters of str objects
  • can be constructed out of unicode objects using a specified encoding (or the system default encoding if none is specified)
  • can be constructed out of a hex string using the classmethod bytes.fromhex

The bytes constructor allows an encoding for unicode objects (instead of requiring a call to unicode.encode) so as not to require double copying (one copy for encoding and one for conversion to bytes). Some people took issue with the fact that the constructor allows an encoding for str objects, but ignores it, as this means code like bytes(s, 'utf-16be') will do a different thing for str and unicode. Ignoring the encoding argument for str objects was apparently intended to ease the transition from str to bytes, though it was not clear exactly how.
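
The constructors listed above can be tried out with the bytes type that eventually shipped in Python 3. This is only a sketch under that assumption: the released type differs from the PEP 358 draft in some details (for instance, text always needs an explicit encoding).

```python
# A sequence of ints in range(0, 256)
b = bytes([72, 105, 33])
print(list(b), b[0])

# From a hex string via the classmethod
h = bytes.fromhex("486921")

# From text plus an explicit encoding
t = bytes("Hi!", "ascii")

# Values outside range(0, 256) are rejected
try:
    bytes([256])
except ValueError as exc:
    print("rejected:", exc)
```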

Contributing threads:

[SJB]

Compiling Python with MS VC++ 2005

M.-A. Lemburg suggested compiling Python with the new MS VC++ 2005, especially since it's "free". There was some concern about the stability of VS2005, and Benji York pointed out that the express editions are only free until November 6th. Fredrik Lundh pointed out that it would be substantially more work for all the developers who provide ready-made Windows binaries for multiple Python releases. In the end, they decided to stick with the current compiler for at least one more release.

Contributing thread:

[SJB]

Alternate lambda syntax

Even though Guido already declared that Python 3.0 will keep the current lambda syntax, Talin decided to try out the new AST and give lambda a face-lift. With Talin's patch, you can now write lambdas like:

>>> a = (x*x given (x))
>>> a(9)
81

>>> a = (x*y given (x=3,y=4))
>>> a(9, 10)
90
>>> a(9)
36
>>> a()
12

The patch was remarkably simple, and people were suitably impressed by the flexibility of the new AST. Of course, the patch was rejected since Guido is now happy with the current lambda situation.
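
To make the proposed semantics concrete, the same two functions can be written with the existing lambda syntax. The names square and product are made up for this sketch.

```python
# Existing spelling of (x*x given (x))
square = lambda x: x * x

# Existing spelling of (x*y given (x=3,y=4))
product = lambda x=3, y=4: x * y

print(square(9), product(9, 10), product(9), product())
```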

Contributing thread:

[SJB]

Stateful codecs

Walter Dörwald was looking for ways to cleanly support stateful codecs. M.-A. Lemburg suggested extending the codec registry to maintain slots for the stateful encoders and decoders (and allowing six-tuples to be passed in) and adding the functions codecs.getencoderobject() and codecs.getdecoderobject(). Walter Dörwald suggested that codecs.lookup() should return objects with the following attributes:

  1. Name
  2. Encoder function
  3. Decoder function
  4. Stateful encoder factory
  5. Stateful decoder factory
  6. Stream writer factory
  7. Stream reader factory

For the sake of backwards compatibility, these objects would subclass tuple so that they look like the old four-tuples returned by codecs.lookup(). Walter's patch provides an implementation of some of these suggestions.
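
A sketch of how this looks in the codecs module as it exists in current Python, which may differ in naming from what was proposed in the thread: codecs.lookup() returns a tuple subclass with extra attributes, and the stateful decoder keeps partial input between calls.

```python
import codecs

info = codecs.lookup("utf-8")

# Still usable as the old four-tuple
encode, decode, reader, writer = info
print(isinstance(info, tuple), len(info), info.name)

# A stateful decoder keeps partial input between calls.
decoder = info.incrementaldecoder()
raw = "Dörwald".encode("utf-8")
first = decoder.decode(raw[:2])    # ends in the middle of "ö"
rest = decoder.decode(raw[2:], final=True)
print(repr(first), repr(rest))
```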

Contributing thread:

[SJB]

operator.is*Type and user-defined types

Michael Foord pointed out that for types written in Python, operator.isMappingType and operator.isSequenceType are essentially identical -- they both return True if __getitem__ is defined. Raymond Hettinger and Greg Ewing explained that for types written in C, these functions can give more detailed information because at the C level, CPython differentiates between the __getitem__ of the sequence protocol and the __getitem__ of the mapping protocol.

Contributing thread:

[SJB]

Python-level AST interface

Brett Cannon started a brief thread to discuss where to go next with the Python AST branch. Though some of the discussion moved offline at PyCon, the major decisions were reported by Martin v. Löwis:

  • The ast-objects branch (which used reference-counting instead of arena allocation) was dropped because it seemed less maintainable, and people had agreed that exposing the C AST objects to Python was a bad idea anyway.
  • Python code would have access to a "shadow tree" of the actual AST tree, accessible by calling compile() with the flag PyCF_ONLY_AST (0x400).

As a result, Python 2.5 now has a Python-level interface to AST objects:

>>> compile('"spam" if x else 42', '<string>', 'eval', 0x400)
<_ast.Expression object at 0x00BA0F50>
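
A slightly longer sketch of the same call, assuming a current Python 3 where the flag is also exposed as ast.PyCF_ONLY_AST and where compile() accepts an AST object as input (the latter is a capability of later releases, not something the summary promises for 2.5):

```python
import ast

tree = compile('"spam" if x else 42', "<string>", "eval", ast.PyCF_ONLY_AST)
print(type(tree).__name__, type(tree.body).__name__)

# The tree can be compiled again and evaluated like ordinary code.
code = compile(tree, "<string>", "eval")
print(eval(code, {"x": True}), eval(code, {"x": False}))
```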

Contributing threads:

[SJB]

Allowing property to be used as a decorator

Georg Brandl suggested in passing that it would be nice if property() could be used as a decorator. Ian Bicking pointed out that you can already use property() this way as long as you only want a read-only property. However, the resulting property has no docstring, so Alex Martelli suggested that property use the __doc__ of its fget function if no docstring was provided. Guido approved it, and Georg Brandl provided a patch. Thus in Python 2.5, you'll be able to write read-only properties like:

@property
def x(self):
    """The x property"""
    return self._x + 42
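
Placed inside a class, the snippet above behaves as follows. The class name Point and its constructor are made up for this sketch.

```python
class Point(object):
    def __init__(self, x):
        self._x = x

    @property
    def x(self):
        """The x property"""
        return self._x + 42


p = Point(1)
print(p.x, Point.x.__doc__)

try:
    p.x = 5  # no setter was defined
except AttributeError:
    print("x is read-only")
```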

Contributing threads:

[SJB]

Turning on unicode string literals for a module

Neil Schemenauer asked if it would be possible to have a from __future__ import unicode_strings statement which would turn all string literals into unicode literals for that module (without requiring the usual u prefix). Currently, you can turn on this kind of behavior for all modules using the undocumented -U command-line switch, but there's no way of enabling it on a per-module basis. There didn't seem to be enough momentum in the thread to implement such a thing, however.

Contributing thread:

[SJB]

Allowing cProfile to print to other streams

Skip Montanaro pointed out that the new cProfile module prints stuff to stdout. He suggested rewriting the necessary bits to add a stream= keyword argument where necessary and using stream.write(...) instead of the print statements. No patch was available at the time of this summary.
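
In the standard library as it exists today, the report side of profiling lives in pstats, whose Stats class takes a stream argument. Whether that matches the change suggested in the thread is not stated here, so treat the following as a sketch of the idea only; the work function is a made-up workload.

```python
import cProfile
import io
import pstats


def work():
    # Hypothetical workload, only here to give the profiler something to time.
    return sum(i * i for i in range(1000))


profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

# Send the report to an in-memory stream instead of stdout.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats()
report = buffer.getvalue()
print(len(report) > 0)
```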

Contributing thread:

[SJB]

PEP 343 with-statement semantics

Mike Bland provided an initial implementation of PEP 343's with-statement. In writing some unit-tests for it, Guido discovered that the implementation would not allow generators like:

@contextmanager
def foo():
   try:
       yield
   except Exception:
       pass

with foo():
   1/0

to be equivalent to the corresponding in-line code:

try:
   1/0
except Exception:
   pass

because the PEP at the time did not allow context objects to suppress exceptions. Guido modified the patch and the PEP to require __exit__ to reraise the exception if it didn't want it suppressed.
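
The generator above can be run as-is on any Python that ships contextlib.contextmanager. The sketch below adds a flag (a made-up name, reached) to show that execution continues past the with block. In released versions, suppression is signaled by __exit__ returning a true value rather than by not re-raising, but the generator-based spelling reads the same.

```python
from contextlib import contextmanager


@contextmanager
def foo():
    try:
        yield
    except Exception:
        pass  # swallow the exception raised inside the with block


reached = False
with foo():
    1 / 0
reached = True  # runs only because ZeroDivisionError was suppressed
print(reached)
```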

Contributing threads:

[SJB]

Dropping Win9x support in Python 2.6

Neal Norwitz suggested that Python 2.6 no longer try to support Win9x and WinME and updated PEP 11 accordingly. There was a little rumbling about dropping the support, but no one stepped forward to volunteer to maintain the patches, and Guido suggested that anyone using a 6+ year old OS should be fine using an older Python too.

Contributing thread:

[SJB]

Removing non-Unicode support

Neal Norwitz suggested that the --disable-unicode switch might be a candidate for removal in Python 2.6. A few people were mildly concerned that the inability to remove Unicode support might make it harder to put Python on small hand-held devices. However, many (though not all) hand-helds already support Unicode, and currently a number of tests already fail if you use the --disable-unicode switch, so those who need this switch have not been actively maintaining it. Stripping out the numerous Py_USING_UNICODE declarations would substantially simplify some of the Python source. No final decision had been made at the time of this summary.

Contributing thread:

[SJB]

Translating the Python documentation

Facundo Batista had proposed translating the Library Reference and asked about how to get notifications when the documentation was updated (so that the translations could also be updated). Georg Brandl suggested a post-commit hook in SVN, though this would only give notifications at the module level. Fredrik Lundh suggested something based on his more dynamic library reference platform so that the notifications could indicate particular methods and functions instead.

Contributing threads:

[SJB]

PEP 338 updates

At Guido's suggestion, Nick Coghlan pared down PEP 338 to just the bare bones necessary to properly implement the -m switch. That means the runpy module will contain only a single function, run_module, which will import the named module using the standard import mechanism, and then execute the code in that module.
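
A minimal sketch of calling run_module directly, assuming the runpy module as found in current Python. The colorsys module is chosen only because executing it has no side effects.

```python
import runpy

# Locate and execute the stdlib module "colorsys"; the globals of the
# executed module are returned as a dict.
namespace = runpy.run_module("colorsys")
print(namespace["__name__"], "rgb_to_hsv" in namespace)
```

Passing run_name="__main__" instead would make the module's code see itself as the main script, which is closer to what the -m switch does.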

Contributing thread:

[SJB]

Bugfix procedures

Just a reminder of the procedure for applying bug patches in Python (thanks to a brief thread started by Arkadiusz Miskiewicz). Anyone can submit a patch, but it will not be committed until a committer reviews and commits the patch. Non-committers are encouraged to review and comment on patches, and a number of the committers have promised that anyone who reviews and comments on at least five patches can have any patch they like looked at.

Contributing threads:

[SJB]

Removing --with-wctype-functions

M.-A. Lemburg suggested removing support for --with-wctype-functions as it makes Unicode support work in non-standard ways. Though he announced the plan in December 2004, PEP 11 wasn't updated, so removal will be delayed until Python 2.6.

Contributing thread:

[SJB]

Making ASCII the default encoding

Neal Norwitz asked if we should finally make ASCII the default encoding as PEP 263 had promised in Python 2.3. He received only positive responses on this, and so in Python 2.5, any file missing a # -*- coding: ... -*- declaration and using non-ASCII characters will generate an error.

Contributing thread:

[SJB]

PEP 308: Conditional Expressions checked in

Thomas Wouters checked in a patch for PEP 308, so Python 2.5 now has the long-awaited conditional expressions!
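
For anyone who has not seen the PEP 308 syntax yet, here is a short sketch; the function name describe is made up for the example.

```python
def describe(n):
    # value_if_true if condition else value_if_false
    return "even" if n % 2 == 0 else "odd"


print(describe(4), describe(7))
```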

Contributing thread:

[SJB]

Epilogue

This is a summary of traffic on the python-dev mailing list from February 16, 2006 through February 28, 2006. It is intended to inform the wider Python community of on-going developments on the list on a semi-monthly basis. An archive of previous summaries is available online.

An RSS feed of the titles of the summaries is available. You can also watch comp.lang.python or comp.lang.python.announce for new summaries (or follow them through their email gateways, python-list and python-announce respectively, as found at http://mail.python.org).

This python-dev summary is the 14th written by the python-dev summary team of Steve Bethard and Tony Meyer (on-time, schmon-time).

To contact us, please send email:

  • Steve Bethard (steven.bethard at gmail.com)
  • Tony Meyer (tony.meyer at gmail.com)

Do not post to comp.lang.python if you wish to reach us.

The Python Software Foundation is the non-profit organization that holds the intellectual property for Python. It also tries to advance the development and use of Python. If you find the python-dev Summary helpful please consider making a donation. You can make a donation at http://python.org/psf/donations.html . Every cent counts so even a small donation with a credit card, check, or by PayPal helps.

Commenting on Topics

To comment on anything mentioned here, just post to comp.lang.python (or email python-list@python.org which is a gateway to the newsgroup) with a subject line mentioning what you are discussing. All python-dev members are interested in seeing ideas discussed by the community, so don't hesitate to take a stance on something. And if all of this really interests you then get involved and join python-dev!

How to Read the Summaries

This summary is written using reStructuredText. Any unfamiliar punctuation is probably markup for reST (otherwise it is probably regular expression syntax or a typo :); you can safely ignore it. We do suggest learning reST, though; it's simple and is accepted for PEP markup and can be turned into many different formats like HTML and LaTeX. Unfortunately, even though reST is standardized, the wonders of programs that like to reformat text do not allow us to guarantee you will be able to run the text version of this summary through Docutils as-is unless it is from the original text file.