From victor.stinner at haypocalc.com Mon Oct 1 00:59:40 2007 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Mon, 1 Oct 2007 00:59:40 +0200 Subject: [Python-3000] Python, int/long and GMP In-Reply-To: <200709281858.29705.victor.stinner@haypocalc.com> References: <200709280429.39396.victor.stinner@haypocalc.com> <200709281858.29705.victor.stinner@haypocalc.com> Message-ID: <200710010059.41161.victor.stinner@haypocalc.com> Hi, I wrote another patch with two improvements: it uses the small integer cache and the Python memory allocation functions. Now the GMP overhead (pystone result) is only -2% instead of -20% with the previous patch. Since the patch is huge, I prefer to leave a copy on my server: http://www.haypocalc.com/tmp/py3k-long_gmp-v2.patch Victor -- Victor Stinner http://hachoir.org/ From guido at python.org Mon Oct 1 01:14:07 2007 From: guido at python.org (Guido van Rossum) Date: Sun, 30 Sep 2007 16:14:07 -0700 Subject: [Python-3000] bytes and dicts (was: PEP 3137: Immutable Bytesand Mutable Buffer) In-Reply-To: References: <20070929142126.D61D23A4045@sparrow.telecommunity.com> <20070929151127.AE5203A4045@sparrow.telecommunity.com> <20070929155823.C552B3A4045@sparrow.telecommunity.com> Message-ID: I see no other solution to this thread than to revert the decision that comparing bytes and str raises TypeError. It may catch a trivial mistake or two, but the far from trivial, subtle issues it causes for more sophisticated code just aren't worth it. I'll add this to PEP 3137. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon Oct 1 01:25:20 2007 From: guido at python.org (Guido van Rossum) Date: Sun, 30 Sep 2007 16:25:20 -0700 Subject: [Python-3000] Last call for PEP 3137: Immutable Bytes and Mutable Buffer Message-ID: Thanks all for the focused and helpful discussion on this PEP. Here's a new posting of the full text of the PEP as it now stands. 
Most of the changes since the first posting are fleshing out of some details; the decision to make the individual elements of bytes and buffer be ints; and the decision to change bytes/str and buffer/str comparisons again to just return False instead of raising TypeError. (I'm not favorable towards the proposal of c'x' style literals or changes to the I/O APIs to use different names for calls involving bytes instead of text. If you still disagree, please start a new thread with new subject line.) I plan to accept the PEP within a day or two barring major objections, and expect to start implementing soon after. --Guido PEP: 3137 Title: Immutable Bytes and Mutable Buffer Version: $Revision: 58290 $ Last-Modified: $Date: 2007-09-30 16:19:14 -0700 (Sun, 30 Sep 2007) $ Author: Guido van Rossum Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 26-Sep-2007 Python-Version: 3.0 Post-History: 26-Sep-2007, 30-Sep-2007 Introduction ============ After releasing Python 3.0a1 with a mutable bytes type, pressure mounted to add a way to represent immutable bytes. Gregory P. Smith proposed a patch that would allow making a bytes object temporarily immutable by requesting that the data be locked using the new buffer API from PEP 3118. This did not seem the right approach to me. Jeffrey Yasskin, with the help of Adam Hupp, then prepared a patch to make the bytes type immutable (by crudely removing all mutating APIs) and fix the fall-out in the test suite. This showed that there aren't all that many places that depend on the mutability of bytes, with the exception of code that builds up a return value from small pieces. Thinking through the consequences, and noticing that using the array module as an ersatz mutable bytes type is far from ideal, and recalling a proposal put forward earlier by Talin, I floated the suggestion to have both a mutable and an immutable bytes type. 
(This had been brought up before, but until seeing the evidence of Jeffrey's patch I wasn't open to the suggestion.) Moreover, a possible implementation strategy became clear: use the old PyString implementation, stripped down to remove locale support and implicit conversions to/from Unicode, for the immutable bytes type, and keep the new PyBytes implementation as the mutable bytes type. The ensuing discussion made it clear that the idea is welcome but needs to be specified more precisely. Hence this PEP. Advantages ========== One advantage of having an immutable bytes type is that code objects can use these. It also makes it possible to efficiently create hash tables using bytes for keys; this may be useful when parsing protocols like HTTP or SMTP which are based on bytes representing text. Porting code that manipulates binary data (or encoded text) in Python 2.x will be easier using the new design than using the original 3.0 design with mutable bytes; simply replace ``str`` with ``bytes`` and change '...' literals into b'...' literals. Naming ====== I propose the following type names at the Python level: - ``bytes`` is an immutable array of bytes (PyString) - ``buffer`` is a mutable array of bytes (PyBytes) - ``memoryview`` is a bytes view on another object (PyMemory) The old type named ``buffer`` is so similar to the new type ``memoryview``, introduced by PEP 3118, that it is redundant. The rest of this PEP doesn't discuss the functionality of ``memoryview``; it is just mentioned here to justify getting rid of the old ``buffer`` type so we can reuse its name for the mutable bytes type. While eventually it makes sense to change the C API names, this PEP maintains the old C API names, which should be familiar to all. Literal Notations ================= The b'...' notation introduced in Python 3.0a1 returns an immutable bytes object, whatever variation is used. To create a mutable bytes buffer object, use buffer(b'...') or buffer([...]). 
The latter may use a list of integers in range(256). Functionality ============= PEP 3118 Buffer API ------------------- Both bytes and buffer implement the PEP 3118 buffer API. The bytes type only implements read-only requests; the buffer type allows writable and data-locked requests as well. The element data type is always 'B' (i.e. unsigned byte). Constructors ------------ There are four forms of constructors, applicable to both bytes and buffer: - ``bytes(<bytes>)``, ``bytes(<buffer>)``, ``buffer(<bytes>)``, ``buffer(<buffer>)``: simple copying constructors, with the note that ``bytes(<bytes>)`` might return its (immutable) argument. - ``bytes(<str>, <encoding>[, <errors>])``, ``buffer(<str>, <encoding>[, <errors>])``: encode a text string. Note that the ``str.encode()`` method returns an *immutable* bytes object. The <encoding> argument is mandatory; <errors> is optional. - ``bytes(<memory view>)``, ``buffer(<memory view>)``: construct a bytes or buffer object from anything implementing the PEP 3118 buffer API. - ``bytes(<iterable of ints>)``, ``buffer(<iterable of ints>)``: construct an immutable bytes or mutable buffer object from a stream of integers in range(256). - ``buffer(<int>)``: construct a zero-initialized buffer of a given length. Comparisons ----------- The bytes and buffer types are comparable with each other and orderable, so that e.g. b'abc' == buffer(b'abc') < b'abd'. Comparing either type to a str object for equality returns False regardless of the contents of either operand. Ordering comparisons with str raise TypeError. This is all conformant to the standard rules for comparison and ordering between objects of incompatible types. (**Note:** in Python 3.0a1, comparing a bytes instance with a str instance would raise TypeError, on the premise that this would catch the occasional mistake quicker, especially in code ported from Python 2.x. However, a long discussion on the python-3000 list pointed out so many problems with this that it is clearly a bad idea, to be rolled back in 3.0a2 regardless of the fate of the rest of this PEP.) Slicing ------- Slicing a bytes object returns a bytes object. 
Slicing a buffer object returns a buffer object. Slice assignment to a mutable buffer object accepts anything that implements the PEP 3118 buffer API, or an iterable of integers in range(256). Indexing -------- Indexing bytes and buffer returns small ints (like the bytes type in 3.0a1, and like lists or array.array('B')). Assignment to an item of a mutable buffer object accepts an int in range(256). (To assign from a bytes sequence, use a slice assignment.) Str() and Repr() ---------------- The str() and repr() functions return the same thing for these objects. The repr() of a bytes object returns a b'...' style literal. The repr() of a buffer returns a string of the form "buffer(b'...')". Operators --------- The following operators are implemented by the bytes and buffer types, except where mentioned: - ``b1 + b2``: concatenation. With mixed bytes/buffer operands, the return type is that of the first argument (this seems arbitrary until you consider how ``+=`` works). - ``b1 += b2``: mutates b1 if it is a buffer object. - ``b * n``, ``n * b``: repetition; n must be an integer. - ``b *= n``: mutates b if it is a buffer object. - ``b1 in b2``, ``b1 not in b2``: substring test; b1 can be any object implementing the PEP 3118 buffer API. - ``i in b``, ``i not in b``: single-byte membership test; i must be an integer (if it is a length-1 bytes array, it is considered to be a substring test, with the same outcome). - ``len(b)``: the number of bytes. - ``hash(b)``: the hash value; only implemented by the bytes type. Note that the % operator is *not* implemented. It does not appear worth the complexity. Methods ------- The following methods are implemented by bytes as well as buffer, with similar semantics. 
They accept anything that implements the PEP 3118 buffer API for bytes arguments, and return the same type as the object whose method is called ("self"):: .capitalize(), .center(), .count(), .decode(), .endswith(), .expandtabs(), .find(), .index(), .isalnum(), .isalpha(), .isdigit(), .islower(), .isspace(), .istitle(), .isupper(), .join(), .ljust(), .lower(), .lstrip(), .partition(), .replace(), .rfind(), .rindex(), .rjust(), .rpartition(), .rsplit(), .rstrip(), .split(), .splitlines(), .startswith(), .strip(), .swapcase(), .title(), .translate(), .upper(), .zfill() This is exactly the set of methods present on the str type in Python 2.x, with the exclusion of .encode(). The signatures and semantics are the same too. However, whenever character classes like letter, whitespace, lower case are used, the ASCII definitions of these classes are used. (The Python 2.x str type uses the definitions from the current locale, settable through the locale module.) The .encode() method is left out because of the more strict definitions of encoding and decoding in Python 3000: encoding always takes a Unicode string and returns a bytes sequence, and decoding always takes a bytes sequence and returns a Unicode string. In addition, both types implement the class method ``.fromhex()``, which constructs an object from a string containing hexadecimal values (with or without spaces between the bytes). The buffer type implements these additional methods from the MutableSequence ABC (see PEP 3119): .extend(), .insert(), .append(), .reverse(), .pop(), .remove(). Bytes and the Str Type ---------------------- Like the bytes type in Python 3.0a1, and unlike the relationship between str and unicode in Python 2.x, any attempt to mix bytes (or buffer) objects and str objects without specifying an encoding will raise a TypeError exception. 
This is the case even for simply comparing a bytes or buffer object to a str object (even violating the general rule that comparing objects of different types for equality should just return False). Conversions between bytes or buffer objects and str objects must always be explicit, using an encoding. There are two equivalent APIs: ``str(b, <encoding>[, <errors>])`` is equivalent to ``b.decode(<encoding>[, <errors>])``, and ``bytes(s, <encoding>[, <errors>])`` is equivalent to ``s.encode(<encoding>[, <errors>])``. There is one exception: we can convert from bytes (or buffer) to str without specifying an encoding by writing ``str(b)``. This produces the same result as ``repr(b)``. This exception is necessary because of the general promise that *any* object can be printed, and printing is just a special case of conversion to str. There is however no promise that printing a bytes object interprets the individual bytes as characters (unlike in Python 2.x). The str type currently implements the PEP 3118 buffer API. While this is perhaps occasionally convenient, it is also potentially confusing, because the bytes accessed via the buffer API represent a platform-dependent encoding: depending on the platform byte order and a compile-time configuration option, the encoding could be UTF-16-BE, UTF-16-LE, UTF-32-BE, or UTF-32-LE. Worse, a different implementation of the str type might completely change the bytes representation, e.g. to UTF-8, or even make it impossible to access the data as a contiguous array of bytes at all. Therefore, the PEP 3118 buffer API will be removed from the str type. Pickling -------- Left as an exercise for the reader. Copyright ========= This document has been placed in the public domain. .. 
Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 coding: utf-8 End: -- --Guido van Rossum (home page: http://www.python.org/~guido/) From brett at python.org Mon Oct 1 02:11:38 2007 From: brett at python.org (Brett Cannon) Date: Sun, 30 Sep 2007 17:11:38 -0700 Subject: [Python-3000] Last call for PEP 3137: Immutable Bytes and Mutable Buffer In-Reply-To: References: Message-ID: +1 from me. -Brett From aahz at pythoncraft.com Mon Oct 1 04:10:02 2007 From: aahz at pythoncraft.com (Aahz) Date: Sun, 30 Sep 2007 19:10:02 -0700 Subject: [Python-3000] Extension: mpf for GNU MP floating point In-Reply-To: References: Message-ID: <20071001021001.GA12746@panix.com> On Thu, Sep 27, 2007, Rob Crowther wrote: > > I've uploaded the latest code to http://umass.glexia.net/mpf.tar.bz2 > > Here's a quick rundown of supported functions and operations. Could you explain what your goal is here? MPF isn't currently part of the standard library, so it probably should exist as a standalone extension first. This mailing list is probably not the right place for discussion, either. 
-- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ The best way to get information on Usenet is not to ask a question, but to post the wrong information. From carsten at uniqsys.com Mon Oct 1 04:10:32 2007 From: carsten at uniqsys.com (Carsten Haese) Date: Sun, 30 Sep 2007 22:10:32 -0400 Subject: [Python-3000] Last call for PEP 3137: Immutable Bytes and Mutable Buffer In-Reply-To: References: Message-ID: <1191204632.3258.6.camel@localhost.localdomain> On Sun, 2007-09-30 at 16:25 -0700, Guido van Rossum wrote: > [...] > (**Note:** in Python 3.0a1, comparing a bytes instance with a str > instance would raise TypeError, on the premise that this would catch > the occasional mistake quicker, especially in code ported from Python > 2.x. However, a long discussion on the python-3000 list pointed out > so many problems with this that it is clearly a bad idea, to be rolled > back in 3.0a2 regardless of the fate of the rest of this PEP.) > [...] > Like the bytes type in Python 3.0a1, and unlike the relationship > between str and unicode in Python 2.x, any attempt to mix bytes (or > buffer) objects and str objects without specifying an encoding will > raise a TypeError exception. This is the case even for simply > comparing a bytes or buffer object to a str object (even violating the > general rule that comparing objects of different types for equality > should just return False). It appears that you didn't revise the latter paragraph after adding the former paragraph. -- Carsten Haese http://informixdb.sourceforge.net From alexandre at peadrop.com Mon Oct 1 04:44:31 2007 From: alexandre at peadrop.com (Alexandre Vassalotti) Date: Sun, 30 Sep 2007 22:44:31 -0400 Subject: [Python-3000] Last call for PEP 3137: Immutable Bytes and Mutable Buffer In-Reply-To: References: Message-ID: On 9/30/07, Guido van Rossum wrote: > Pickling > -------- > > Left as an exercise for the reader. 
> A simple way to add specific pickling support for bytes/buffer objects would be to define two new constants: BYTES = b'\x8c' # push a bytes object BUFFER = b'\x8d' # push a buffer object And add the following pickling and unpickling procedures: def save_bytes(self, obj, pack=struct.pack): n = len(obj) self.write(BYTES + pack(" References: <1191204632.3258.6.camel@localhost.localdomain> Message-ID: On 9/30/07, Carsten Haese wrote: > On Sun, 2007-09-30 at 16:25 -0700, Guido van Rossum wrote: > > [...] > > (**Note:** in Python 3.0a1, comparing a bytes instance with a str > > instance would raise TypeError, on the premise that this would catch > > the occasional mistake quicker, especially in code ported from Python > > 2.x. However, a long discussion on the python-3000 list pointed out > > so many problems with this that it is clearly a bad idea, to be rolled > > back in 3.0a2 regardless of the fate of the rest of this PEP.) > > [...] > > Like the bytes type in Python 3.0a1, and unlike the relationship > > between str and unicode in Python 2.x, any attempt to mix bytes (or > > buffer) objects and str objects without specifying an encoding will > > raise a TypeError exception. This is the case even for simply > > comparing a bytes or buffer object to a str object (even violating the > > general rule that comparing objects of different types for equality > > should just return False). > > It appears that you didn't revise the latter paragraph after adding the > former paragraph. Good catch! Fixed in svn. The latter paragraph now reads: """ Like the bytes type in Python 3.0a1, and unlike the relationship between str and unicode in Python 2.x, attempts to mix bytes (or buffer) objects and str objects without specifying an encoding will raise a TypeError exception. (However, comparing bytes/buffer and str objects for equality will simply return False; see the section on Comparisons above.) 
""" -- --Guido van Rossum (home page: http://www.python.org/~guido/) From oliphant.travis at ieee.org Mon Oct 1 06:47:34 2007 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Sun, 30 Sep 2007 23:47:34 -0500 Subject: [Python-3000] Last call for PEP 3137: Immutable Bytes and Mutable Buffer In-Reply-To: References: Message-ID: +1 from me. I like that the str will not support the buffer API because it gets rid of one of the flags in the PEP 3118 API that was only there to support the abuse of the buffer API by unicode objects. - Travis Oliphant From greg at krypto.org Mon Oct 1 07:16:29 2007 From: greg at krypto.org (Gregory P. Smith) Date: Sun, 30 Sep 2007 22:16:29 -0700 Subject: [Python-3000] Last call for PEP 3137: Immutable Bytes and Mutable Buffer In-Reply-To: References: Message-ID: <52dc1c820709302216h34a82c45m2385f8dcf34de800@mail.gmail.com> +10 from me From ncoghlan at gmail.com Mon Oct 1 15:55:12 2007 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 01 Oct 2007 23:55:12 +1000 Subject: [Python-3000] Last call for PEP 3137: Immutable Bytes and Mutable Buffer In-Reply-To: References: Message-ID: <4700FC40.1060206@gmail.com> Brett Cannon wrote: > +1 from me. Looks good to me too: +1 I wouldn't mind seeing some iteration-in-C bit-bashing operations in there eventually, but they aren't needed on the first pass, and even being able to do things like the following will be a decent improvement over the status quo for low-level bitstream manipulation: data = bytes([x & 0x1F for x in orig_data]) Cheers, Nick. 
-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
---------------------------------------------------------------
http://www.boredomandlaziness.org

From dalcinl at gmail.com Mon Oct 1 17:00:11 2007
From: dalcinl at gmail.com (Lisandro Dalcin)
Date: Mon, 1 Oct 2007 12:00:11 -0300
Subject: [Python-3000] [Python-Dev] building with -Wwrite-strings
In-Reply-To: <20071001141007.GA20122@code0.codespeak.net>
References: <46FD6DA2.1060107@v.loewis.de> <20071001141007.GA20122@code0.codespeak.net>
Message-ID:

Yes, you are completely right. I ended up realizing that a change like
this would break almost all third-party extensions.

But... what about doing this for Py3K? Third-party extensions have to
be fixed anyway.

On 10/1/07, Armin Rigo wrote:
> Hi Martin,
>
> On Fri, Sep 28, 2007 at 11:09:54PM +0200, "Martin v. Löwis" wrote:
> > What's wrong with
> >
> > static const char *kwlist[] = {"x", "base", 0};
>
> The following goes wrong if we try again to walk this path:
> http://mail.python.org/pipermail/python-dev/2006-February/060689.html
>
>
> Armin
>

-- 
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594

From skip at pobox.com Mon Oct 1 19:14:40 2007
From: skip at pobox.com (skip at pobox.com)
Date: Mon, 1 Oct 2007 12:14:40 -0500
Subject: [Python-3000] bytes vs. array.array vs. numpy.array
In-Reply-To: <4700FC40.1060206@gmail.com>
References: <4700FC40.1060206@gmail.com>
Message-ID: <18177.11008.244338.509409@montanaro.dyndns.org>

    Nick> I wouldn't mind seeing some iteration-in-C bit-bashing operations
    Nick> in there eventually...

    Nick> data = bytes([x & 0x1F for x in orig_data])

This begins to make it look like what you want is array.array or numpy.array.
Python's arrays don't support bitwise operations either, but numpy's do. How much overlap is there between the three types? Does it make sense to consider that canonical underlying array type now (or in the near future, sometime before the release of 3.0 final)? Skip From ncoghlan at gmail.com Mon Oct 1 23:18:19 2007 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 02 Oct 2007 07:18:19 +1000 Subject: [Python-3000] bytes vs. array.array vs. numpy.array In-Reply-To: <18177.11008.244338.509409@montanaro.dyndns.org> References: <4700FC40.1060206@gmail.com> <18177.11008.244338.509409@montanaro.dyndns.org> Message-ID: <4701641B.4040501@gmail.com> skip at pobox.com wrote: > Nick> I wouldn't mind seeing some iteration-in-C bit-bashing operations > Nick> in there eventually... > > Nick> data = bytes([x & 0x1F for x in orig_data]) > > This begins to make it look what you want is array.array or nump.array. > Python's arrays don't support bitwise operations either, but numpy's do. > How much overlap is there between the three types? Does it make sense to > consider that canonical underlying array type now (or in the near future, > sometime before the release of 3.0 final)? Not hugely urgent for me - it's a direction I'd like to see the data type go in (as the less custom code needed on the C/C++ side of the fence to do reasonably efficient low level I/O the better as far as I am concerned), but work is still on 2.4 (with no compelling motivation to upgrade) so I'm personally resigned to the use of assorted ord(), chr() and ''.join() calls for the immediate future. 
The advantage of having the bit manipulation features in the builtin bytes type for this kind of thing over numpy.array is that I expect the builtin bytes type to be usable directly with Py3k versions of libraries like pyserial, and numpy would be a big dependency to bring in just to get more efficient bit-oriented operations on a byte sequence - array.array doesn't have them (not to mention the fact that these operations would make far less sense for any array containing something other than bytes). However, because the addition of any bit-oriented operations to the bytes/buffer types would be a new backwardly-compatible feature, it can be proposed whenever is convenient rather than having to be done right now. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From greg.ewing at canterbury.ac.nz Tue Oct 2 03:19:32 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 02 Oct 2007 14:19:32 +1300 Subject: [Python-3000] bytes vs. array.array vs. numpy.array In-Reply-To: <4701641B.4040501@gmail.com> References: <4700FC40.1060206@gmail.com> <18177.11008.244338.509409@montanaro.dyndns.org> <4701641B.4040501@gmail.com> Message-ID: <47019CA4.4010403@canterbury.ac.nz> Nick Coghlan wrote: > numpy would be a big dependency to bring in just to > get more efficient bit-oriented operations on a byte sequence Random thought - if long integers were to use byte sequences internally to hold their data, it might be possible to get this more or less for free in terms of code size. -- Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | Carpe post meridiem! | Christchurch, New Zealand | (I'm not a morning person.) 
| greg.ewing at canterbury.ac.nz +--------------------------------------+ From tjreedy at udel.edu Tue Oct 2 05:59:12 2007 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 1 Oct 2007 23:59:12 -0400 Subject: [Python-3000] Last call for PEP 3137: Immutable Bytes and Mutable Buffer References: <4700FC40.1060206@gmail.com> Message-ID: "Nick Coghlan" wrote in message news:4700FC40.1060206 at gmail.com... | Brett Cannon wrote: | > +1 from me. | | Looks good to me too: +1 | | I wouldn't mind seeing some iteration-in-C bit-bashing operations in | there eventually, but they aren't needed on the first pass, and even | being able to do things like the following will be a decent improvement | over the status quo for low-level bitstream manipulation: | | data = bytes([x & 0x1F for x in orig_data]) If orig_data were mutable (the new buffer, as proposed in the PEP), would not for i in range(len(orig_data)): orig_data[i] &= 0x1F do it in place? (I don't have .0a1 to try on the current bytes.) tjr From lists at cheimes.de Tue Oct 2 09:59:26 2007 From: lists at cheimes.de (Christian Heimes) Date: Tue, 02 Oct 2007 09:59:26 +0200 Subject: [Python-3000] Last call for PEP 3137: Immutable Bytes and Mutable Buffer In-Reply-To: References: <4700FC40.1060206@gmail.com> Message-ID: Terry Reedy wrote: > If orig_data were mutable (the new buffer, as proposed in the PEP), would > not > > for i in range(len(orig_data)): > orig_data[i] &= 0x1F > > do it in place? (I don't have .0a1 to try on the current bytes.) Good catch! Python 3.0a1 (py3k:58282, Sep 29 2007, 15:07:57) [GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)] on linux2 >>> orig_data = b"abc" >>> orig_data b'abc' >>> for i in range(len(orig_data)): ... orig_data[i] &= 0x1F ... 
>>> orig_data b'\x01\x02\x03' It'd be useful and more efficient if the new buffer type would support the bit wise operations directly: >>> orig_data &= 0x1F TypeError: unsupported operand type(s) for &=: 'bytes' and 'int' >>> orig_data &= b"\x1F" TypeError: unsupported operand type(s) for &=: 'bytes' and 'bytes' Christian From guido at python.org Tue Oct 2 16:10:04 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 2 Oct 2007 07:10:04 -0700 Subject: [Python-3000] Last call for PEP 3137: Immutable Bytes and Mutable Buffer In-Reply-To: References: <4700FC40.1060206@gmail.com> Message-ID: I am hereby accepting my own PEP 3137. The responses fell into three categories: enthusiastic +1s, textual corrections, and ideas for future enhancements. That's about as positive as it gets for any proposal. :-) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From adam at hupp.org Tue Oct 2 16:37:22 2007 From: adam at hupp.org (Adam Hupp) Date: Tue, 2 Oct 2007 10:37:22 -0400 Subject: [Python-3000] Emacs22 python.el support for py3k Message-ID: <766a29bd0710020737q4d4d762coa91b4c6bd15b278f@mail.gmail.com> I've submitted patches to emacs for python 3000 support. It does not handle any new syntax but the emacs<->python interaction works again. This applies to the python.el that ships with emacs22, not python-mode.el. The changes are available in emacs cvs. If you don't want to build a new copy it should be sufficient to pull the files python.el, emacs.py, emacs2.py and emacs3.py. -- Adam Hupp | http://hupp.org/adam/ From guido at python.org Tue Oct 2 17:04:34 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 2 Oct 2007 08:04:34 -0700 Subject: [Python-3000] Emacs22 python.el support for py3k In-Reply-To: <766a29bd0710020737q4d4d762coa91b4c6bd15b278f@mail.gmail.com> References: <766a29bd0710020737q4d4d762coa91b4c6bd15b278f@mail.gmail.com> Message-ID: On 10/2/07, Adam Hupp wrote: > I've submitted patches to emacs for python 3000 support. 
It does not > handle any new syntax but the emacs<->python interaction works again. > This applies to the python.el that ships with emacs22, not > python-mode.el. Just curious -- how do python.el and python-mode.el differ? > The changes are available in emacs cvs. If you don't want to build a > new copy it should be sufficient to pull the files python.el, > emacs.py, emacs2.py and emacs3.py. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From adam at hupp.org Tue Oct 2 17:28:19 2007 From: adam at hupp.org (Adam Hupp) Date: Tue, 2 Oct 2007 11:28:19 -0400 Subject: [Python-3000] Emacs22 python.el support for py3k In-Reply-To: References: <766a29bd0710020737q4d4d762coa91b4c6bd15b278f@mail.gmail.com> Message-ID: <766a29bd0710020828y150cfb5dgf33b440b02a71458@mail.gmail.com> On 10/2/07, Guido van Rossum wrote: > > Just curious -- how do python.el and python-mode.el differ? Off the top of my head: * python-mode.el did not play well with transient-mark-mode (mark-block didn't work). transient-mark-mode highlights the marked region and is required for other functions (e.g. comment-dwim). * python-mode.el had problems with syntax highlighting in the presence of triple quoted strings and in comments. python.el does not. * python.el is supposed to be more consistent with other major modes. e.g. M-; for comment. * python.el ships with emacs. There are claims that python-mode.el was not as well maintained for FSF emacs as XEmacs. 
-- Adam Hupp | http://hupp.org/adam/ From barry at python.org Tue Oct 2 17:33:44 2007 From: barry at python.org (Barry Warsaw) Date: Tue, 2 Oct 2007 11:33:44 -0400 Subject: [Python-3000] Emacs22 python.el support for py3k In-Reply-To: <766a29bd0710020828y150cfb5dgf33b440b02a71458@mail.gmail.com> References: <766a29bd0710020737q4d4d762coa91b4c6bd15b278f@mail.gmail.com> <766a29bd0710020828y150cfb5dgf33b440b02a71458@mail.gmail.com> Message-ID: <8B5D00B9-F765-43F6-B3DE-AA6BB50CA611@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Oct 2, 2007, at 11:28 AM, Adam Hupp wrote: > On 10/2/07, Guido van Rossum wrote: >> >> Just curious -- how do python.el and python-mode.el differ? > > Off the top of my head: > > * python-mode.el did not play well with transient-mark-mode > (mark-block didn't work). transient-mark-mode highlights the marked > region and is required for other functions (e.g. comment-dwim). > > * python-mode.el had problems with syntax highlighting in the > presence of triple quoted strings and in comments. python.el does > not. > > * python.el is supposed to be more consistent with other major modes. > e.g. M-; for comment. > > * python.el ships with emacs. There are claims that python-mode.el > was not as well maintained for FSF emacs as XEmacs. It would be nice if there were only one mode that worked with both FSF Emacs and XEmacs and merged the best qualities of both modes. I don't have much time to work on that, and I suspect Skip is pretty busy too. Adam, if you're interested, willing, and able to help develop such a merge, python-mode at python.org would be the place to do so. I'd certainly be willing to test and I'd try to do a limited amount of XEmacs compatibility hacking. 
- -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (Darwin) iQCVAwUBRwJk2XEjvBPtnXfVAQJ9ZgP/bbG+OSHEnWGCBIXibnTzxEUL2ifIO8YU E/odKLMogXKFc40/weansKpjX9+Mv+/ye7a49HPH+AZ2vxKJsFvZVHill6F3pbh2 bd+94O1AkYIsuJwO7u3Pc3clje85jXDSUtmPRM3yWGweLDNNDaS4kxE02tNqdSTd rKiHn4gUzYk= =zMKd -----END PGP SIGNATURE----- From tjreedy at udel.edu Tue Oct 2 17:59:07 2007 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 2 Oct 2007 11:59:07 -0400 Subject: [Python-3000] Last call for PEP 3137: Immutable Bytes andMutable Buffer References: <4700FC40.1060206@gmail.com> Message-ID: "Christian Heimes" wrote in message news:fdstov$av5$1 at sea.gmane.org... | Terry Reedy wrote: | > If orig_data were mutable (the new buffer, as proposed in the PEP), would | > not | > | > for i in range(len(orig_data)): | > orig_data[i] &= 0x1F | > | > do it in place? (I don't have .0a1 to try on the current bytes.) | | Good catch! | | Python 3.0a1 (py3k:58282, Sep 29 2007, 15:07:57) | [GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)] on linux2 | >>> orig_data = b"abc" | >>> orig_data | b'abc' | >>> for i in range(len(orig_data)): | ... orig_data[i] &= 0x1F | ... | >>> orig_data | b'\x01\x02\x03' Thanks for testing this! Glad it worked. This sort of thing makes having bytes/buffer[i] an int a plus. (Just noticed, PEP accepted.) | It'd be useful and more efficient if the new buffer type would support | the bit wise operations directly: | | >>> orig_data &= 0x1F | TypeError: unsupported operand type(s) for &=: 'bytes' and 'int' This sort of broadcast behavior seems like numpy territory to me. Or better for a buffer subclass. Write it first in Python, using loops like above (partly for documentation and other implementations), then in C when interest and usage warrents. | >>> orig_data &= b"\x1F" | TypeError: unsupported operand type(s) for &=: 'bytes' and 'bytes' Ugh is my response. Stick with the first ;-). 
Terry Jan Reedy From guido at python.org Tue Oct 2 18:24:01 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 2 Oct 2007 09:24:01 -0700 Subject: [Python-3000] Emacs22 python.el support for py3k In-Reply-To: <766a29bd0710020828y150cfb5dgf33b440b02a71458@mail.gmail.com> References: <766a29bd0710020737q4d4d762coa91b4c6bd15b278f@mail.gmail.com> <766a29bd0710020828y150cfb5dgf33b440b02a71458@mail.gmail.com> Message-ID: So is python.el a descendant of python-mode.el, or an independent development? On 10/2/07, Adam Hupp wrote: > On 10/2/07, Guido van Rossum wrote: > > > > Just curious -- how do python.el and python-mode.el differ? > > Off the top of my head: > > * python-mode.el did not play well with transient-mark-mode > (mark-block didn't work). transient-mark-mode highlights the marked > region and is required for other functions (e.g. comment-dwim). > > * python-mode.el had problems with syntax highlighting in the > presence of triple quoted strings and in comments. python.el does > not. > > * python.el is supposed to be more consistent with other major modes. > e.g. M-; for comment. > > * python.el ships with emacs. There are claims that python-mode.el > was not as well maintained for FSF emacs as XEmacs. > > -- > Adam Hupp | http://hupp.org/adam/ > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From adam at hupp.org Tue Oct 2 18:44:54 2007 From: adam at hupp.org (Adam Hupp) Date: Tue, 2 Oct 2007 12:44:54 -0400 Subject: [Python-3000] Emacs22 python.el support for py3k In-Reply-To: References: <766a29bd0710020737q4d4d762coa91b4c6bd15b278f@mail.gmail.com> <766a29bd0710020828y150cfb5dgf33b440b02a71458@mail.gmail.com> Message-ID: <766a29bd0710020944x36e69500k9d8af8e4a619f537@mail.gmail.com> On 10/2/07, Guido van Rossum wrote: > So is python.el a descendant of python-mode.el, or an independent development? I've never seen a definitive statement but I believe it was developed independently. 
-- Adam Hupp | http://hupp.org/adam/ From skip at pobox.com Tue Oct 2 19:05:17 2007 From: skip at pobox.com (skip at pobox.com) Date: Tue, 2 Oct 2007 12:05:17 -0500 Subject: [Python-3000] Emacs22 python.el support for py3k In-Reply-To: <766a29bd0710020944x36e69500k9d8af8e4a619f537@mail.gmail.com> References: <766a29bd0710020737q4d4d762coa91b4c6bd15b278f@mail.gmail.com> <766a29bd0710020828y150cfb5dgf33b440b02a71458@mail.gmail.com> <766a29bd0710020944x36e69500k9d8af8e4a619f537@mail.gmail.com> Message-ID: <18178.31309.146267.585340@montanaro.dyndns.org> Guido> So is python.el a descendant of python-mode.el, or an independent Guido> development? Adam> I've never seen a definitive statement but I believe it was Adam> developed independently. Correct. Skip From qrczak at knm.org.pl Tue Oct 2 20:49:07 2007 From: qrczak at knm.org.pl (Marcin 'Qrczak' Kowalczyk) Date: Tue, 02 Oct 2007 20:49:07 +0200 Subject: [Python-3000] Python, int/long and GMP In-Reply-To: <200709281858.29705.victor.stinner@haypocalc.com> References: <200709280429.39396.victor.stinner@haypocalc.com> <400ED549-B7C7-4A3D-9343-826B54E7B2BB@fuhm.net> <200709281858.29705.victor.stinner@haypocalc.com> Message-ID: <1191350947.8483.6.camel@qrnik> Dnia 28-09-2007, Pt o godzinie 18:58 +0200, Victor Stinner pisze: > I don't know GMP internals. I thaught that GMP uses an hack for small > integers. It does not. (And I'm glad that it does not, because it allows for super-specialized representation of small integers where even the space for mpz_t itself is not allocated. An GMP-internal optimization for the same cases would be underutilized and thus wasteful.) > I may also use Python garbage collector for GMP memory allocations > since GMP allows to use my own memory allocating functions. This would make linking with another library which uses GMP impossible (unless the allocator is compatible with malloc, reentrant etc.). Glasgow Haskell has been unfortunate to go that way. 
> GMP also has its own reference counter mechanism :-/ It does not. -- __("< Marcin Kowalczyk \__/ qrczak at knm.org.pl ^^ http://qrnik.knm.org.pl/~qrczak/ From mark at qtrac.eu Wed Oct 3 04:24:50 2007 From: mark at qtrac.eu (Mark Summerfield) Date: Wed, 3 Oct 2007 03:24:50 +0100 Subject: [Python-3000] Are strs sequences of characters or disguised byte strings? Message-ID: <200710030324.50588.mark@qtrac.eu> In Python 3.0a1, exec() appears to normalize strings, but in other cases they don't appear to be normalized, and this leads to results that appear to be counter-intuitive in some cases, at least to me. >>> c1 = "\u00C7" >>> c2 = "C\u0327" >>> c3 = "\u0043\u0327" >>> c1, c2, c3 ('\xc7', 'C\u0327', 'C\u0327') >>> print(c1, c2) ? ? Clearly c1 and c2 are different at the byte level. But if we use them to create variables using exec(), Python appears to normalize them: >>> dir() ['__builtins__', '__doc__', '__name__', 'c1', 'c2', 'c3'] >>> exec("C\u0327 = 5") >>> dir() ['__builtins__', '__doc__', '__name__', 'c1', 'c2', 'c3', '\xc7'] >>> ? 5 >>> exec("\u00C7 = -7") >>> dir() ['__builtins__', '__doc__', '__name__', 'c1', 'c2', 'c3', '\xc7'] >>> ? -7 This seems to be the right behaviour to me, since from the point of view of a programmer, ? is the name of the variable, no matter what the underlying byte encoding used to represent the variable's name. >>> print(c1, c2) ? ? >>> c1.encode("utf8") == c2.encode("utf8") False This is what I'd expect, since here I'm comparing the actual bytes. But when I compare them as strings I really expect them to be compared as sequences of characters (in a human sense), so this: >>> c1 == c2 False seems counter-intuitive to me. It is easy to fix: >>> from unicodedata import normalize >>> normalize("NFKD", c1) == normalize("NFKD", c2) True but isn't it asking a lot of Python users to use normalize() whenever they want to perform such a basic operation as string comparison? 
Another issue that arises is that you can end up with duplicate
dictionary keys and set elements. (The duplication is in human terms;
in byte terms the keys/set elements differ, of course):

    >>> d = {c1: 1, c2: 2}
    >>> d
    {'C\u0327': 2, '\xc7': 1}
    >>> for k, v in d.items():
    ...     print(k, v)
    ...
    Ç 2
    Ç 1

I think this is surprising.

    >>> s = {c1, c2}
    >>> s
    {'C\u0327', '\xc7'}
    >>> for x in s:
    ...     print(x)
    ...
    Ç
    Ç

And the same result applies to sets of course.

I don't know what the performance costs would be for always normalizing
strings, but it seems to me that if strings are not normalized, then
they are really being treated as byte strings thinly disguised as
strings rather than as true sequences of characters whose byte
representation is a detail that programmers can ignore (unless they
choose to explicitly decode).

-- 
Mark Summerfield, Qtrac Ltd., www.qtrac.eu

From guido at python.org Wed Oct 3 05:28:56 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 2 Oct 2007 20:28:56 -0700
Subject: [Python-3000] Are strs sequences of characters or disguised byte strings?
In-Reply-To: <200710030324.50588.mark@qtrac.eu>
References: <200710030324.50588.mark@qtrac.eu>
Message-ID:

String objects are arrays of code units. They can represent normalized
and unnormalized Unicode text just as easily, and even invalid data,
like half a surrogate and other illegal code units. It is up to the
application (or perhaps at some point the library) to implement
various checks and normalizations. AFAIK this is the same stance that
Java and C# take -- the String types there don't concern themselves
with the higher levels of Unicode standard compliance. (Though those
languages probably have more library support than Python does --
perhaps someone can contribute something, like wrappers for ICU?)

However, for identifiers occurring in source code, we *do* normalize
before comparing them. PEP 3131 should explain this.
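To make the division of labour concrete, here is a small sketch of what "up to the application" can look like. It uses only unicodedata.normalize and exec from the standard library; the helper name nfc and the choice of NFC (rather than the NFKD used earlier in the thread) are assumptions made for the example. The last lines rely on PEP 3131's rule that identifiers are converted to NFKC while parsing.

```python
import unicodedata

c1 = "\u00C7"     # precomposed LATIN CAPITAL LETTER C WITH CEDILLA
c2 = "C\u0327"    # "C" followed by COMBINING CEDILLA


def nfc(text):
    """Return text in Unicode normalization form C (hypothetical helper)."""
    return unicodedata.normalize("NFC", text)


# str compares code points, so the two spellings differ...
same_raw = (c1 == c2)
# ...until the application normalizes both sides.
same_normalized = (nfc(c1) == nfc(c2))

# Normalizing keys on the way in avoids the duplicate dictionary entries.
raw = {c1: 1, c2: 2}
merged = {}
for key, value in raw.items():
    merged[nfc(key)] = value

# Identifiers in source code are normalized by the compiler itself.
namespace = {}
exec("C\u0327 = 5", namespace)
identifier_normalized = "\u00C7" in namespace
```

Here raw keeps two entries while merged keeps one, which is the behaviour the original poster expected from plain str keys.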
--Guido On 10/2/07, Mark Summerfield wrote: > In Python 3.0a1, exec() appears to normalize strings, but in other cases > they don't appear to be normalized, and this leads to results that > appear to be counter-intuitive in some cases, at least to me. > > >>> c1 = "\u00C7" > >>> c2 = "C\u0327" > >>> c3 = "\u0043\u0327" > >>> c1, c2, c3 > ('\xc7', 'C\u0327', 'C\u0327') > >>> print(c1, c2) > ? ? > > Clearly c1 and c2 are different at the byte level. But if we use them to > create variables using exec(), Python appears to normalize them: > > >>> dir() > ['__builtins__', '__doc__', '__name__', 'c1', 'c2', 'c3'] > >>> exec("C\u0327 = 5") > >>> dir() > ['__builtins__', '__doc__', '__name__', 'c1', 'c2', 'c3', '\xc7'] > >>> ? > 5 > >>> exec("\u00C7 = -7") > >>> dir() > ['__builtins__', '__doc__', '__name__', 'c1', 'c2', 'c3', '\xc7'] > >>> ? > -7 > > This seems to be the right behaviour to me, since from the point of view > of a programmer, ? is the name of the variable, no matter what the > underlying byte encoding used to represent the variable's name. > > >>> print(c1, c2) > ? ? > >>> c1.encode("utf8") == c2.encode("utf8") > False > > This is what I'd expect, since here I'm comparing the actual bytes. > > But when I compare them as strings I really expect them to be compared > as sequences of characters (in a human sense), so this: > > >>> c1 == c2 > False > > seems counter-intuitive to me. It is easy to fix: > > >>> from unicodedata import normalize > >>> normalize("NFKD", c1) == normalize("NFKD", c2) > True > > but isn't it asking a lot of Python users to use normalize() whenever > they want to perform such a basic operation as string comparison? > > Another issue that arises is that you can end up with duplicate > dictionary keys and set elements. (The duplication is in human terms, in > byte terms the keys/set elements differ of course): > > >>> d = {c1: 1, c2: 2} > >>> d > {'C\u0327': 2, '\xc7': 1} > >>> for k, v in d.items(): > ... print(k, v) > ... > ? 2 > ? 
1 > > I think this is surprising. > > >>> s = {c1, c2} > >>> s > {'C\u0327', '\xc7'} > >>> for x in s: > ... print(x) > ... > ? > ? > > And the same result applies to sets of course. > > I don't know what the performance costs would be for always normalizing > strings, but it seems to me that if strings are not normalized, then > they are really being treated as byte strings thinly disguised as > strings rather than as true sequences of characters whose byte > representation is a detail that programmers can ignore (unless they > choose to explicitly decode). > > -- > Mark Summerfield, Qtrac Ltd., www.qtrac.eu > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From lists at cheimes.de Wed Oct 3 19:30:46 2007 From: lists at cheimes.de (Christian Heimes) Date: Wed, 03 Oct 2007 19:30:46 +0200 Subject: [Python-3000] Last call for PEP 3137: Immutable Bytes andMutable Buffer In-Reply-To: References: <4700FC40.1060206@gmail.com> Message-ID: Terry Reedy wrote: > | It'd be useful and more efficient if the new buffer type would support > | the bit wise operations directly: > | > | >>> orig_data &= 0x1F > | TypeError: unsupported operand type(s) for &=: 'bytes' and 'int' > > This sort of broadcast behavior seems like numpy territory to me. Or > better for a buffer subclass. Write it first in Python, using loops like > above (partly for documentation and other implementations), then in C when > interest and usage warrents. The C implementation of the bit wise operations for buffer() gains a large speed improvement over the Python implementation. 
I'm not sure if Guido would like it and I don't have a use case yet but
it sounds like a useful addition to the new buffer() type:

    buffer &= smallint
    buffer |= smallint
    buffer ^= smallint
    newbuffer = buffer & smallint
    newbuffer = buffer | smallint
    newbuffer = buffer ^ smallint

I'm willing to give it a try and implement it if people are interested
in it.

I have a use case for another feature but that's surely out of scope
for the Python core. For some algorithms, especially cryptographic
algorithms, I could use a bytes type which contains larger elements than
a char (unsigned int8) and which does overflow (255 + 1 == 0).

    for b in bytes(b"....", wordsize=32, signed=True):
        ...

Again, it's just a pipe dream and I tend to say that it doesn't belong
in the core.

>
> | >>> orig_data &= b"\x1F"
> | TypeError: unsupported operand type(s) for &=: 'bytes' and 'bytes'
>
> Ugh is my response. Stick with the first ;-).

Ugh, too :)

Christian

From guido at python.org Wed Oct 3 19:36:32 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 3 Oct 2007 10:36:32 -0700
Subject: [Python-3000] Last call for PEP 3137: Immutable Bytes andMutable Buffer
In-Reply-To:
References: <4700FC40.1060206@gmail.com>
Message-ID:

On 10/3/07, Christian Heimes wrote:
> I don't have a use case yet but it sounds like a
> useful addition to the new buffer() type:

That's a contradiction. Without a use case it's not useful.

Let's be conservative on these "kitchen sink" ideas. They belong in
python-ideas anyway.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From nas at arctrix.com Wed Oct 3 20:01:06 2007
From: nas at arctrix.com (Neil Schemenauer)
Date: Wed, 3 Oct 2007 18:01:06 +0000 (UTC)
Subject: [Python-3000] Simplifying pickle for Py3k
Message-ID:

I guess the library overhaul hasn't really started yet but it would
be nice if the pickle module could get some work. Today I'm trying
to efficiently store a class using pickle and the documentation is
making my head hurt.
I don't think the documentation itself is the
problem, just the fact that the rules are so complicated.

I guess there are several different solutions:

    * Remove backwards compatible stuff from the code and the
      documentation. The downside is that old pickles could not be
      loaded. Perhaps that's not a huge issue since the removal of
      old-style classes might already break old pickles.

    * Remove the backwards compatible stuff from the documentation
      only. This would help people using the language but would
      still be a long term maintenance issue.

    * Leave the old code in but generate warnings when old pickle
      mechanisms are used. Eventually the old stuff could be
      removed from the code.

    * Provide an "oldpickle" module that supports pre-3k pickles.

I think I like the warnings idea best.

  Neil

From guido at python.org Wed Oct 3 20:29:18 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 3 Oct 2007 11:29:18 -0700
Subject: [Python-3000] Simplifying pickle for Py3k
In-Reply-To:
References:
Message-ID:

I think it's essential to be able to *read* pickles generated by older
Python versions. But for writing I'm okay with only writing protocol 2
(which Python 2.x also understands) and only supporting the modern
APIs for customizing pickle writing.

I don't think classic class instances are necessarily unpicklable in
3.0 -- they will just show up as instances of the corresponding
new-style classes.

--Guido

On 10/3/07, Neil Schemenauer wrote:
> I guess the library overhaul hasn't really started yet but it would
> be nice if the pickle module could get some work. Today I'm trying
> to efficiently store a class using pickle and the documentation is
> making my head hurt. I don't think the documentation itself is the
> problem, just the fact that the rules are so complicated.
>
> I guess there are several different solutions:
>
> * Remove backwards compatible stuff from the code and the
> documentation. The downside is that old pickles could not be
> loaded.
Perhaps that's not a huge issue since the removal of > old-style classes might already break old pickles. > > * Remove the backwards compatible stuff from the documentation > only. The would help people using the language but would > still be a long term maintenance issue. > > * Leave the old code in but generate warnings when old pickle > mechanisms are used. Eventually the old stuff could be > removed from the code. > > * Provide an "oldpickle" module the supports pre-3k pickles. > > I think I like the warnings idea best. > > Neil > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From barry at python.org Wed Oct 3 20:28:48 2007 From: barry at python.org (Barry Warsaw) Date: Wed, 3 Oct 2007 14:28:48 -0400 Subject: [Python-3000] Simplifying pickle for Py3k In-Reply-To: References: Message-ID: <09EFA1D6-BF99-47A5-8C04-9C481E6DA75D@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Oct 3, 2007, at 2:01 PM, Neil Schemenauer wrote: > I guess the library overhaul hasn't really started it but it would > be nice if the pickle module could get some work. Today I'm trying > to efficiently store a class using pickle and the documentation is > making my head hurt. I don't think the documentation itself is the > problem, just the fact that the rules are so complicated. +1. Try reverse engineering those rules if you really want to have some fun. ;) > I guess there are several different solutions: > > * Remove backwards compatible stuff from the code and the > documentation. The downside is that old pickles could not be > loaded. Perhaps that's not a huge issue since the removal of > old-style classes might already break old pickles. > > * Remove the backwards compatible stuff from the documentation > only. 
The would help people using the language but would > still be a long term maintenance issue. > > * Leave the old code in but generate warnings when old pickle > mechanisms are used. Eventually the old stuff could be > removed from the code. > > * Provide an "oldpickle" module the supports pre-3k pickles. > > I think I like the warnings idea best. I'm not sure about eventually removing the code, since we may need long term support for migration from 2.x pickles to 3.0 pickles. OTOH, if 2to3 or Python 2.6+ could include pickle migration code, that might be fine. - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (Darwin) iQCVAwUBRwPfYXEjvBPtnXfVAQJfSwQAnoAmgSQy99rJz4C+hks0jvKZz5X3yNOa qV9pV9942KEVZN5lwXLtzoWAnBr9MpXTjZ9AEmDgJVScSXV4Vk/MegsS/Q8R2diG 88x1vpuXQF333CHgWnGiQYw6lysZfP5rbKEHaOYwQB4mjLTS7VSKuZdVtZvvMGH8 7HDj3GqqC0I= =1Plz -----END PGP SIGNATURE----- From g.brandl at gmx.net Wed Oct 3 20:36:19 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Wed, 03 Oct 2007 20:36:19 +0200 Subject: [Python-3000] Simplifying pickle for Py3k In-Reply-To: References: Message-ID: Neil Schemenauer schrieb: > I guess the library overhaul hasn't really started it but it would > be nice if the pickle module could get some work. Today I'm trying > to efficiently store a class using pickle and the documentation is > making my head hurt. I don't think the documentation itself is the > problem, just the fact that the rules are so complicated. > > I guess there are several different solutions: > > * Remove backwards compatible stuff from the code and the > documentation. The downside is that old pickles could not be > loaded. Perhaps that's not a huge issue since the removal of > old-style classes might already break old pickles. > > * Remove the backwards compatible stuff from the documentation > only. The would help people using the language but would > still be a long term maintenance issue. 
> > * Leave the old code in but generate warnings when old pickle > mechanisms are used. Eventually the old stuff could be > removed from the code. > > * Provide an "oldpickle" module the supports pre-3k pickles. > > I think I like the warnings idea best. I'm in favor of #1, perhaps combined with #4. With the fundamental change in basic types (unicode -> str, str -> bytes) I wouldn't expect 2.x pickles to be loadable by 3.0 anyway. Cruft removal from the pickle protocol is really needed; I don't envy everyone reading the pickle docs trying to understand which method exactly he has to implement, which is going to be called with what arguments, etc. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From skip at pobox.com Wed Oct 3 22:27:34 2007 From: skip at pobox.com (skip at pobox.com) Date: Wed, 3 Oct 2007 15:27:34 -0500 Subject: [Python-3000] Simplifying pickle for Py3k In-Reply-To: References: Message-ID: <18179.64310.424753.609880@montanaro.dyndns.org> Georg> I don't envy everyone reading the pickle docs trying to Georg> understand which method exactly he has to implement, which is Georg> going to be called with what arguments, etc. Agreed. I've been going through that (painful) exercise the past couple of days as I try and figure out what methods my to-be-pickled objects need to implement. __reduce__, __reduce_ex__, __getstate__, __setstate__, copy_reg, __safe_for_unpickling__, __getnewargs__. Your head starts to swim after awhile. 
Skip From lists at cheimes.de Wed Oct 3 23:52:10 2007 From: lists at cheimes.de (Christian Heimes) Date: Wed, 03 Oct 2007 23:52:10 +0200 Subject: [Python-3000] Simplifying pickle for Py3k In-Reply-To: References: Message-ID: Neil Schemenauer wrote: > I guess there are several different solutions: > > * Remove backwards compatible stuff from the code and the > documentation. The downside is that old pickles could not be > loaded. Perhaps that's not a huge issue since the removal of > old-style classes might already break old pickles. > > * Remove the backwards compatible stuff from the documentation > only. The would help people using the language but would > still be a long term maintenance issue. > > * Leave the old code in but generate warnings when old pickle > mechanisms are used. Eventually the old stuff could be > removed from the code. > > * Provide an "oldpickle" module the supports pre-3k pickles. > > I think I like the warnings idea best. Please keep in mind that we want people to move to Python 3.x. Pickles are very important for a bunch of well known and large Python applications like Zope2, Zope3, Mailman and probably many more. Zope's ZODB makes heavy use of pickles. If you remove the support for old style pickles from Python 2.x you also remove the migration path for a large user base to Python 3.x. I like to propose option (4b): Provide an oldpickle module which can load old pickles and migrate an old pickle to a Python 3.x pickle. As long as Python 3.0 can load and migrate old to new pickles I'm also for option (1). The pickle module could use an emaciation. 
Christian From greg.ewing at canterbury.ac.nz Thu Oct 4 00:24:14 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 04 Oct 2007 10:24:14 +1200 Subject: [Python-3000] Simplifying pickle for Py3k In-Reply-To: <18179.64310.424753.609880@montanaro.dyndns.org> References: <18179.64310.424753.609880@montanaro.dyndns.org> Message-ID: <4704168E.3090005@canterbury.ac.nz> skip at pobox.com wrote: > I've been going through that (painful) exercise the past couple of > days as I try and figure out what methods my to-be-pickled objects need to > implement. __reduce__, __reduce_ex__, __getstate__, __setstate__, copy_reg, > __safe_for_unpickling__, __getnewargs__. Your head starts to swim after > awhile. Not all of these are old cruft -- some of them are alternatives that are useful in one situation or another. Some of them could no doubt be removed, though. -- Greg From alexandre at peadrop.com Thu Oct 4 08:49:16 2007 From: alexandre at peadrop.com (Alexandre Vassalotti) Date: Thu, 4 Oct 2007 02:49:16 -0400 Subject: [Python-3000] Simplifying pickle for Py3k In-Reply-To: References: Message-ID: On 10/3/07, Neil Schemenauer wrote: > I guess the library overhaul hasn't really started it but it would > be nice if the pickle module could get some work. Today I'm trying > to efficiently store a class using pickle Could you elaborate on what you are trying to do? > and the documentation is making my head hurt. I don't think the > documentation itself is the problem, just the fact that the rules > are so complicated. > > I guess there are several different solutions: > > * Remove backwards compatible stuff from the code and the > documentation. The downside is that old pickles could not be > loaded. Perhaps that's not a huge issue since the removal of > old-style classes might already break old pickles. > This would not simplify the pickle module by much. So, I don't think this would justify breaking backward-compatibility. 
As far as I know, the removal of the old-style classes does not break old pickle streams, since the code of classes is not pickled but referenced. > * Remove the backwards compatible stuff from the documentation > only. The would help people using the language but would > still be a long term maintenance issue. The documentation for the pickle module is completely outdated and confusing. In fact, some sections are outright wrong about how the current module works. If I get some free time (which is unlikely, right now), I will update the documentation. > * Leave the old code in but generate warnings when old pickle > mechanisms are used. Eventually the old stuff could be > removed from the code. Could point out specific examples of the "old code" that you are referring to? > * Provide an "oldpickle" module the supports pre-3k pickles. As I said, old pickle streams should work fine with Py3k. So, adding yet another pickle module is unnecessary. -- Alexandre From nas at arctrix.com Fri Oct 5 06:59:30 2007 From: nas at arctrix.com (Neil Schemenauer) Date: Thu, 4 Oct 2007 22:59:30 -0600 Subject: [Python-3000] Simplifying pickle for Py3k In-Reply-To: References: Message-ID: <20071005045930.GA20564@arctrix.com> On Thu, Oct 04, 2007 at 02:49:16AM -0400, Alexandre Vassalotti wrote: > Could you elaborate on what you are trying to do? I'm trying to efficiently pickle a 'unicode' subclass. I'm disappointed that it's not possible to be as efficient as the built-in unicode class, even when using an extension code. > The documentation for the pickle module is completely outdated and > confusing. In fact, some sections are outright wrong about how the > current module works. If I get some free time (which is unlikely, > right now), I will update the documentation. Yes, I've changed my mind and agree. PEP 307 provides a lot of details that library docs do not but it's not written as a reference doc. Improved library docs would help a lot. 
> > * Leave the old code in but generate warnings when old pickle > > mechanisms are used. Eventually the old stuff could be > > removed from the code. > > Could point out specific examples of the "old code" that you are referring to? I don't have time right now to point at specific code. How about the code that implements all the different versions of __reduce__ and code for __getinitargs__, __getstate__, __setstate__? In any case, it looks like there will be volunteers to maintain the backwards compatability of the pickle module. That's great. Neil From mark at qtrac.eu Fri Oct 5 09:20:39 2007 From: mark at qtrac.eu (Mark Summerfield) Date: Fri, 5 Oct 2007 08:20:39 +0100 Subject: [Python-3000] Small renaming suggestion: re.sub() -> re.replace() or re.substitute() Message-ID: <200710050820.39238.mark@qtrac.eu> Hi, It seems to me that one of the few really "bad" method names in the Python library that I regularly encounter is re.sub(). I don't like the name because: (1) It is an abbreviation, but not an "obvious" one like max and min (2) It is an ambiguous name: could be substitute or could be subtract (3) Elsewhere where special method __foo__ that implements a named (as opposed to symbol-based) method, that method is called foo. For example, __cmp__() -> cmp(), __int__() -> int(), __len__() -> len(). But __add__ -> +, __sub__() -> -. (4) It is the only function with this name in the library; whereas there are several replace methods: bytes.replace() str.replace() datetime.date.replace() # and a few others, plus some replace_* functions. Although re.substitute() would work (and be better than sub), I think re.replace() is better and more consistent regarding the rest of the library. And as for subn, well, replacen or substituten are possible, but why not have just one method and have an optional keyword argument if a tuple is wanted? 
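The single-function idea in the message above can be sketched in a few lines. This is only an illustration: the name `replace` and the `with_count` keyword are hypothetical and not part of the `re` module; the sketch simply delegates to the existing `re.sub()` and `re.subn()`.

```python
import re

def replace(pattern, repl, string, count=0, with_count=False):
    """Hypothetical wrapper: one function, with an optional tuple result.

    With with_count=False it behaves like re.sub(); with with_count=True
    it behaves like re.subn() and returns (new_string, number_of_subs).
    """
    if with_count:
        return re.subn(pattern, repl, string, count=count)
    return re.sub(pattern, repl, string, count=count)

plain = replace(r"\d+", "#", "a1b22c333")
counted = replace(r"\d+", "#", "a1b22c333", with_count=True)
```

Here `plain` is a string and `counted` is a `(string, int)` tuple, which is the trade-off of this design: the return type depends on a flag.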
-- Mark Summerfield, Qtrac Ltd., www.qtrac.eu From facundobatista at gmail.com Fri Oct 5 12:45:54 2007 From: facundobatista at gmail.com (Facundo Batista) Date: Fri, 5 Oct 2007 07:45:54 -0300 Subject: [Python-3000] Small renaming suggestion: re.sub() -> re.replace() or re.substitute() In-Reply-To: <200710050820.39238.mark@qtrac.eu> References: <200710050820.39238.mark@qtrac.eu> Message-ID: 2007/10/5, Mark Summerfield : > Although re.substitute() would work (and be better than sub), I think > re.replace() is better and more consistent regarding the rest of the > library. +1, happened twice to me, different jobs, that a colleague came to me asking why there was no "replace" in "re". Yes, sub() is even difficult to find (unless you *read* all the descriptions of the methods). Regards, -- . Facundo Blog: http://www.taniquetil.com.ar/plog/ PyAr: http://www.python.org/ar/ From alexandre at peadrop.com Sat Oct 6 06:35:39 2007 From: alexandre at peadrop.com (Alexandre Vassalotti) Date: Sat, 6 Oct 2007 00:35:39 -0400 Subject: [Python-3000] Simplifying pickle for Py3k In-Reply-To: <20071005045930.GA20564@arctrix.com> References: <20071005045930.GA20564@arctrix.com> Message-ID: On 10/5/07, Neil Schemenauer wrote: > On Thu, Oct 04, 2007 at 02:49:16AM -0400, Alexandre Vassalotti wrote: > > Could you elaborate on what you are trying to do? > > I'm trying to efficiently pickle a 'unicode' subclass. I'm > disappointed that it's not possible to be as efficient as the > built-in unicode class, even when using an extension code. There are a few things you could do to produce smaller pickle streams. If you are certain that the objects you will pickle are not self-referential, then you can set Pickler.fast to True. This will disable the "memoizer", which adds a 2-byte overhead to each object pickled (depending on the input, this might or might not shorten the resulting stream).
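As a rough illustration of the Pickler.fast suggestion, here is a minimal sketch that pickles the same list with and without the memo and compares the sizes. The helper name `dumps` and the sample data are assumptions made for this example. Note that `fast` mode must not be used with self-referential objects, as the message says.

```python
import io
import pickle

def dumps(obj, fast):
    """Pickle obj with protocol 2, optionally disabling the memo."""
    buf = io.BytesIO()
    pickler = pickle.Pickler(buf, protocol=2)
    pickler.fast = fast  # True skips memo bookkeeping for each object
    pickler.dump(obj)
    return buf.getvalue()

# Sample data: distinct strings, none of them shared or self-referential.
data = ["item-%d" % i for i in range(100)]
with_memo = dumps(data, fast=False)
without_memo = dumps(data, fast=True)
restored = pickle.loads(without_memo)
```

For input like this the stream without memo opcodes comes out smaller; for input with many repeated references to the same object, the memo can make the stream shorter instead.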
If this isn't enough, then you could subclass Pickler and Unpickler and define a custom rule for your unicode subclass. An obvious optimization for pickle, in Py3k, would be to add support for short unicode strings. Currently, there is a 4-byte overhead per string. Since Py3k is unicode throughout, this overhead can become quite large. > > Could point out specific examples of the "old code" that you are referring to? > > I don't have time right now to point at specific code. How about > the code that implements all the different versions of __reduce__ > and code for __getinitargs__, __getstate__, __setstate__? At first glance, __reduce__ seems to be useful only for instances of subclasses of built-in types. However, __getnewargs__ could easily replace it for that. So, removing __reduce__ (and __reduce_ex__) is probably a good idea. As far as I know, the current pickle module doesn't use __getinitargs__ (this is one of the things the documentation is totally wrong about). As for __getstate__ and __setstate__, I think they are essential. Without them, you couldn't pickle objects with __slots__ or save the I/O state of certain objects. It would certainly be possible to simplify the algorithm used for pickling class instances a little. In "pseudo-code", it would look like something along these lines:

    def save_obj(obj):
        # let obj be the instance of a user-defined class
        cls = obj.__class__
        if hasattr(obj, "__getnewargs__"):
            args = obj.__getnewargs__()
        else:
            args = ()
        if hasattr(obj, "__getstate__"):
            state = obj.__getstate__()
        else:
            state = obj.__dict__
        return (cls, args, state)

    def load_obj(cls, args, state):
        obj = cls.__new__(cls, *args)
        if hasattr(obj, "__getstate__"):
            try:
                obj.__setstate__(state)
            except AttributeError:
                raise UnpicklingError
        else:
            obj.__dict__.update(state)
        return obj

The main difference between this and the current method used to pickle instances is the use of __getnewargs__ instead of __reduce__.
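To make the roles of __getnewargs__, __getstate__ and __setstate__ concrete, here is a small sketch using the real pickle module on a string subclass, which is the case Neil describes. The class `Label` and its `lang` attribute are hypothetical, and the example relies on the default protocol of a current Python 3 rather than the 2007 code base.

```python
import pickle

class Label(str):
    """Hypothetical str subclass that carries one extra attribute."""

    def __new__(cls, text, lang="en"):
        self = super().__new__(cls, text)
        self.lang = lang
        return self

    def __getnewargs__(self):
        # Arguments passed to __new__ when the object is recreated.
        return (str(self),)

    def __getstate__(self):
        # Extra state that __new__ does not restore by itself.
        return {"lang": self.lang}

    def __setstate__(self, state):
        self.lang = state["lang"]

original = Label("bonjour", lang="fr")
copy = pickle.loads(pickle.dumps(original))
```

The string value travels through __getnewargs__, while the extra attribute travels through the state pair, so no __reduce__ method is needed for this class.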
-- Alexandre From guido at python.org Mon Oct 8 06:32:59 2007 From: guido at python.org (Guido van Rossum) Date: Sun, 7 Oct 2007 21:32:59 -0700 Subject: [Python-3000] PEP 3137 plan of attack Message-ID: I'd like to make complete implementation of PEP 3137 the goal for the 3.0a2 release. It should be doable to do this release by the end of October. I don't think anything else *needs* to be done to have a successful a2 release. The work for PEP 3137 can be split into a number of relatively independent steps. In some cases these can even be carried out in either order. I'd love to see volunteers for each of these steps. Note: I'll refer to the three string types by their C names, as I plan to keep those unchanged in 3.0a2. We can rename them later, but renaming them will make merging from the trunk and converting 3rd party extensions harder. The C names are PyString (immutable bytes), PyBytes (mutable bytes), PyUnicode (immutable unicode code units, either 16 bits or 32 bits). The tasks I can think of are: - remove locale support from PyString - remove compatibility with PyUnicode from PyString - remove compatibility with PyString from PyUnicode - add missing methods to PyBytes (for list, see the PEP and compare to what's already there) - remove buffer API from PyUnicode - make == and != between PyBytes and PyUnicode return False instead of raising TypeError - make == and != between PyString and Pyunicode return False instead of converting - make comparisons between PyString and PyBytes work (these are properly ordered) - change lots of places (e.g. encoders) to return PyString instead of PyBytes - change indexing and iteration over PyString to return ints, not 1-char PyStrings - change PyString's repr() to return "b'...'" - change PyBytes's repr() to return "buffer(b'...')" - change parser so that b"..." 
returns PyString, not PyBytes - rename bytes -> buffer, str8 -> bytes If a task is done independently from the others, it should include changes to keep the unit tests working. If you volunteer, please send out an email to this list before you start doing any work, to avoid duplicate work (unless sending the email would take more time than it would take to write the code, compile it, run all unit tests, and upload the patch). I'd appreciate it if you gave an estimate for when you expect to be done (or give up) too. For code submissions, please use bugs.python.org and send an email pointing to the relevant issue to this list. PS. Is there anyone who understands test_urllib2net and can fix it? It's been failing for weeks (maybe months) now. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From tom at vector-seven.com Mon Oct 8 07:03:37 2007 From: tom at vector-seven.com (Thomas Lee) Date: Mon, 08 Oct 2007 15:03:37 +1000 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: Message-ID: <4709BA29.3060503@vector-seven.com> Guido van Rossum wrote: > - make == and != between PyBytes and PyUnicode return False instead of > raising TypeError > - make == and != between PyString and Pyunicode return False instead > of converting > - make comparisons between PyString and PyBytes work (these are > properly ordered) > If nobody else is doing this, it sounds like sounds like something I - as a relative newbie - could handle. Possibly the repr() stuff too if nobody else wants that. Should be able to get a patch up before Friday. Cheers, Tom From lists at cheimes.de Mon Oct 8 13:17:55 2007 From: lists at cheimes.de (Christian Heimes) Date: Mon, 08 Oct 2007 13:17:55 +0200 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: Message-ID: Guido van Rossum wrote: > - change PyString's repr() to return "b'...'" > - change PyBytes's repr() to return "buffer(b'...')" > - change parser so that b"..." 
returns PyString, not PyBytes I'll take the three steps. They sound like low-hanging fruit even for a noob like me. I expect to have a working patch in the next couple of days. Christian From greg at krypto.org Mon Oct 8 18:32:31 2007 From: greg at krypto.org (Gregory P. Smith) Date: Mon, 8 Oct 2007 09:32:31 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: Message-ID: <52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com> > - add missing methods to PyBytes (for list, see the PEP and compare to > what's already there) > - remove buffer API from PyUnicode I'll take these two with a goal of having them done by the end of the week. -gps From janssen at parc.com Mon Oct 8 19:51:23 2007 From: janssen at parc.com (Bill Janssen) Date: Mon, 8 Oct 2007 10:51:23 PDT Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: Message-ID: <07Oct8.105132pdt."57996"@synergy1.parc.xerox.com> I think I can spend some time on the 3K SSL support, but I've been waiting till the "bytes" work settles down. Sounds like I should keep waiting a bit more? Or have the C APIs already settled? Bill From guido at python.org Mon Oct 8 20:42:09 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 8 Oct 2007 11:42:09 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: <7125419022533265919@unknownmsgid> References: <7125419022533265919@unknownmsgid> Message-ID: On 10/8/07, Bill Janssen wrote: > I think I can spend some time on the 3K SSL support, but I've been > waiting till the "bytes" work settles down. Sounds like I should > keep waiting a bit more? Or have the C APIs already settled? The C APIs haven't quite settled down yet, but I'd like to convince you that you needn't wait. For all bytes input, you should use the (new) buffer API, i.e.
PyObject_GetBuffer() and PyObject_ReleaseBuffer() (grep for usage examples if they aren't sufficiently documented in the docs or in PEP 3118). For stuff that returns bytes, you can either use PyBytes_FromStringAndSize() -- which is the 3.0a1 recommended best practice (returning a mutable bytes object) or PyString_FromStringAndSize() -- which will be the 3.0a2 way of returning an immutable bytes object). Since they have the same signature there's very little to worry about having to change this around later. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From brett at python.org Mon Oct 8 21:50:02 2007 From: brett at python.org (Brett Cannon) Date: Mon, 8 Oct 2007 12:50:02 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: Message-ID: On 10/7/07, Guido van Rossum wrote: [SNIP] > PS. Is there anyone who understands test_urllib2net and can fix it? > It's been failing for weeks (maybe months) now. I don't understand it but I fixed it in r58378. =) When ftplib.FTP was converted over to Py3K it was given a default encoding of ASCII on all read data, but that doesn't work as the stuff on the other end could be latin1 (and it was). So I just changed the default encoding. -Brett From guido at python.org Mon Oct 8 21:51:59 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 8 Oct 2007 12:51:59 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: Message-ID: On 10/8/07, Brett Cannon wrote: > On 10/7/07, Guido van Rossum wrote: > [SNIP] > > PS. Is there anyone who understands test_urllib2net and can fix it? > > It's been failing for weeks (maybe months) now. > > I don't understand it but I fixed it in r58378. =) > > When ftplib.FTP was converted over to Py3K it was given a default > encoding of ASCII on all read data, but that doesn't work as the stuff > on the other end could be latin1 (and it was). So I just changed the > default encoding. Cool. Though how do you know it was really latin1? 
Is there anything standardized about the encoding used by FTP? -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon Oct 8 22:03:35 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 8 Oct 2007 13:03:35 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: Message-ID: On 10/7/07, Guido van Rossum wrote: > - remove locale support from PyString > - remove compatibility with PyUnicode from PyString > - remove compatibility with PyString from PyUnicode I'll tackle these myself by Friday, unless someone else beats me to it. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From brett at python.org Mon Oct 8 22:05:31 2007 From: brett at python.org (Brett Cannon) Date: Mon, 8 Oct 2007 13:05:31 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: Message-ID: On 10/8/07, Guido van Rossum wrote: > On 10/8/07, Brett Cannon wrote: > > On 10/7/07, Guido van Rossum wrote: > > [SNIP] > > > PS. Is there anyone who understands test_urllib2net and can fix it? > > > It's been failing for weeks (maybe months) now. > > > > I don't understand it but I fixed it in r58378. =) > > > > When ftplib.FTP was converted over to Py3K it was given a default > > encoding of ASCII on all read data, but that doesn't work as the stuff > > on the other end could be latin1 (and it was). So I just changed the > > default encoding. > > Cool. Though how do you know it was really latin1? Is there anything > standardized about the encoding used by FTP? See, now I had to go and look stuff up. So much work for a holiday. =) According to the spec, data transfers can be anything based on data transfer format specified. ASCII is one of them, but so is Local which can be anything. Turns out that ftplib.FTP.connect() reads from the socket using socket.makefile('r', encoding), so it starts off in text mode. 
So that makes restricting the encoding to bytes < 128 a bad thing as not all possible data transfers would be legal. Basically it sounds like the ftplib module might need a thorough rewrite to use bytes/buffers so that the proper decoding happens at the last second. But I am not the person to do that rewrite. =) -Brett From guido at python.org Mon Oct 8 22:08:01 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 8 Oct 2007 13:08:01 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: Message-ID: On 10/8/07, Brett Cannon wrote: > On 10/8/07, Guido van Rossum wrote: > > On 10/8/07, Brett Cannon wrote: > > > On 10/7/07, Guido van Rossum wrote: > > > [SNIP] > > > > PS. Is there anyone who understands test_urllib2net and can fix it? > > > > It's been failing for weeks (maybe months) now. > > > > > > I don't understand it but I fixed it in r58378. =) > > > > > > When ftplib.FTP was converted over to Py3K it was given a default > > > encoding of ASCII on all read data, but that doesn't work as the stuff > > > on the other end could be latin1 (and it was). So I just changed the > > > default encoding. > > > > Cool. Though how do you know it was really latin1? Is there anything > > standardized about the encoding used by FTP? > > See, now I had to go and look stuff up. So much work for a holiday. =) > > According to the spec, data transfers can be anything based on data > transfer format specified. ASCII is one of them, but so is Local > which can be anything. > > Turns out that ftplib.FTP.connect() reads from the socket using > socket.makefile('r', encoding), so it starts off in text mode. So > that makes restricting the encoding to bytes < 128 a bad thing as not > all possible data transfers would be legal. > > Basically it sounds like the ftplib module might need a thorough > rewrite to use bytes/buffers so that the proper decoding happens at > the last second. But I am not the person to do that rewrite. =) Thanks. 
Mind filing a bug for someone to find? It sounds like the rewrite might be easier once we have immutable bytes. (So this conversation is not entirely off-topic for this thread. ;-) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From brett at python.org Mon Oct 8 22:12:22 2007 From: brett at python.org (Brett Cannon) Date: Mon, 8 Oct 2007 13:12:22 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: Message-ID: On 10/8/07, Guido van Rossum wrote: > On 10/8/07, Brett Cannon wrote: > > On 10/8/07, Guido van Rossum wrote: > > > On 10/8/07, Brett Cannon wrote: > > > > On 10/7/07, Guido van Rossum wrote: > > > > [SNIP] > > > > > PS. Is there anyone who understands test_urllib2net and can fix it? > > > > > It's been failing for weeks (maybe months) now. > > > > > > > > I don't understand it but I fixed it in r58378. =) > > > > > > > > When ftplib.FTP was converted over to Py3K it was given a default > > > > encoding of ASCII on all read data, but that doesn't work as the stuff > > > > on the other end could be latin1 (and it was). So I just changed the > > > > default encoding. > > > > > > Cool. Though how do you know it was really latin1? Is there anything > > > standardized about the encoding used by FTP? > > > > See, now I had to go and look stuff up. So much work for a holiday. =) > > > > According to the spec, data transfers can be anything based on data > > transfer format specified. ASCII is one of them, but so is Local > > which can be anything. > > > > Turns out that ftplib.FTP.connect() reads from the socket using > > socket.makefile('r', encoding), so it starts off in text mode. So > > that makes restricting the encoding to bytes < 128 a bad thing as not > > all possible data transfers would be legal. > > > > Basically it sounds like the ftplib module might need a thorough > > rewrite to use bytes/buffers so that the proper decoding happens at > > the last second. But I am not the person to do that rewrite. 
=) > > Thanks. Mind filing a bug for someone to find? It sounds like the > rewrite might be easier once we have immutable bytes. (So this > conversation is not entirely off-topic for this thread. ;-) Created issue1248. -Brett From nnorwitz at gmail.com Mon Oct 8 22:13:29 2007 From: nnorwitz at gmail.com (Neal Norwitz) Date: Mon, 8 Oct 2007 13:13:29 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: Message-ID: On 10/8/07, Guido van Rossum wrote: > On 10/7/07, Guido van Rossum wrote: > > - remove locale support from PyString > > - remove compatibility with PyUnicode from PyString > > - remove compatibility with PyString from PyUnicode > > I'll tackle these myself by Friday, unless someone else beats me to it. I experimented a bit with removing some of the delegation to PyUnicode in stringobject.c. I ran into many problems starting the interpreter or printing things out (fatal errors or exceptions). It seems we still are using str8 in a bunch of places that need to converted to Unicode. I think that will make it easier to rip out the dependencies. If I have time, I'll probably focus on converting more uses of PyString to PyUnicode. These need to be done anyways and will probably make other changes easier. n From phd at phd.pp.ru Mon Oct 8 22:00:15 2007 From: phd at phd.pp.ru (Oleg Broytmann) Date: Tue, 9 Oct 2007 00:00:15 +0400 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: Message-ID: <20071008200015.GA3316@phd.pp.ru> On Mon, Oct 08, 2007 at 12:51:59PM -0700, Guido van Rossum wrote: > Cool. Though how do you know it was really latin1? Is there anything > standardized about the encoding used by FTP? There is no. Russian users, e.g., use all encodings - koi8-r, cp1251, utf-8; cp1251 is the most popular here, of course. Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. 
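The encoding problem discussed in this part of the thread can be shown with a short sketch: the same bytes fail under ASCII, always "succeed" under latin-1, and only come out right under the encoding the server actually used. The file name and the helper `try_decode` are made up for this example.

```python
def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None

name = "файл.txt"             # hypothetical file name on a Russian server
raw = name.encode("cp1251")   # what would travel over the wire

as_ascii = try_decode(raw, "ascii")      # fails: bytes >= 128
as_latin1 = try_decode(raw, "latin-1")   # never fails, but wrong text
as_koi8r = try_decode(raw, "koi8_r")     # decodes, but wrong letters
as_cp1251 = try_decode(raw, "cp1251")    # the only correct reading
```

This is why switching the default from ASCII to latin1 makes the test pass without making the text correct, and why keeping the data as bytes until the encoding is known looks like the safer design.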
From alexandre at peadrop.com Tue Oct 9 00:05:36 2007 From: alexandre at peadrop.com (Alexandre Vassalotti) Date: Mon, 8 Oct 2007 18:05:36 -0400 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: Message-ID: On 10/8/07, Guido van Rossum wrote: > - remove buffer API from PyUnicode > - change PyString's repr() to return "b'...'" > - change PyBytes's repr() to return "buffer(b'...')" I got patches for these. I plan to submit them for review after doing more testing to make sure they work right. -- Alexandre From guido at python.org Tue Oct 9 00:36:05 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 8 Oct 2007 15:36:05 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: Message-ID: Cool. Just notice that you haven't been following protocol -- Christian Heimes volunteered to do these too. :-) On 10/8/07, Alexandre Vassalotti wrote: > On 10/8/07, Guido van Rossum wrote: > > - remove buffer API from PyUnicode > > - change PyString's repr() to return "b'...'" > > - change PyBytes's repr() to return "buffer(b'...')" > > I got patches for these. I plan to submit them for review after doing > more testing to make sure they work right. > > > -- Alexandre > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From alexandre at peadrop.com Tue Oct 9 00:45:17 2007 From: alexandre at peadrop.com (Alexandre Vassalotti) Date: Mon, 8 Oct 2007 18:45:17 -0400 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: Message-ID: On 10/8/07, Guido van Rossum wrote: > Cool. Just notice that you haven't been following protocol -- > Christian Heimes volunteered to do these too. :-) Oops, sorry Christian for taking yours. 
-- Alexandre From brett at python.org Tue Oct 9 01:19:34 2007 From: brett at python.org (Brett Cannon) Date: Mon, 8 Oct 2007 16:19:34 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: Message-ID: On 10/8/07, Alexandre Vassalotti wrote: > On 10/8/07, Guido van Rossum wrote: > > Cool. Just notice that you haven't been following protocol -- > > Christian Heimes volunteered to do these too. :-) > > Oops, sorry Christian for taking yours. See http://bugs.python.org/issue1247 for Christian's patch. Maybe you can do a code review of Christian's work, Alexandre? And if you want to be really brave you could maybe even do the commit yourself. =) -Brett From alexandre at peadrop.com Tue Oct 9 00:56:41 2007 From: alexandre at peadrop.com (Alexandre Vassalotti) Date: Mon, 8 Oct 2007 18:56:41 -0400 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: Message-ID: On 10/8/07, Guido van Rossum wrote: > - change indexing and iteration over PyString to return ints, not > 1-char PyStrings I will try do this one. -- Alexandre From lists at cheimes.de Tue Oct 9 00:57:49 2007 From: lists at cheimes.de (Christian Heimes) Date: Tue, 09 Oct 2007 00:57:49 +0200 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: Message-ID: Alexandre Vassalotti wrote: > On 10/8/07, Guido van Rossum wrote: >> Cool. Just notice that you haven't been following protocol -- >> Christian Heimes volunteered to do these too. :-) > > Oops, sorry Christian for taking yours. I've submitted my patch a few hours ago. I wasn't able to test it to full extend because the svn server was down and I couldn't get the latest update. I noticed that PyBytes doesn't have an iteration view like PyString. Do we need a view for it? 
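For readers following the task list, here is a small sketch of how several of these items behave in released Python 3. One assumption to flag: the mutable type called "buffer" in this thread is spelled `bytearray` in released versions, so the sketch uses that name. The helper `describe` exists only for this example.

```python
def describe(data):
    """Collect the bytes behaviours discussed in the task list."""
    return {
        # bytes/str comparison returns False instead of raising
        "equals_str": data == data.decode("ascii"),
        # indexing and iteration yield ints, not 1-char strings
        "first_item": data[0],
        "items": list(data),
        # repr() of the immutable and the mutable type
        "repr_immutable": repr(data),
        "repr_mutable": repr(bytearray(data)),
        # immutable and mutable bytes compare by content
        "equals_mutable": data == bytearray(data),
    }

result = describe(b"abc")
```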
Christian From lists at cheimes.de Tue Oct 9 01:29:31 2007 From: lists at cheimes.de (Christian Heimes) Date: Tue, 09 Oct 2007 01:29:31 +0200 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: Message-ID: <470ABD5B.3060601@cheimes.de> Brett Cannon wrote: > See http://bugs.python.org/issue1247 for Christian's patch. Maybe you > can do a code review of Christian's work, Alexandre? And if you want > to be really brave you could maybe even do the commit yourself. =) I'm not happy with:

    static const char *quote_prefix = "buffer(b'";
    p = PyUnicode_AS_UNICODE(v);
    for (i=0; i<strlen(quote_prefix); i++) {
        *p++ = quote_prefix[i];
    }

but I didn't know how to code it more elegant. It follows the previous version of the code and it's the fastest way I can think of without [...] References: Message-ID: On 10/8/07, Christian Heimes wrote: > Alexandre Vassalotti wrote: > > On 10/8/07, Guido van Rossum wrote: > >> Cool. Just notice that you haven't been following protocol -- > >> Christian Heimes volunteered to do these too. :-) > > > > Oops, sorry Christian for taking yours. > > I've submitted my patch a few hours ago. I wasn't able to test it to > full extend because the svn server was down and I couldn't get the > latest update. Now we'll have competing patches. Can you two please review each other's so I won't have to review two? Anyway, anonymous svn should be working again. > I noticed that PyBytes doesn't have an iteration view like PyString. Do > we need a view for it? Yes, that would be a good idea! This currently causes a bit of a problem for the Sequence ABC. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From alexandre at peadrop.com Tue Oct 9 01:55:13 2007 From: alexandre at peadrop.com (Alexandre Vassalotti) Date: Mon, 8 Oct 2007 19:55:13 -0400 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: <470ABD5B.3060601@cheimes.de> References: <470ABD5B.3060601@cheimes.de> Message-ID: Ah!
In my review, I was going to suggest that: while (*quote_prefix) *p++ = *quote_prefix++; -- Alexandre On 10/8/07, Christian Heimes wrote: > I'm not happy with: > > static const char *quote_prefix = "buffer(b'"; > p = PyUnicode_AS_UNICODE(v); > for (i=0; i < strlen(quote_prefix); i++) { > *p++ = quote_prefix[i]; > } > > but I didn't know how to code it more elegantly. It follows the previous > version of the code and it's the fastest way I can think of without From qrczak at knm.org.pl Tue Oct 9 02:02:26 2007 From: qrczak at knm.org.pl (Marcin ‘Qrczak’ Kowalczyk) Date: Tue, 09 Oct 2007 02:02:26 +0200 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: <470ABD5B.3060601@cheimes.de> References: <470ABD5B.3060601@cheimes.de> Message-ID: <1191888146.15402.5.camel@qrnik> On Tue, 09-10-2007 at 01:29 +0200, Christian Heimes wrote: > I'm not happy with: > > static const char *quote_prefix = "buffer(b'"; > p = PyUnicode_AS_UNICODE(v); > for (i=0; i < strlen(quote_prefix); i++) { > *p++ = quote_prefix[i]; > } strlen in a loop is bad for performance. I would do: static const Py_UNICODE quote_prefix[] = { 'b', 'u', 'f', 'f', 'e', 'r', '(', 'b', '\'' }; and memcpy. -- __("< Marcin Kowalczyk \__/ qrczak at knm.org.pl ^^ http://qrnik.knm.org.pl/~qrczak/ From tom at vector-seven.com Tue Oct 9 15:31:16 2007 From: tom at vector-seven.com (Thomas Lee) Date: Tue, 09 Oct 2007 23:31:16 +1000 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: <4709BA29.3060503@vector-seven.com> References: <4709BA29.3060503@vector-seven.com> Message-ID: <470B82A4.8030703@vector-seven.com> Thomas Lee wrote: > Guido van Rossum wrote: > >> - make == and != between PyBytes and PyUnicode return False instead of >> raising TypeError >> A patch for this is ready. I'll submit it to the bug tracker later tonight. >> - make == and != between PyString and PyUnicode return False instead >> of converting >> This will be trivial, but I need to ask a stupid question: is this also true for PyUnicode_Compare? (i.e.
should PyUnicode_Compare(str8(), str()) != 0 ?) And, if so, what should PyUnicode_Compare actually return if one of the parameters is a PyString? Maybe -1 for PyUnicode on the left, 1 for PyUnicode on the right? >> - make comparisons between PyString and PyBytes work (these are >> properly ordered) >> >> Is it just me, or do string/bytes comparisons already work? >>> s = str8('test') >>> b = b'test' >>> s == b True >>> b == s True >>> s != b False >>> b != s False Cheers, Tom From tom at vector-seven.com Tue Oct 9 15:39:59 2007 From: tom at vector-seven.com (Thomas Lee) Date: Tue, 09 Oct 2007 23:39:59 +1000 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: <470B82A4.8030703@vector-seven.com> References: <4709BA29.3060503@vector-seven.com> <470B82A4.8030703@vector-seven.com> Message-ID: <470B84AF.4060704@vector-seven.com> Thomas Lee wrote: > Thomas Lee wrote: > >> Guido van Rossum wrote: >> >> >>> - make == and != between PyBytes and PyUnicode return False instead of >>> raising TypeError >>> >>> > A patch for this is ready. I'll submit it to the bug tracker later tonight. > This patch is now up: http://bugs.python.org/issue1249 Cheers, Tom From guido at python.org Tue Oct 9 17:01:02 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 9 Oct 2007 08:01:02 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: <470B82A4.8030703@vector-seven.com> References: <4709BA29.3060503@vector-seven.com> <470B82A4.8030703@vector-seven.com> Message-ID: On 10/9/07, Thomas Lee wrote: > Thomas Lee wrote: > > Guido van Rossum wrote: > > > >> - make == and != between PyBytes and PyUnicode return False instead of > >> raising TypeError > >> > A patch for this is ready. I'll submit it to the bug tracker later tonight. > >> - make == and != between PyString and Pyunicode return False instead > >> of converting > >> > This will be trivial, but I need to ask a stupid question: is this also > true for PyUnicode_Compare? (i.e. 
should PyUnicode_Compare(str8(), > str()) != 0 ?) > > And, if so, what should PyUnicode_Compare actually return if one of the > parameters is a PyString? Maybe -1 for PyUnicode on the left, 1 for > PyUnicode on the right? Assuming that PyUnicode_Compare is a three-way comparison (less, equal, more), it should raise a TypeError when one of the arguments is a PyString or PyBytes. > >> - make comparisons between PyString and PyBytes work (these are > >> properly ordered) > >> > >> > Is it just me, or do string/bytes comparisons already work? > > >>> s = str8('test') > >>> b = b'test' > >>> s == b > True > >>> b == s > True > >>> s != b > False > >>> b != s > False Seems it's already so. Do they order properly too? (< <= > >=) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Oct 9 17:56:50 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 9 Oct 2007 08:56:50 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: <470BA418.5060301@vector-seven.com> References: <4709BA29.3060503@vector-seven.com> <470B82A4.8030703@vector-seven.com> <470BA418.5060301@vector-seven.com> Message-ID: On 10/9/07, Thomas Lee wrote: > Guido van Rossum wrote: > >>> > >>>> - make == and != between PyBytes and PyUnicode return False instead of > >>>> raising TypeError > >>>> > >>>> > Just thinking about it I'm pretty sure my initial patch is wrong - > forgive my ignorance. To remove the ambiguity, is it fair to state the > following? > > bytes() == str() -> False instead of raising TypeError > bytes() != str() -> True instead of raising TypeError Correct. > I initially read that as "return False whenever any comparison between > bytes and unicode objects is attempted" ... The point is that a bytes and a str instance are never considered equal... > > Assuming that PyUnicode_Compare is a three-way comparison (less, > > equal, more), it should raise a TypeError when one of the arguments is > > a PyString or PyBytes. > > > > > Cool. 
Should have that sorted out soon. As above: > > str8() == str() -> False > str8() != str() -> True > > Correct? Well, in this case you actually have to compare the individual bytes. But yes. ;-) > >> Is it just me, or do string/bytes comparisons already work? > >> > >> >>> s = str8('test') > >> >>> b = b'test' > >> >>> s == b > >> True > >> >>> b == s > >> True > >> >>> s != b > >> False > >> >>> b != s > >> False > >> > > > > Seems it's already so. Do they order properly too? (< <= > >=) > > > Looks like it: > > >>> str8('a') > b'b' > False > >>> str8('a') < b'b' > True > >>> str8('a') <= b'b' > True > >>> str8('a') >= b'b' > False Well that part was easy then. ;-) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Oct 9 19:02:03 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 9 Oct 2007 10:02:03 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: <470BAA01.9090202@vector-seven.com> References: <4709BA29.3060503@vector-seven.com> <470B82A4.8030703@vector-seven.com> <470BA418.5060301@vector-seven.com> <470BAA01.9090202@vector-seven.com> Message-ID: On 10/9/07, Thomas Lee wrote: > Guido van Rossum wrote: > > > > The point is that a bytes and a str instance are never considered equal... > > > > > Sorry. I understand now. My brain must have been on a holiday earlier. > :) Just pushed an updated patch to the bug tracker. > >> str8() == str() -> False > >> str8() != str() -> True > >> > >> Correct? > >> > > > > Well, in this case you actually have to compare the individual bytes. > > But yes. ;-) > > > I'm confused: if I'm making == and != between PyString return False > instead of converting, at what point would I need to be comparing bytes? > > The fix I have ready for this merely wipes out the conversion from > PyString to PyUnicode in PyUnicode_Compare and the existing code takes > care of the rest. Is this all that's required, or have I misinterpreted > this one too? :) Sorry, my bad. 
I misread and thought you were talking about PyString vs. PyBytes. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Oct 9 19:24:33 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 9 Oct 2007 10:24:33 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: <470BA418.5060301@vector-seven.com> References: <4709BA29.3060503@vector-seven.com> <470B82A4.8030703@vector-seven.com> <470BA418.5060301@vector-seven.com> Message-ID: On 10/9/07, Thomas Lee wrote: > Looks like it: > > >>> str8('a') > b'b' > False > >>> str8('a') < b'b' > True > >>> str8('a') <= b'b' > True > >>> str8('a') >= b'b' > False Which reminds me of a task I forgot to add to the list: - change the constructor for PyString to match the one for PyBytes. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Oct 10 00:33:20 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 9 Oct 2007 15:33:20 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: <4709BA29.3060503@vector-seven.com> <470B82A4.8030703@vector-seven.com> <470BA418.5060301@vector-seven.com> Message-ID: On 10/9/07, Guido van Rossum wrote: > Which reminds me of a task I forgot to add to the list: > > - change the constructor for PyString to match the one for PyBytes. And another pair of forgotten tasks: - change PyBytes so that its str() is the same as its repr(). - change PyString so that its str() is the same as its repr(). The former seems easy. The latter might cause trouble (though then again, it may not). I should also note that I already submitted the changes to remove locale support from PyString, and am working on removing its encode() method. This is not going so smoothly.
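The comparison rules and the str()/repr() tasks listed above can be checked against released Python 3. The sketch assumes the later names (`bytes` for the thread's PyString, `bytearray` for PyBytes) and shows only the end state, not how the patches discussed here got there.

```python
import warnings

data = b"test"                  # immutable (the thread's PyString)
mutable = bytearray(b"test")    # mutable (the thread's PyBytes)
text = "test"

with warnings.catch_warnings():
    # Under the -b flag these operations emit BytesWarning; ignore it here.
    warnings.simplefilter("ignore", BytesWarning)
    equal = data == text        # bytes and str never compare equal
    not_equal = data != text
    # str() of either byte type is the same string as its repr().
    str_matches_repr = (str(data) == repr(data)
                        and str(mutable) == repr(mutable))

# The two byte types compare and order against each other by content.
cross_equal = data == mutable
cross_less = b"a" < bytearray(b"b")

# Ordering a byte type against str is rejected outright.
try:
    data < text
except TypeError:
    ordering_rejected = True
else:
    ordering_rejected = False

print(equal, not_equal, cross_equal, cross_less,
      ordering_rejected, str_matches_repr)
```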
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From alexandre at peadrop.com Wed Oct 10 05:27:43 2007 From: alexandre at peadrop.com (Alexandre Vassalotti) Date: Tue, 9 Oct 2007 23:27:43 -0400 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: Message-ID: On 10/8/07, Alexandre Vassalotti wrote: > On 10/8/07, Guido van Rossum wrote: > > - change indexing and iteration over PyString to return ints, not > > 1-char PyStrings > > I will try do this one. This took a bit longer than I expected. Changing the PyString iterator to return ints was easy, but I ran into some issues with the codec registry. I won't have the time this week to work on my patch any further. Meanwhile if someone would like to improve it, feel free to do so (the patch is attached to this email). Otherwise, I will continue to work on it next weekend. Cheers, -- Alexandre -------------- next part -------------- A non-text attachment was scrubbed... Name: string_iter_ret_ints.patch Type: text/x-diff Size: 4742 bytes Desc: not available Url : http://mail.python.org/pipermail/python-3000/attachments/20071009/7053a418/attachment-0001.patch From greg at krypto.org Wed Oct 10 07:49:00 2007 From: greg at krypto.org (Gregory P. Smith) Date: Tue, 9 Oct 2007 22:49:00 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: <52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com> References: <52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com> Message-ID: <52dc1c820710092249h3e866cf4y8fb88b1cc6f31361@mail.gmail.com> > > > - remove buffer API from PyUnicode > > > I'll take these two with a goal of having them done by the end of the > week. > > -gps > I should've known not to believe the simple description. This one is proving difficult by itself. If I modify the Unicode object to not support the buffer API I can't even launch the python interpreter. Any one with more time on their hands want this one? 
I'll still deal with adding the missing PyBytes methods. -g -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-3000/attachments/20071009/2da87f85/attachment.htm From jyasskin at gmail.com Wed Oct 10 08:02:19 2007 From: jyasskin at gmail.com (Jeffrey Yasskin) Date: Wed, 10 Oct 2007 01:02:19 -0500 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: <52dc1c820710092249h3e866cf4y8fb88b1cc6f31361@mail.gmail.com> References: <52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com> <52dc1c820710092249h3e866cf4y8fb88b1cc6f31361@mail.gmail.com> Message-ID: <5d44f72f0710092302s52be427fp19bfbae07a8d2700@mail.gmail.com> On 10/10/07, Gregory P. Smith wrote: > > > > > > > > > > > > - remove buffer API from PyUnicode > > > > > > I'll take these two with a goal of having them done by the end of the > week. > > > > -gps > > I should've known not to believe the simple description. This one is > proving difficult by itself. If I modify the Unicode object to not support > the buffer API I can't even launch the python interpreter. Any one with > more time on their hands want this one? > > I'll still deal with adding the missing PyBytes methods. I've got two plane flights coming up, so I can tackle removing the buffer API from PyUnicode (and perhaps removing the PyBUF_CHARACTER constant entirely if it's on the way). I'll hope to be done by Monday, with a status report of some sort by Friday. From alexandre at peadrop.com Wed Oct 10 15:27:09 2007 From: alexandre at peadrop.com (Alexandre Vassalotti) Date: Wed, 10 Oct 2007 09:27:09 -0400 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: <52dc1c820710092249h3e866cf4y8fb88b1cc6f31361@mail.gmail.com> References: <52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com> <52dc1c820710092249h3e866cf4y8fb88b1cc6f31361@mail.gmail.com> Message-ID: On 10/10/07, Gregory P. 
Smith wrote: > > > - remove buffer API from PyUnicode > > > > I'll take these two with a goal of having them done by the end of the > week. > > > > I should've known not to believe the simple description. This one is > proving difficult by itself. If I modify the Unicode object to not support > the buffer API I can't even launch the python interpreter. Any one with > more time on their hands want this one? > I have a patch for this one. I just haven't tested it throughly. I attached the patch, so free to improve it. -- Alexandre -------------- next part -------------- A non-text attachment was scrubbed... Name: unicode_rm_buf_api.patch Type: text/x-diff Size: 1889 bytes Desc: not available Url : http://mail.python.org/pipermail/python-3000/attachments/20071010/e1ffa412/attachment.patch From lists at cheimes.de Wed Oct 10 20:01:21 2007 From: lists at cheimes.de (Christian Heimes) Date: Wed, 10 Oct 2007 20:01:21 +0200 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: Message-ID: <470D1371.3020309@cheimes.de> Guido van Rossum wrote: > > The tasks I can think of are: [...] (Resend, the first mail didn't make it and I forgot a point) While I was working on a patch for the renaming of bytes and str8 I found some open issues that need to be discussed and addressed: - Create an iterator view for PyBytes. The buffer object doesn't have a view for iteration like bytes have with PyStringIter_Type. Guido said he wants a view to play nice with the Sequence ABC. - Should bytes (PyString_Type) subclass from basestring? It doesn't feel quite right to me. I think we could remove basestring completely if bytes doesn't subclass from it. - Do we need a common base type for bytes and buffer like e.g. basebytes? 
- The new bytes type (formally known as str8 / PyString_Type) still has a bunch of methods from its original Python 2.x parent: ['__add__', '__class__', '__contains__', '__delattr__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__str__', 'capitalize', 'center', 'count', 'decode', 'encode', 'endswith', 'expandtabs', 'find', 'index', 'isalnum', 'isalpha', 'isdigit', 'islower', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill'] Should any of these methods be removed? - PyString still excepts unicode in a lot of places and some important parts of Python still require it. The interpreter was f... up as I removed unicode support from functions like PyString_Size and PyString_AsString. I'm not sure which function is causing trouble. The error message was an exception bootstrapping error because PyImport_ImportModule("__builtin__") failed. Should these methods still accept unicode and convert it with the default encoding? Christian From guido at python.org Wed Oct 10 20:08:20 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 10 Oct 2007 11:08:20 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: <470D1371.3020309@cheimes.de> References: <470D1371.3020309@cheimes.de> Message-ID: On 10/10/07, Christian Heimes wrote: > Guido van Rossum wrote: > > > The tasks I can think of are: > [...] 
> > (Resend, the first mail didn't make it and I forgot a point) > > While I was working on a patch for the renaming of bytes and str8 I > found some open issues that need to be discussed and addressed: > > - Create an iterator view for PyBytes. The buffer object doesn't have a > view for iteration like bytes have with PyStringIter_Type. Guido said he > wants a view to play nice with the Sequence ABC. Right. Though it is a minor point and can be done later. > - Should bytes (PyString_Type) subclass from basestring? It doesn't feel > quite right to me. I think we could remove basestring completely if > bytes doesn't subclass from it. Definitely not. basestring is for text strings. We could even decide to remove it; we should instead have ABCs for this purpose. > - Do we need a common base type for bytes and buffer like e.g. basebytes? We can deal with that in abc.py as well, using virtual inheritance (the .register() method). > - The new bytes type (formally known as str8 / PyString_Type) still has You mean 'formerly', not 'formally' :-) I prefer to just call these by their C names (PyString) to be precise, as the C names aren't changing (at least not yet ;-). 
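The `.register()` suggestion above (a common base for the two byte types via virtual inheritance rather than a real base class) might look roughly like this. `ByteSequence` is a hypothetical name chosen for the illustration, and `bytes`/`bytearray` are the later names for the thread's PyString/PyBytes.

```python
from abc import ABC


class ByteSequence(ABC):
    """Hypothetical common ABC for the immutable and mutable byte types."""


# Virtual inheritance: neither built-in type is modified or subclassed.
ByteSequence.register(bytes)
ByteSequence.register(bytearray)

bytes_is_member = isinstance(b"abc", ByteSequence)
bytearray_is_member = isinstance(bytearray(b"abc"), ByteSequence)
str_is_member = isinstance("abc", ByteSequence)

print(bytes_is_member, bytearray_is_member, str_is_member)
```

Registration only affects `isinstance()` and `issubclass()` checks; the registered types inherit no methods from the ABC.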
> a bunch of methods from its original Python 2.x parent: > > ['__add__', '__class__', '__contains__', '__delattr__', '__doc__', > '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', > '__getnewargs__', '__gt__', '__hash__', '__init__', '__iter__', > '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', > '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', > '__rmul__', '__setattr__', '__str__', 'capitalize', 'center', 'count', > 'decode', 'encode', 'endswith', 'expandtabs', 'find', 'index', > 'isalnum', 'isalpha', 'isdigit', 'islower', 'isspace', 'istitle', > 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'partition', 'replace', > 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', > 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', > 'upper', 'zfill'] > > Should any of these methods be removed? No, that's spelled out in the PEP. Those should all stay. (If you see a method that's not listed in the PEP, ask me about it before deleting it. :-) > - PyString still excepts unicode in a lot of places and some important > parts of Python still require it. The interpreter was f... up as I > removed unicode support from functions like PyString_Size and > PyString_AsString. I'm not sure which function is causing trouble. The > error message was an exception bootstrapping error because > PyImport_ImportModule("__builtin__") failed. Should these methods still > accept unicode and convert it with the default encoding? Several people have noted the same issue. My goal is to remove this behavior completely. I don't know how much it will take; these bootstrap issues are always hard to debug and sometimes hard to fix. I am looking into this a bit right now; I suspect it's got to do with some types that still return a PyString from their repr(). I noticed that even removing .encode() from PyString breaks about 5 tests. 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From brett at python.org Wed Oct 10 20:10:43 2007 From: brett at python.org (Brett Cannon) Date: Wed, 10 Oct 2007 11:10:43 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: <52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com> <52dc1c820710092249h3e866cf4y8fb88b1cc6f31361@mail.gmail.com> Message-ID: On 10/10/07, Alexandre Vassalotti wrote: > On 10/10/07, Gregory P. Smith wrote: > > > > - remove buffer API from PyUnicode > > > > > > I'll take these two with a goal of having them done by the end of the > > week. > > > > > > > I should've known not to believe the simple description. This one is > > proving difficult by itself. If I modify the Unicode object to not support > > the buffer API I can't even launch the python interpreter. Any one with > > more time on their hands want this one? > > > > I have a patch for this one. I just haven't tested it throughly. > I attached the patch, so free to improve it. It's best to toss all patches up on the issue tracker as then they don't get lost amongst the other emails in the mailing list. Plus it provides a more centralized history of what happens with the code and lets anyone searching for work on this exact topic have another place to find it. -Brett From brett at python.org Wed Oct 10 20:12:50 2007 From: brett at python.org (Brett Cannon) Date: Wed, 10 Oct 2007 11:12:50 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: <470D1371.3020309@cheimes.de> References: <470D1371.3020309@cheimes.de> Message-ID: On 10/10/07, Christian Heimes wrote: > Guido van Rossum wrote: > > > The tasks I can think of are: > [...] > > (Resend, the first mail didn't make it and I forgot a point) > > While I was working on a patch for the renaming of bytes and str8 I > found some open issues that need to be discussed and addressed: > > - Create an iterator view for PyBytes. 
The buffer object doesn't have a > view for iteration like bytes have with PyStringIter_Type. Guido said he > wants a view to play nice with the Sequence ABC. > > - Should bytes (PyString_Type) subclass from basestring? It doesn't feel > quite right to me. I think we could remove basestring completely if > bytes doesn't subclass from it. > > - Do we need a common base type for bytes and buffer like e.g. basebytes? > > - The new bytes type (formally known as str8 / PyString_Type) still has > a bunch of methods from its original Python 2.x parent: > > ['__add__', '__class__', '__contains__', '__delattr__', '__doc__', > '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', > '__getnewargs__', '__gt__', '__hash__', '__init__', '__iter__', > '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', > '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', > '__rmul__', '__setattr__', '__str__', 'capitalize', 'center', 'count', > 'decode', 'encode', 'endswith', 'expandtabs', 'find', 'index', > 'isalnum', 'isalpha', 'isdigit', 'islower', 'isspace', 'istitle', > 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'partition', 'replace', > 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', > 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', > 'upper', 'zfill'] > > Should any of these methods be removed? > See PEP 3137; http://www.python.org/dev/peps/pep-3137/#methods . -Brett From lists at cheimes.de Wed Oct 10 21:08:27 2007 From: lists at cheimes.de (Christian Heimes) Date: Wed, 10 Oct 2007 21:08:27 +0200 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: <470D1371.3020309@cheimes.de> Message-ID: <470D232B.80607@cheimes.de> Guido van Rossum wrote: > Definitely not. basestring is for text strings. We could even decide > to remove it; we should instead have ABCs for this purpose. I'm going to provide a patch which rips basestring out, k? 
Somebody has to write a fixer for 2to3 which replaces code like isinstance(egg, basestring) with isinstance(egg, str). > You mean 'formerly', not 'formally' :-) I prefer to just call these by > their C names (PyString) to be precise, as the C names aren't changing > (at least not yet ;-). Oh, formerly ... right. The current state of the names is very confusing. It's going to cost me some cups of coffee. str - PyUnicode bytes - PyString buffer - PyBytes > No, that's spelled out in the PEP. Those should all stay. (If you see > a method that's not listed in the PEP, ask me about it before deleting > it. :-) Doh, I should have read the PEP again before asking the question. I've a question about one point. The PEP states "They accept anything that implements the PEP 3118 buffer API for bytes arguments, and return the same type as the object whose method is called ("self")". Which types implement the buffer API? PyString, PyBytes but not PyUnicode? For now PyString takes PyUnicode objects as arguments and vice versa, but PyBytes doesn't take unicode. Do I understand correctly that PyString must not accept PyUnicode? >>> b"abc".count("b") 1 >>> "abc".count(b"b") 1 >>> buffer(b"abc").count("b") Traceback (most recent call last): File "<stdin>", line 1, in <module> SystemError: can't use str as char buffer >>> buffer(b"abc").count(b"b") 1 > Several people have noted the same issue. My goal is to remove this > behavior completely. I don't know how much it will take; these > bootstrap issues are always hard to debug and sometimes hard to fix. I tried to debug and fix it but I gave up after half an hour. > I am looking into this a bit right now; I suspect it's got to do with > some types that still return a PyString from their repr(). I noticed > that even removing .encode() from PyString breaks about 5 tests. Great! I've a patch that renames PyString -> bytes and PyBytes -> buffer while keeping str8 as an alias for bytes until str8 is removed.
It's based on Alexandres patch which itself is partly based on my patch. It breaks a hell of a lot but it could give you a head start. >>> b'' b'' >>> type(b'') >>> type(b'') is str8 True >>> type(b'') is bytes True >>> type(buffer(b'')) I'll keep working on the patch. Crys From g.brandl at gmx.net Wed Oct 10 21:33:24 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Wed, 10 Oct 2007 21:33:24 +0200 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: <470D232B.80607@cheimes.de> References: <470D1371.3020309@cheimes.de> <470D232B.80607@cheimes.de> Message-ID: Christian Heimes schrieb: >> You mean 'formerly', not 'formally' :-) I prefer to just call these by >> their C names (PyString) to be precise, as the C names aren't changing >> (at least not yet ;-). > > Oh, formerly ... right. The current state of the names is very > confusing. It's going to cost me some cups of coffee. > > str - PyUnicode > bytes - PyString > buffer - PyBytes I agree that this is quite confusing. The PyBytes functions can be changed without a thought since they aren't 2.x heritage. Since PyBuffer_* is already taken, what about a PyByteBuffer_ prefix? PyString_ could then be renamed to PyByteString_. PyUnicode might be allowed to stay... Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From lists at cheimes.de Wed Oct 10 21:58:19 2007 From: lists at cheimes.de (Christian Heimes) Date: Wed, 10 Oct 2007 21:58:19 +0200 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: <470D1371.3020309@cheimes.de> <470D232B.80607@cheimes.de> Message-ID: Georg Brandl wrote: > I agree that this is quite confusing. The PyBytes functions can be changed > without a thought since they aren't 2.x heritage. 
Since PyBuffer_* is already > taken, what about a PyByteBuffer_ prefix? PyString_ could then be renamed > to PyByteString_. PyUnicode might be allowed to stay... I like your idea! IMHO PyUnicode_ can stay. It reflects the intention and aim of the type and it's easy to remember. str() contains unicode data and it's C name is PyUnicode. That works for me. *g* For the other two names I find PyBytes_ for bytes() and PyBytesBuffer_ for buffer() easier to remember and more consistent. Christian From brett at python.org Wed Oct 10 22:30:36 2007 From: brett at python.org (Brett Cannon) Date: Wed, 10 Oct 2007 13:30:36 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: <470D1371.3020309@cheimes.de> <470D232B.80607@cheimes.de> Message-ID: On 10/10/07, Christian Heimes wrote: > Georg Brandl wrote: > > I agree that this is quite confusing. The PyBytes functions can be changed > > without a thought since they aren't 2.x heritage. Since PyBuffer_* is already > > taken, what about a PyByteBuffer_ prefix? PyString_ could then be renamed > > to PyByteString_. PyUnicode might be allowed to stay... > > I like your idea! > > IMHO PyUnicode_ can stay. It reflects the intention and aim of the type > and it's easy to remember. str() contains unicode data and it's C name > is PyUnicode. That works for me. *g* > > For the other two names I find PyBytes_ for bytes() and PyBytesBuffer_ > for buffer() easier to remember and more consistent. +1 from me. No need to have PyBytes_ be PyBytesString_ as the string tie-in will become historical. Plus PyBytes_ is shorter without losing any detail of what the functions work with. 
-Brett From guido at python.org Wed Oct 10 23:00:26 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 10 Oct 2007 14:00:26 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: <470D1371.3020309@cheimes.de> <470D232B.80607@cheimes.de> Message-ID: It's all fine to debate new names, but for 3.0a2, the existing C-level names will be used. Period. I am not going to review a change that touches every other line of code to do such a big rename. FWIW, I think the new names should be different from any existing names, otherwise merges from the trunk will be too much of a pain (and ditto for ports of 3rd party code). --Guido On 10/10/07, Brett Cannon wrote: > On 10/10/07, Christian Heimes wrote: > > Georg Brandl wrote: > > > I agree that this is quite confusing. The PyBytes functions can be changed > > > without a thought since they aren't 2.x heritage. Since PyBuffer_* is already > > > taken, what about a PyByteBuffer_ prefix? PyString_ could then be renamed > > > to PyByteString_. PyUnicode might be allowed to stay... > > > > I like your idea! > > > > IMHO PyUnicode_ can stay. It reflects the intention and aim of the type > > and it's easy to remember. str() contains unicode data and it's C name > > is PyUnicode. That works for me. *g* > > > > For the other two names I find PyBytes_ for bytes() and PyBytesBuffer_ > > for buffer() easier to remember and more consistent. > > +1 from me. No need to have PyBytes_ be PyBytesString_ as the string > tie-in will become historical. Plus PyBytes_ is shorter without > losing any detail of what the functions work with. 
> > -Brett > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Oct 10 23:06:33 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 10 Oct 2007 14:06:33 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: <470D232B.80607@cheimes.de> References: <470D1371.3020309@cheimes.de> <470D232B.80607@cheimes.de> Message-ID: On 10/10/07, Christian Heimes wrote: > I've a question about one point. The PEP states "They accept anything > that implements the PEP 3118 buffer API for bytes arguments, and return > the same type as the object whose method is called ("self")". Which > types do implement the buffer API? PyString, PyBytes but not PyUnicode? Plus some other standard types, like memoryview and array.array. Plus certain extension types, like numpy arrays. > For now the PyString takes PyUnicode objects are argument and vice versa > but PyBytes doesn't take unicode. Do I understand correctly that > PyString must not accept PyUnicode? Correct. > >>> b"abc".count("b") > 1 This is a bug. > >>> "abc".count(b"b") > 1 This too. > >> buffer(b"abc").count("b") > Traceback (most recent call last): > File "", line 1, in > SystemError: can't use str as char buffer What is buffer? Are you using an old version of the tree (where it was an object like memoryview) or a patched version where you've already renamed str8 to buffer? Anyway, str8().count(str()) should raise TypeError. > >>> buffer(b"abc").count(b"b") > 1 Same question. Once the PEP is completely implemented, this should be correct. > I've a patch that renames PyString -> bytes and PyByte -> buffer while > keeping str8 as an alias for bytes until str8 is removed. 
> It's based on Alexandres patch which itself is partly based on my patch.
> It breaks a hell of a lot but it could give you a head start.

The rename is trivial. It's fixing all the unit tests that matters.

> >>> b''
> b''
> >>> type(b'')
>
> >>> type(b'') is str8
> True
> >>> type(b'') is bytes
> True
> >>> type(buffer(b''))
>
>
> I'll keep working on the patch.

Cool.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

From lists at cheimes.de Wed Oct 10 23:31:35 2007
From: lists at cheimes.de (Christian Heimes)
Date: Wed, 10 Oct 2007 23:31:35 +0200
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To:
References: <470D1371.3020309@cheimes.de> <470D232B.80607@cheimes.de>
Message-ID: <470D44B7.4060509@cheimes.de>

Guido van Rossum wrote:
>> >>> b"abc".count("b")
>> 1
>
> This is a bug.
>
>> >>> "abc".count(b"b")
>> 1
>
> This too.
>
>> >>> buffer(b"abc").count("b")
>> Traceback (most recent call last):
>>   File "", line 1, in
>> SystemError: can't use str as char buffer
>
> What is buffer? Are you using an old version of the tree (where it was
> an object like memoryview) or a patched version where you've already
> renamed str8 to buffer?

It was a test in my patched version of Python with the new names
(str8 -> bytes, bytes -> buffer).

> The rename is trivial. It's fixing all the unit tests that matters.

Yes, I know what you are talking about. *g* The unit tests aren't easy
to fix. It will take some time. Right now even the interpreter isn't
running with the new names.

>> >>> b''
>> b''
>> >>> type(b'')
>>
>> >>> type(b'') is str8
>> True
>> >>> type(b'') is bytes
>> True
>> >>> type(buffer(b''))
>>
>>
>> I'll keep working on the patch.
>
> Cool.

That was another interpreter session with my rename patch.
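For reference, the strictness Guido asks for in the exchange above can be checked on a released Python 3 interpreter. This is only a small sketch, not code from the thread: it assumes a modern interpreter, where the mutable type discussed here as buffer is spelled bytearray, and the helper name raises_type_error is made up for the example.

```python
def raises_type_error(func, *args):
    """Return True if calling func(*args) raises TypeError."""
    try:
        func(*args)
    except TypeError:
        return True
    return False


# Mixing bytes and str in a method call is rejected ...
mixed_bytes_call = raises_type_error(b"abc".count, "b")
mixed_str_call = raises_type_error("abc".count, b"b")
mixed_bytearray_call = raises_type_error(bytearray(b"abc").count, "b")

# ... while arguments of the same kind work as usual.
same_kind_count = bytearray(b"abc").count(b"b")

# == and != between bytes and str do not raise; the objects are just unequal.
equal = (b"abc" == "abc")
not_equal = (b"abc" != "abc")
```

Running the interpreter with the -b option additionally reports a BytesWarning for the two comparisons, which can help when hunting for accidental mixing.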
I've another patch that removes basestring from Python 3.0: http://bugs.python.org/issue1258 Christian From greg at krypto.org Thu Oct 11 07:14:09 2007 From: greg at krypto.org (Gregory P. Smith) Date: Wed, 10 Oct 2007 22:14:09 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: <52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com> <52dc1c820710092249h3e866cf4y8fb88b1cc6f31361@mail.gmail.com> Message-ID: <52dc1c820710102214h63a04ad4ua96a459957fa2071@mail.gmail.com> haha wow! your patch was a *lot* less messy than I was expecting things could get. most of the test suite still seems to pass for me with this applied. if you haven't already please post it on bugs.python.org. On 10/10/07, Alexandre Vassalotti wrote: > > On 10/10/07, Gregory P. Smith wrote: > > > > - remove buffer API from PyUnicode > > > > > > I'll take these two with a goal of having them done by the end of the > > week. > > > > > > > I should've known not to believe the simple description. This one is > > proving difficult by itself. If I modify the Unicode object to not > support > > the buffer API I can't even launch the python interpreter. Any one with > > more time on their hands want this one? > > > > I have a patch for this one. I just haven't tested it throughly. > I attached the patch, so free to improve it. > > -- Alexandre > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-3000/attachments/20071010/5e1e0c4c/attachment.htm From greg at krypto.org Thu Oct 11 09:59:35 2007 From: greg at krypto.org (Gregory P. 
Smith) Date: Thu, 11 Oct 2007 00:59:35 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: <52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com> References: <52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com> Message-ID: <52dc1c820710110059y281b8ff5w6c4544946b7ed261@mail.gmail.com> Guido - One tiny question has come up while working on this one: Should the PyBytes buffer (mutable bytes) object's .append(val) and .remove(val) methods accept anything other than an int in the 0..255 range? I believe the answer to be no based on the previous long thread on this but these two weren't mentioned at the time so i figure I'll ask. Should a pep3118 buffer api supporting object that produces a length 1 buffer also work for append and remove? That would allow .append(b'!') or .remove(b'!'). amusingly right now in 3.0a1 there is a bug where .append('33') will happily append a b'!' by converting it into an int then into a byte. regardless of the answer that misbehavior will be zapped in the patch i'm about to submit. ;) -gps On 10/8/07, Gregory P. Smith wrote: > > > - add missing methods to PyBytes (for list, see the PEP and compare to > > what's already there) > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-3000/attachments/20071011/995afb38/attachment.htm From greg at krypto.org Thu Oct 11 10:09:31 2007 From: greg at krypto.org (Gregory P. Smith) Date: Thu, 11 Oct 2007 01:09:31 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: <52dc1c820710110059y281b8ff5w6c4544946b7ed261@mail.gmail.com> References: <52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com> <52dc1c820710110059y281b8ff5w6c4544946b7ed261@mail.gmail.com> Message-ID: <52dc1c820710110109h6c5061b7t7832962873f706a@mail.gmail.com> On 10/11/07, Gregory P. 
Smith wrote:
>
> Guido -
>
> One tiny question has come up while working on this one:
>
> Should the PyBytes buffer (mutable bytes) object's .append(val) and
> .remove(val) methods accept anything other than an int in the 0..255 range?
>
> I believe the answer to be no based on the previous long thread on this
> but these two weren't mentioned at the time so i figure I'll ask. Should a
> pep3118 buffer api supporting object that produces a length 1 buffer also
> work for append and remove? That would allow .append(b'!') or
> .remove(b'!').

I'm doubly assuming 'no' now, as the .insert() method would also need it
for consistency, and it would just be plain gross to allow .insert(5, b'x')
to work but .insert(5, b'xyz') to fail with a ValueError. Consider the
question unasked unless you want a different answer.

> amusingly right now in 3.0a1 there is a bug where .append('33') will happily
> append a b'!' by converting it into an int then into a byte. regardless of
> the answer that misbehavior will be zapped in the patch i'm about to submit.
> ;)
>
> -gps
>
> On 10/8/07, Gregory P. Smith wrote:
> >
> > > - add missing methods to PyBytes (for list, see the PEP and compare to
> > > what's already there)
> >

From tom at vector-seven.com Tue Oct 9 17:54:00 2007
From: tom at vector-seven.com (Thomas Lee)
Date: Wed, 10 Oct 2007 01:54:00 +1000
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To:
References: <4709BA29.3060503@vector-seven.com> <470B82A4.8030703@vector-seven.com>
Message-ID: <470BA418.5060301@vector-seven.com>

Guido van Rossum wrote:
>>>> - make == and != between PyBytes and PyUnicode return False instead of
>>>> raising TypeError
>>>>

Just thinking about it I'm pretty sure my initial patch is wrong - forgive my ignorance.
To remove the ambiguity, is it fair to state the following? bytes() == str() -> False instead of raising TypeError bytes() != str() -> True instead of raising TypeError I initially read that as "return False whenever any comparison between bytes and unicode objects is attempted" ... > Assuming that PyUnicode_Compare is a three-way comparison (less, > equal, more), it should raise a TypeError when one of the arguments is > a PyString or PyBytes. > > Cool. Should have that sorted out soon. As above: str8() == str() -> False str8() != str() -> True Correct? >> Is it just me, or do string/bytes comparisons already work? >> >> >>> s = str8('test') >> >>> b = b'test' >> >>> s == b >> True >> >>> b == s >> True >> >>> s != b >> False >> >>> b != s >> False >> > > Seems it's already so. Do they order properly too? (< <= > >=) > Looks like it: >>> str8('a') > b'b' False >>> str8('a') < b'b' True >>> str8('a') <= b'b' True >>> str8('a') >= b'b' False Cheers, Tom From tom at vector-seven.com Tue Oct 9 18:19:13 2007 From: tom at vector-seven.com (Thomas Lee) Date: Wed, 10 Oct 2007 02:19:13 +1000 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: <4709BA29.3060503@vector-seven.com> <470B82A4.8030703@vector-seven.com> <470BA418.5060301@vector-seven.com> Message-ID: <470BAA01.9090202@vector-seven.com> Guido van Rossum wrote: > > The point is that a bytes and a str instance are never considered equal... > > Sorry. I understand now. My brain must have been on a holiday earlier. :) Just pushed an updated patch to the bug tracker. >> str8() == str() -> False >> str8() != str() -> True >> >> Correct? >> > > Well, in this case you actually have to compare the individual bytes. > But yes. ;-) > I'm confused: if I'm making == and != between PyString return False instead of converting, at what point would I need to be comparing bytes? 
The fix I have ready for this merely wipes out the conversion from PyString to PyUnicode in PyUnicode_Compare and the existing code takes care of the rest. Is this all that's required, or have I misinterpreted this one too? :) Cheers, Tom From tom at vector-seven.com Wed Oct 10 05:59:21 2007 From: tom at vector-seven.com (Thomas Lee) Date: Wed, 10 Oct 2007 13:59:21 +1000 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: Message-ID: <470C4E19.6090305@vector-seven.com> I was having weird problems with the codec registry too - specifically the assertion checking unidata_version == "3.2.0" mysteriously failing after forcing string/unicode equality checks to return false. Thought maybe unidata_version somehow got a str8 version or something weird like that ... haven't looked into it at all though. I'll be taking another look tomorrow night. I'll try to give your patch a test run then and see if I can help at all if somebody else hasn't already sorted it out. Cheers, Tom Alexandre Vassalotti wrote: > On 10/8/07, Alexandre Vassalotti wrote: > >> On 10/8/07, Guido van Rossum wrote: >> >>> - change indexing and iteration over PyString to return ints, not >>> 1-char PyStrings >>> >> I will try do this one. >> > > This took a bit longer than I expected. Changing the PyString iterator > to return ints was easy, but I ran into some issues with the codec > registry. > > I won't have the time this week to work on my patch any further. > Meanwhile if someone would like to improve it, feel free to do so (the > patch is attached to this email). Otherwise, I will continue to work > on it next weekend. 
> > Cheers, > -- Alexandre > > ------------------------------------------------------------------------ > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/krumms%40gmail.com From tom at vector-seven.com Wed Oct 10 06:03:43 2007 From: tom at vector-seven.com (Thomas Lee) Date: Wed, 10 Oct 2007 14:03:43 +1000 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: <4709BA29.3060503@vector-seven.com> <470B82A4.8030703@vector-seven.com> <470BA418.5060301@vector-seven.com> Message-ID: <470C4F1F.1080401@vector-seven.com> Guido van Rossum wrote: > On 10/9/07, Guido van Rossum wrote: > >> Which reminds me of a task I forgot to add to the list: >> >> - change the constructor for PyString to match the one for PyBytes. >> > > And another pair of forgotten tasks: > > - change PyBytes so that its str() is the same as its repr(). > - change PyString so that its str() is the same as its repr(). > > The former seems easy. The latter might cause trouble (though then > again, it may not). > > I should also note that I already submitted the changes to remove > locale support from PyString, and am working on removing its encode() > method. This is not going so smoothly. > > I'll take the constructor once I sort out unicode/string comparison. If nobody else has taken care of the other two by the weekend, I'll take a look at them too. Cheers, Tom From tom at vector-seven.com Thu Oct 11 14:41:57 2007 From: tom at vector-seven.com (Thomas Lee) Date: Thu, 11 Oct 2007 22:41:57 +1000 Subject: [Python-3000] PEP 3137 patch #2 - str8() == str() -> False Message-ID: <470E1A15.2030709@vector-seven.com> Okay, here's another patch: http://bugs.python.org/issue1263 Using unicode-string-eq-false-r3.patch, str8/str comparison will now return False instead of attempting to convert. 
Unfortunately this breaks about 30 tests. In attempting to fix test_unicode (the obvious starting point for all this), I made changes to Python/structmember.c to use PyUnicode instead of PyString - this fixed some of the issues in test_unicode, but there would appear to be other, similar problems elsewhere. I'm not going to have the time to get this done by Friday, but I may be able to work more on this over the weekend. I'd love some feedback on my changes to structmember.c so I know if I'm going about it the right way (my knowledge of the PyUnicode API and unicode in general is pretty limited). I put the structmember.c patch in a separate file for now - unicode-string-eq-false-structmember-c-r1.patch Until then, if anybody wants to help out with getting those tests running that would be great too. Otherwise, I should have made some sort of measurable progress by Monday. Cheers, Tom From lists at cheimes.de Thu Oct 11 17:07:59 2007 From: lists at cheimes.de (Christian Heimes) Date: Thu, 11 Oct 2007 17:07:59 +0200 Subject: [Python-3000] basestring removal, __file__ and co_filename Message-ID: <470E3C4F.5020707@cheimes.de> Hello Python! I've written a patch that removes basestring from py3k: http://bugs.python.org/issue1258 During the testing of the patch I hit a problem with __file__ and codeobject.co_filename. Both __file__ and co_filename are byte strings and not unicode which is causing some trouble. Guido asked me to provide another patch which decodes the string using the default filesystem encoding. Most of the patch was straight forward and easy but I hit one spot that's causing some trouble. It's a chicken and egg issue. codeobject.co_filename is a PyString instance. I like to perform filename = PyString_AsDecodedObject(filename, Py_FileSystemDefaultEncoding ? Py_FileSystemDefaultEncoding : "UTF-8", NULL); in order to decode the string with either the fs encoding or UTF-8 but it's not possible. 
It's way too early in the bootstrapping process of Python and the codecs aren't registered yet. In fact large parts of the codecs package is implemented in Python ... Ideas? I could check if Py_FilesystemDefaultEncoding is one of the encodings that are implemented in Python (UTF-8, 16, 32, latin1, mbcs) but what if the fs default encoding is some obscure encoding? Christian From lists at cheimes.de Thu Oct 11 17:21:44 2007 From: lists at cheimes.de (Christian Heimes) Date: Thu, 11 Oct 2007 17:21:44 +0200 Subject: [Python-3000] basestring removal, __file__ and co_filename In-Reply-To: <470E3C4F.5020707@cheimes.de> References: <470E3C4F.5020707@cheimes.de> Message-ID: <470E3F88.7010301@cheimes.de> PS: The patch for __file__ and co_filename is causing a minor problem with the hotspot profiler and filenames. I remember a plan to remove hotspot from Python 3.x. Shall I leave the problem alone? From lists at cheimes.de Thu Oct 11 17:32:25 2007 From: lists at cheimes.de (Christian Heimes) Date: Thu, 11 Oct 2007 17:32:25 +0200 Subject: [Python-3000] Bug with pdb.set_trace() and with block Message-ID: I found a pretty annoying bug caused by with blocks. A with block terminates the debugging session and the program keeps running. It's not possible to go to the next line with 'n'. 's' steps into the open() call. 
# pdbtest.py
import pdb
pdb.set_trace()
print("before with")
with open("/etc/passwd") as fd:
    data = fd.read()
print("after with")
print("end of program")

$ ./python pdbtest.py
> /home/heimes/dev/python/py3k/pdbtest.py(3)()
-> print("before with")
(Pdb) n
before with
> /home/heimes/dev/python/py3k/pdbtest.py(4)()
-> with open("/etc/passwd") as fd:
(Pdb) n
after with
end of program

Christian

From fdrake at acm.org Thu Oct 11 18:01:03 2007
From: fdrake at acm.org (Fred Drake)
Date: Thu, 11 Oct 2007 12:01:03 -0400
Subject: [Python-3000] basestring removal, __file__ and co_filename
In-Reply-To: <470E3F88.7010301@cheimes.de>
References: <470E3C4F.5020707@cheimes.de> <470E3F88.7010301@cheimes.de>
Message-ID:

On Oct 11, 2007, at 11:21 AM, Christian Heimes wrote:
> PS: The patch for __file__ and co_filename is causing a minor problem
> with the hotspot profiler and filenames. I remember a plan to remove
> hotspot from Python 3.x. Shall I leave the problem alone?

I asked about the removal of hotshot a few weeks ago, and there was
some uncertainty about whether a decision had been reached. Reading
back over the mails, there were no objections. Python 3.0 seems a
perfect time to rip it out. If there are no objections, I'll do that
this weekend.

-Fred

--
Fred Drake

From guido at python.org Thu Oct 11 18:55:34 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 11 Oct 2007 09:55:34 -0700
Subject: [Python-3000] Bug with pdb.set_trace() and with block
In-Reply-To:
References:
Message-ID:

Please file this in the bug tracker.

Thanks for finding this -- I knew there was a problem with the debugger
losing control but I never traced it down to the with statement!

On 10/11/07, Christian Heimes wrote:
> I found a pretty annoying bug caused by with blocks. A with block
> terminates the debugging session and the program keeps running. It's not
> possible to go to the next line with 'n'. 's' steps into the open() call.
> > # pdbtest.py > import pdb > pdb.set_trace() > print("before with") > with open("/etc/passwd") as fd: > data = fd.read() > print("after with") > print("end of program") > > $ ./python pdbtest.py > > /home/heimes/dev/python/py3k/pdbtest.py(3)() > -> print("before with") > (Pdb) n > before with > > /home/heimes/dev/python/py3k/pdbtest.py(4)() > -> with open("/etc/passwd") as fd: > (Pdb) n > after with > end of program > > > Christian > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Thu Oct 11 18:56:11 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 11 Oct 2007 09:56:11 -0700 Subject: [Python-3000] basestring removal, __file__ and co_filename In-Reply-To: References: <470E3C4F.5020707@cheimes.de> <470E3F88.7010301@cheimes.de> Message-ID: On 10/11/07, Fred Drake wrote: > On Oct 11, 2007, at 11:21 AM, Christian Heimes wrote: > > PS: The patch for __file__ and co_filename is causing a minor problem > > with the hotspot profiler and filenames. I remember a plan to remove > > hotspot from Python 3.x. Shall I leave the problem alone? > > I asked about the removal of hotshot a few weeks ago, and there was > some uncertainty about whether a decision had been reached. Reading > back over the mails, there were no objections. Python 3.0 seems a > perfect time to rip it out. If there are no objections, I'll do that > this weekend. Go for it! 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Thu Oct 11 18:58:42 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 11 Oct 2007 09:58:42 -0700 Subject: [Python-3000] basestring removal, __file__ and co_filename In-Reply-To: <470E3C4F.5020707@cheimes.de> References: <470E3C4F.5020707@cheimes.de> Message-ID: Hm, can't we make co_filename a PyUnicode instance? On 10/11/07, Christian Heimes wrote: > Hello Python! > > I've written a patch that removes basestring from py3k: > http://bugs.python.org/issue1258 During the testing of the patch I hit a > problem with __file__ and codeobject.co_filename. Both __file__ and > co_filename are byte strings and not unicode which is causing some > trouble. Guido asked me to provide another patch which decodes the > string using the default filesystem encoding. > > Most of the patch was straight forward and easy but I hit one spot > that's causing some trouble. It's a chicken and egg issue. > codeobject.co_filename is a PyString instance. I like to perform > > filename = PyString_AsDecodedObject(filename, > Py_FileSystemDefaultEncoding ? Py_FileSystemDefaultEncoding : "UTF-8", > NULL); > > in order to decode the string with either the fs encoding or UTF-8 but > it's not possible. It's way too early in the bootstrapping process of > Python and the codecs aren't registered yet. In fact large parts of the > codecs package is implemented in Python ... > > Ideas? > > I could check if Py_FilesystemDefaultEncoding is one of the encodings > that are implemented in Python (UTF-8, 16, 32, latin1, mbcs) but what if > the fs default encoding is some obscure encoding? 
>
> Christian

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

From lists at cheimes.de Thu Oct 11 19:26:23 2007
From: lists at cheimes.de (Christian Heimes)
Date: Thu, 11 Oct 2007 19:26:23 +0200
Subject: [Python-3000] basestring removal, __file__ and co_filename
In-Reply-To:
References: <470E3C4F.5020707@cheimes.de>
Message-ID: <470E5CBF.3030103@cheimes.de>

Guido van Rossum wrote:
> Hm, can't we make co_filename a PyUnicode instance?

I already did it in my patch but doesn't it cause a problem when the
encoding isn't UTF-8? I may be misunderstanding
PyUnicode_FromString(PyString_AS_STRING(filename)). Doesn't it
decode filename from UTF-8?

Christian

From lists at cheimes.de Thu Oct 11 19:27:04 2007
From: lists at cheimes.de (Christian Heimes)
Date: Thu, 11 Oct 2007 19:27:04 +0200
Subject: [Python-3000] Bug with pdb.set_trace() and with block
In-Reply-To:
References:
Message-ID: <470E5CE8.6080706@cheimes.de>

Guido van Rossum wrote:
> Please file this in the bug tracker.
>
> Thanks for finding this -- I knew there was a problem with the debugger
> losing control but I never traced it down to the with statement!

Already done! http://bugs.python.org/issue1265

Christian

From guido at python.org Thu Oct 11 19:40:21 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 11 Oct 2007 10:40:21 -0700
Subject: [Python-3000] basestring removal, __file__ and co_filename
In-Reply-To: <470E5CBF.3030103@cheimes.de>
References: <470E3C4F.5020707@cheimes.de> <470E5CBF.3030103@cheimes.de>
Message-ID:

Um, where does the filename object in that expression come from? It
appears to be a PyString object. Who created it? That code should be
changed to create a PyUnicode instead (using the filesystem encoding).
On 10/11/07, Christian Heimes wrote: > Guido van Rossum wrote: > > Hm, can't we make co_filename a PyUnicode instance? > > I already did it in my patch but doesn't it cause a problem when the > encoding isn't UTF-8? I may understand > PyUnicode_FromString(PyString_AS_STRING(filename)) wrong. Doesn't it > decode filename from UTF-8? > > Christian > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From lists at cheimes.de Thu Oct 11 20:01:15 2007 From: lists at cheimes.de (Christian Heimes) Date: Thu, 11 Oct 2007 20:01:15 +0200 Subject: [Python-3000] basestring removal, __file__ and co_filename In-Reply-To: References: <470E3C4F.5020707@cheimes.de> <470E5CBF.3030103@cheimes.de> Message-ID: <470E64EB.2070203@cheimes.de> Guido van Rossum wrote: > Um, where does the filename object in that expression come from? It > appears to be a PyString object. Who created it? That could should be > changed to create a PyUnicode instead (using the filesystem encoding). Python/compile.c:makecode() filename = PyString_FromString(c->c_filename); Modules/pyexpat.c:getcode() filename = PyString_FromString(__FILE__); Objects/codeobject.c:code_new() PyArg_ParseTuple(args, "iiiiiSO!O!O!SSiS|O!O!:code" As I tried to explain earlier that may be a problem. PyUnicode_Decode() doesn't work so early. The codecs package isn't initialized yet. Christian From fdrake at acm.org Thu Oct 11 20:06:23 2007 From: fdrake at acm.org (Fred Drake) Date: Thu, 11 Oct 2007 14:06:23 -0400 Subject: [Python-3000] basestring removal, __file__ and co_filename In-Reply-To: <470E3F88.7010301@cheimes.de> References: <470E3C4F.5020707@cheimes.de> <470E3F88.7010301@cheimes.de> Message-ID: On Oct 11, 2007, at 11:21 AM, Christian Heimes wrote: > PS: The patch for __file__ and co_filename is causing a minor problem > with the hotspot profiler and filenames. I remember a plan to remove > hotspot from Python 3.x. Shall I leave the problem alone? hotshot should no longer be a problem for this. 
-Fred -- Fred Drake From lists at cheimes.de Thu Oct 11 20:10:32 2007 From: lists at cheimes.de (Christian Heimes) Date: Thu, 11 Oct 2007 20:10:32 +0200 Subject: [Python-3000] basestring removal, __file__ and co_filename In-Reply-To: References: <470E3C4F.5020707@cheimes.de> <470E3F88.7010301@cheimes.de> Message-ID: <470E6718.2080909@cheimes.de> Fred Drake wrote: > hotshot should no longer be a problem for this. Thanks Fred! Unfortunately the anon svn server is down again. It's the second time this week. Something must be wrong with the Apache server for svn.python.org. Christian From guido at python.org Thu Oct 11 20:23:22 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 11 Oct 2007 11:23:22 -0700 Subject: [Python-3000] basestring removal, __file__ and co_filename In-Reply-To: <470E64EB.2070203@cheimes.de> References: <470E3C4F.5020707@cheimes.de> <470E5CBF.3030103@cheimes.de> <470E64EB.2070203@cheimes.de> Message-ID: On 10/11/07, Christian Heimes wrote: > Guido van Rossum wrote: > > Um, where does the filename object in that expression come from? It > > appears to be a PyString object. Who created it? That could should be > > changed to create a PyUnicode instead (using the filesystem encoding). > > Python/compile.c:makecode() > filename = PyString_FromString(c->c_filename); > > Modules/pyexpat.c:getcode() > filename = PyString_FromString(__FILE__); > > Objects/codeobject.c:code_new() > PyArg_ParseTuple(args, "iiiiiSO!O!O!SSiS|O!O!:code" > > As I tried to explain earlier that may be a problem. PyUnicode_Decode() > doesn't work so early. The codecs package isn't initialized yet. But some codecs are "built-in" and have custom APIs. I wonder if we could do something that figures out the default fs encoding, and see if it is one of the supported ones, and then uses that; otherwise tries UTF-8 with the "replace" error handling option (so it won't fail if the data is non-UTF-8). 
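The strategy described above could look roughly like this when written in Python. It is only a sketch of the idea, not the patch under discussion: the function name decode_fs_default, the BUILTIN table and the selection of aliases in it are assumptions made for illustration.

```python
# Hypothetical table: normalized encoding names that are assumed to be
# decodable without the Python-level codecs package.
BUILTIN = {
    "utf8": "utf-8",
    "utf16": "utf-16",
    "utf32": "utf-32",
    "latin1": "latin-1",
    "iso88591": "latin-1",
    "ascii": "ascii",
}


def decode_fs_default(data, fs_encoding=None):
    """Decode data with the filesystem encoding if it is a known built-in
    codec; otherwise fall back to UTF-8 with the "replace" error handler."""
    name = fs_encoding or "utf-8"
    # Lower-case the name and drop characters such as '-' and '_'.
    mangled = "".join(ch for ch in name.lower() if ch.isalnum())
    codec = BUILTIN.get(mangled)
    if codec is not None:
        try:
            return data.decode(codec)
        except UnicodeDecodeError:
            pass  # fall through to the lenient fallback below
    # Never fails: undecodable bytes become U+FFFD.
    return data.decode("utf-8", "replace")
```

The point of the final fallback is that a filename in some unexpected encoding degrades to replacement characters instead of stopping interpreter start-up.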
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From greg at krypto.org Thu Oct 11 22:11:49 2007 From: greg at krypto.org (Gregory P. Smith) Date: Thu, 11 Oct 2007 13:11:49 -0700 Subject: [Python-3000] basestring removal, __file__ and co_filename In-Reply-To: References: <470E3C4F.5020707@cheimes.de> <470E5CBF.3030103@cheimes.de> <470E64EB.2070203@cheimes.de> Message-ID: <52dc1c820710111311v3dc1c4f1lf797910c313faf56@mail.gmail.com> On 10/11/07, Guido van Rossum wrote: > > On 10/11/07, Christian Heimes wrote: > > Guido van Rossum wrote: > > > Um, where does the filename object in that expression come from? It > > > appears to be a PyString object. Who created it? That could should be > > > changed to create a PyUnicode instead (using the filesystem encoding). > > > > Python/compile.c:makecode() > > filename = PyString_FromString(c->c_filename); > > > > Modules/pyexpat.c:getcode() > > filename = PyString_FromString(__FILE__); > > > > Objects/codeobject.c:code_new() > > PyArg_ParseTuple(args, "iiiiiSO!O!O!SSiS|O!O!:code" > > > > As I tried to explain earlier that may be a problem. PyUnicode_Decode() > > doesn't work so early. The codecs package isn't initialized yet. > > But some codecs are "built-in" and have custom APIs. I wonder if we > could do something that figures out the default fs encoding, and see > if it is one of the supported ones, and then uses that; otherwise > tries UTF-8 with the "replace" error handling option (so it won't fail > if the data is non-UTF-8). > Thats pretty much what Christian pondered at the start of this thread but with a defined "failure" mode. +1 from me, give it a try and see what 3.0a2 testers say. Are there OSes and filesystems out there that'd store in anything other than one of the popular codecs (UTF-8, 16, 32, latin1, mbcs)? That seems like a bad idea to me but obviously I don't run the world. -gps -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.python.org/pipermail/python-3000/attachments/20071011/94cd685f/attachment-0001.htm

From lists at cheimes.de Thu Oct 11 23:23:22 2007
From: lists at cheimes.de (Christian Heimes)
Date: Thu, 11 Oct 2007 23:23:22 +0200
Subject: [Python-3000] basestring removal, __file__ and co_filename
In-Reply-To: <52dc1c820710111311v3dc1c4f1lf797910c313faf56@mail.gmail.com>
References: <470E3C4F.5020707@cheimes.de> <470E5CBF.3030103@cheimes.de> <470E64EB.2070203@cheimes.de> <52dc1c820710111311v3dc1c4f1lf797910c313faf56@mail.gmail.com>
Message-ID: <470E944A.80201@cheimes.de>

Gregory P. Smith wrote:
> Thats pretty much what Christian pondered at the start of this thread but
> with a defined "failure" mode.
>
> +1 from me, give it a try and see what 3.0a2 testers say. Are there OSes
> and filesystems out there that'd store in anything other than one of the
> popular codecs (UTF-8, 16, 32, latin1, mbcs)? That seems like a bad idea to
> me but obviously I don't run the world.

I've implemented the method but my C is a bit rusty and not very good.
I'm not happy with the code especially with the large if else block.

PyObject*
PyUnicode_DecodeFSDefault(const char *string, Py_ssize_t length,
                          const char *errors)
{
    PyObject *v = NULL;
    char encoding[32], mangled[32], *encptr, *manptr;
    char tmp;

    if (errors != NULL)
        Py_FatalError("non-NULL encoding in PyUnicode_DecodeFSDefault");
    if ((length == 0) && *string)
        length = (Py_ssize_t)strlen(string);

    strncpy(encoding,
            Py_FileSystemDefaultEncoding ? Py_FileSystemDefaultEncoding : "UTF-8",
            31);
    encoding[31] = '\0';
    encptr = encoding;
    manptr = mangled;
    /* lower the string and remove non alpha numeric chars like '-' */
    while(*encptr) {
        tmp = *encptr++;
        if (isupper(tmp))
            tmp = tolower(tmp);
        if (!isalnum(tmp))
            continue;
        *manptr++ = tmp;
    }
    *manptr++ = '\0';

    if (mangled == "utf8")
        v = PyUnicode_DecodeUTF8(string, length, NULL);
    else if (mangled == "utf16")
        v = PyUnicode_DecodeUTF16(string, length, NULL, 0);
    else if (mangled == "utf32")
        v = PyUnicode_DecodeUTF32(string, length, NULL, 0);
    else if ((mangled == "latin1") || (mangled == "iso88591") ||
             (mangled == "iso885915"))
        v = PyUnicode_DecodeLatin1(string, length, NULL);
    else if (mangled == "ascii")
        v = PyUnicode_DecodeASCII(string, length, NULL);
#ifdef MS_WIN32
    else if (mangled = "mbcs")
        v = PyUnicode_DecodeMBCS(string, length, NULL);
#endif

    if (v == NULL)
        v = PyUnicode_DecodeUTF8(string, length, "replace");
    return (PyObject*)v;
}

From greg.ewing at canterbury.ac.nz Fri Oct 12 01:00:22 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 12 Oct 2007 12:00:22 +1300
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <52dc1c820710110059y281b8ff5w6c4544946b7ed261@mail.gmail.com>
References: <52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com> <52dc1c820710110059y281b8ff5w6c4544946b7ed261@mail.gmail.com>
Message-ID: <470EAB06.2000000@canterbury.ac.nz>

Gregory P. Smith wrote:
> Should a pep3118 buffer api supporting object that produces a length 1
> buffer also work for append and remove?

My thought is -- only if such an object is also usable in any *other*
context expecting an integer. And I don't think that would be a good
idea at all.

You can always use .extend(b'!') to append a byte that's already inside
another bytes object or other buffer-supporting object.

(BTW, I'm worried that we're overloading the term "buffer" here. Having
it refer to both the buffer interface and also certain types of object
that hold data is getting confusing.)
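The distinction drawn here between append() and extend() can be tried out on a released Python 3 interpreter, where the mutable type discussed in this thread as buffer ended up being called bytearray. The following is a sketch under that assumption, and the helper name append_error is made up for the example.

```python
def append_error(buf, value):
    """Try buf.append(value); return the exception class name, or None."""
    try:
        buf.append(value)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__
    return None


buf = bytearray(b"abc")

ok = append_error(buf, 33)            # an int in range(256) is accepted
too_big = append_error(buf, 256)      # an out-of-range int is rejected
from_bytes = append_error(buf, b"!")  # a length-1 bytes object is not an int
from_str = append_error(buf, "33")    # and neither is a str

buf.extend(b"!")                      # extend() takes a bytes-like object
```

Keeping append() limited to integers means that insert() and remove() need no special case for length-1 buffers either.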
-- Greg From greg.ewing at canterbury.ac.nz Fri Oct 12 01:33:17 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 12 Oct 2007 12:33:17 +1300 Subject: [Python-3000] basestring removal, __file__ and co_filename In-Reply-To: <470E3C4F.5020707@cheimes.de> References: <470E3C4F.5020707@cheimes.de> Message-ID: <470EB2BD.30809@canterbury.ac.nz> Christian Heimes wrote: > I like to perform > > filename = PyString_AsDecodedObject(filename, > Py_FileSystemDefaultEncoding ? Py_FileSystemDefaultEncoding : "UTF-8", > NULL); > > in order to decode the string with either the fs encoding or UTF-8 but > it's not possible. It's way too early in the bootstrapping process How about just using ascii if the codec system isn't fully operational? It would just mean that files needed during bootstrapping would need to have pure-ascii filenames, which doesn't seem like a serious restriction. -- Greg From lists at cheimes.de Fri Oct 12 01:57:06 2007 From: lists at cheimes.de (Christian Heimes) Date: Fri, 12 Oct 2007 01:57:06 +0200 Subject: [Python-3000] basestring removal, __file__ and co_filename In-Reply-To: <470EB2BD.30809@canterbury.ac.nz> References: <470E3C4F.5020707@cheimes.de> <470EB2BD.30809@canterbury.ac.nz> Message-ID: <470EB852.6040407@cheimes.de> Greg Ewing wrote: > How about just using ascii if the codec system isn't fully > operational? It would just mean that files needed during > bootstrapping would need to have pure-ascii filenames, > which doesn't seem like a serious restriction. The file names aren't the issue but the directory names are. For example it may screw up a local installation in the user's application data directory on Windows if the user name contains umlauts. Any kind of installation in $HOME would cause trouble if $USER isn't plain ASCII. 
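Christian's objection can be illustrated at the Python level. This is not the bootstrap code under discussion: the path and the `decode_filename` helper are hypothetical, and the fallback order (a preferred encoding, then UTF-8 with replacement) only mirrors the idea in his C draft.

```python
def decode_filename(raw, preferred="ascii"):
    """Decode a byte path, falling back to UTF-8 with replacement characters."""
    try:
        return raw.decode(preferred)
    except UnicodeDecodeError:
        return raw.decode("utf-8", "replace")


# Hypothetical home directory of a user whose name contains an umlaut.
raw_path = "/home/m\u00fcller/.local/lib".encode("utf-8")

# A pure-ASCII decode fails on the directory name, not on the file name.
try:
    raw_path.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

decoded = decode_filename(raw_path)
```

With an ASCII-only rule, the failure comes from the directory component, which is the case Christian describes for installations under $HOME.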
Christian From qrczak at knm.org.pl Fri Oct 12 02:34:03 2007 From: qrczak at knm.org.pl (Marcin =?UTF-8?Q?=E2=80=98Qrczak=E2=80=99?= Kowalczyk) Date: Fri, 12 Oct 2007 02:34:03 +0200 Subject: [Python-3000] basestring removal, __file__ and co_filename In-Reply-To: <52dc1c820710111311v3dc1c4f1lf797910c313faf56@mail.gmail.com> References: <470E3C4F.5020707@cheimes.de> <470E5CBF.3030103@cheimes.de> <470E64EB.2070203@cheimes.de> <52dc1c820710111311v3dc1c4f1lf797910c313faf56@mail.gmail.com> Message-ID: <1192149243.5288.13.camel@qrnik> Dnia 11-10-2007, Cz o godzinie 13:11 -0700, Gregory P. Smith pisze: > Are there OSes and filesystems out there that'd store in anything > other than one of the popular codecs (UTF-8, 16, 32, latin1, mbcs)? I've been using ISO-8859-2 by default on my Linux until February 2007. Most filenames were not Polish and thus ASCII of course, and Evolution used UTF-8 for internal filenames of its folders even with the locale encoding being ISO-8859-2 (which accidentally helped with the migration to UTF-8). -- __("< Marcin Kowalczyk \__/ qrczak at knm.org.pl ^^ http://qrnik.knm.org.pl/~qrczak/ From lists at cheimes.de Fri Oct 12 05:32:44 2007 From: lists at cheimes.de (Christian Heimes) Date: Fri, 12 Oct 2007 05:32:44 +0200 Subject: [Python-3000] basestring removal, __file__ and co_filename In-Reply-To: <5d44f72f0710111941h5442f1f3k2dc2c1d2edc587eb@mail.gmail.com> References: <470E3C4F.5020707@cheimes.de> <470E5CBF.3030103@cheimes.de> <470E64EB.2070203@cheimes.de> <52dc1c820710111311v3dc1c4f1lf797910c313faf56@mail.gmail.com> <470E944A.80201@cheimes.de> <5d44f72f0710111941h5442f1f3k2dc2c1d2edc587eb@mail.gmail.com> Message-ID: <470EEADC.1050608@cheimes.de> Jeffrey Yasskin wrote: > On 10/11/07, Christian Heimes wrote: >> if (mangled == "utf8") > > FYI, this is always going to be false. It compares the pointer values, > rather than the strings. Doh! I've done too much Python programming in the past. 
I forgot that I have to use strcmp(s1, s2) == 0 in order to compare two strings in C. Thanks pal! Christian From lists at cheimes.de Fri Oct 12 17:49:09 2007 From: lists at cheimes.de (Christian Heimes) Date: Fri, 12 Oct 2007 17:49:09 +0200 Subject: [Python-3000] Array typecode 'w' vs. 'u' and UCS4 builds Message-ID: <470F9775.4080405@cheimes.de> Yesterday I found a design problem in the array module. Travis Oliphant added a new typecode 'w' to the array module. 'w' is a wide unicode type that is guaranteed to be at least 4 bytes long. The 'u' typecode may be 2 bytes long. Unfortunately his change removed 'u' as a possible typecode which makes it unnecessarily hard to write code that works on Windows (UCS2 only) and Unix (UCS4 for most Linux distributions). I've written a patch that keeps 'u' in every build and adds 'w' as an alias for 'u' in UCS-4 builds only. It also introduces the new module variable typecodes which is a unicode string containing all valid typecodes. http://bugs.python.org/issue1268 Christian From oliphant at enthought.com Fri Oct 12 19:52:27 2007 From: oliphant at enthought.com (Travis E. Oliphant) Date: Fri, 12 Oct 2007 12:52:27 -0500 Subject: [Python-3000] Array typecode 'w' vs. 'u' and UCS4 builds In-Reply-To: <470F9775.4080405@cheimes.de> References: <470F9775.4080405@cheimes.de> Message-ID: <470FB45B.6060004@enthought.com> Christian Heimes wrote: > Yesterday I found a design problem in the array module. Travis Oliphant > added a new typecode 'w' to the array module. 'w' is a wide unicode type > that is guaranteed to be at least 4 bytes long. The 'u' typecode may be > 2 bytes long. > > Unfortunately his change removed 'u' as a possible typecode which makes > it unnecessarily hard to write code that works on Windows (UCS2 only) and > Unix (UCS4 for most Linux distributions). I've written a patch that > keeps 'u' in every build and adds 'w' as an alias for 'u' in UCS-4 > builds only.
It also introduces the new module variable typecodes > which is a unicode string containing all valid typecodes. > The problem is to keep the array typecodes somewhat consistent with the typecodes in PEP 3118 which will be in the struct module. How about making 'U' be the typecode that translates to 'u' or 'w' depending on the platform and supporting both 'u' and 'w' on all platforms by appropriate translation of bytes on getting and setting? -Travis From lists at cheimes.de Fri Oct 12 20:14:41 2007 From: lists at cheimes.de (Christian Heimes) Date: Fri, 12 Oct 2007 20:14:41 +0200 Subject: [Python-3000] Array typecode 'w' vs. 'u' and UCS4 builds In-Reply-To: <470FB45B.6060004@enthought.com> References: <470F9775.4080405@cheimes.de> <470FB45B.6060004@enthought.com> Message-ID: <470FB991.7010403@cheimes.de> Travis E. Oliphant wrote: > The problem is to keep the array typecodes somewhat consistent with the > typecodes in PEP 3118 which will be in the struct module. > How about making 'U' be the typecode that translates to 'u' or 'w' > depending on the platform and supporting both 'u' and 'w' on all > platforms by appropriate translation of bytes on getting and setting? Now I see your point. :) Your solution sounds feasible but is it realizable on all platforms? I once hit a thick wall of bricks during my work on PythonNET. I tried to make it compatible with Mono and UCS-4 builds of Python but it was really hard because the .NET standards don't care about anything other than a 16-bit wchar_t which doesn't even translate to UTF-16. I fear that 'w' may hit a similar wall on Windows. Should PEP 3118 and the array module have a 'U' typecode, too? It may prove useful for platform and build independent software to have a typecode that translates to the native unicode type (UCS-2 or UCS-4).
Christian From python3now at gmail.com Fri Oct 12 21:37:24 2007 From: python3now at gmail.com (James Thiele) Date: Fri, 12 Oct 2007 12:37:24 -0700 Subject: [Python-3000] PEP 3105 "Backward Compatibility" Message-ID: <8f01efd00710121237v576623adm9a4e36af37ffe6bc@mail.gmail.com> I was reading PEP 3105 -- Make print a function and in the section "Backwards Compatibility" found the following statement: "The changes proposed in this PEP will render most of today's print statements invalid, only those which incidentally feature parentheses around all of their arguments will continue to be valid Python syntax in version 3.0." -- They may both be valid syntax, but they may not do the same thing: $ python Python 2.5 (r25:51918, Sep 19 2006, 08:49:13) >>> print (1,2) (1, 2) >>> $ python3.0 Python 3.0a1 (py3k:57844, Aug 31 2007, 08:01:11) >>> print (1,2) 1 2 -- It might be useful to note this in the PEP. James From guido at python.org Fri Oct 12 22:25:46 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 12 Oct 2007 13:25:46 -0700 Subject: [Python-3000] PEP 3105 "Backward Compatibility" In-Reply-To: <8f01efd00710121237v576623adm9a4e36af37ffe6bc@mail.gmail.com> References: <8f01efd00710121237v576623adm9a4e36af37ffe6bc@mail.gmail.com> Message-ID: Good point. I added a few examples to the PEP. --Guido On 10/12/07, James Thiele wrote: > I was reading PEP 3105 -- Make print a function and in the section > "Backwards Compatibility" found the following statement: > "The changes proposed in this PEP will render most of today's print > statements invalid, only those which incidentally feature parentheses > around all of their arguments will continue to be valid Python syntax > in version 3.0." 
> -- > They may both be valid syntax, but they may not do the same thing: > $ python > Python 2.5 (r25:51918, Sep 19 2006, 08:49:13) > >>> print (1,2) > (1, 2) > >>> > $ python3.0 > Python 3.0a1 (py3k:57844, Aug 31 2007, 08:01:11) > >>> print (1,2) > 1 2 > -- > It might be useful to note this in the PEP. > > James > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From oliphant.travis at ieee.org Fri Oct 12 23:37:33 2007 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Fri, 12 Oct 2007 16:37:33 -0500 Subject: [Python-3000] Array typecode 'w' vs. 'u' and UCS4 builds In-Reply-To: <470FB991.7010403@cheimes.de> References: <470F9775.4080405@cheimes.de> <470FB45B.6060004@enthought.com> <470FB991.7010403@cheimes.de> Message-ID: <470FE91D.5010001@ieee.org> Christian Heimes wrote: > Travis E. Oliphant wrote: >> The problem is to keep the array typecodes somewhat consistent with the >> typecodes in PEP 3118 which will be in the struct module. >> How about making 'U' be the typecode that translates to 'u' or 'w' >> depending on the platform and supporting both 'u' and 'w' on all >> platforms by appropriate translation of bytes on getting and setting? > > Now I see your point. :) Your solution sounds feasible but is it > realizable on all platforms? I once hit a thick wall of bricks during my > work on PythonNET. I tried to make it compatible with Mono and UCS-4 > builds of Python but it was really hard because the .NET standards don't > care about anything else than a 16bit wchar_t which doesn't even > translate to UTF-16. I fear that 'w' may hit a similar wall on Windows. > I think it would be feasible, but I'm not sure it is worth it at this point. 
My suggestion right now (and what I've done) is to back out the 'w' typecode for the array module and just leave it as 'u' as before. I'll check in this change. -Travis From greg at krypto.org Fri Oct 12 23:55:46 2007 From: greg at krypto.org (Gregory P. Smith) Date: Fri, 12 Oct 2007 14:55:46 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: <52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com> References: <52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com> Message-ID: <52dc1c820710121455x1d3a52d8gf30bfab102cc9df6@mail.gmail.com> > - add missing methods to PyBytes (for list, see the PEP and compare to > what's already there) As I work on these.. Should the mutable PyBytes_ (buffer) objects implement the following methods in place and return an additional reference to self? .capitalize(), .center(), .expandtabs(), .rjust(), .swapcase(), .title(), .upper(), .zfill() Also what about .replace() and .translate()? If they are not done in place should they return a new buffer (PyBytes_) object or a bytes (PyString_) object? [I'd say a buffer (PyBytes_)] Also if not, should we add additional .ireplace() .ilower() etc. methods to the mutable buffer (PyBytes_)? There are speed advantages to doing many of those in place rather than a data copy. -gps From guido at python.org Sat Oct 13 03:20:44 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 12 Oct 2007 18:20:44 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: <52dc1c820710121455x1d3a52d8gf30bfab102cc9df6@mail.gmail.com> References: <52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com> <52dc1c820710121455x1d3a52d8gf30bfab102cc9df6@mail.gmail.com> Message-ID: On 10/12/07, Gregory P.
Smith wrote: > > > - add missing methods to PyBytes (for list, see the PEP and compare to > > > what's already there) > > As I work on these.. Should the mutable PyBytes_ (buffer) objects implement > the following methods in place and return an additional reference to self? > > .capitalize(), .center(), .expandtabs(), .rjust(), .swapcase(), .title(), > .upper(), .zfill() No... That would be a huge trap to fall in at all sorts of occasions. > Also what about .replace() and .translate()? > If they are not done in place should they return a new buffer (PyBytes_) > object or a bytes (PyString_) object? [I'd say a buffer (PyBytes_)] They should return the same type as 'self'. > Also if not, should we add additional .ireplace() .ilower() etc. methods to > the mutable buffer (PyBytes_)? There are speed advantages to doing many of > those in place rather than a data copy. I'm not sure I see the use case where this matters all that much though. Let's say not, if only because it's not in the PEP. ;-) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From lists at cheimes.de Sat Oct 13 15:38:08 2007 From: lists at cheimes.de (Christian Heimes) Date: Sat, 13 Oct 2007 15:38:08 +0200 Subject: [Python-3000] Array typecode 'w' vs. 'u' and UCS4 builds In-Reply-To: <470FE91D.5010001@ieee.org> References: <470F9775.4080405@cheimes.de> <470FB45B.6060004@enthought.com> <470FB991.7010403@cheimes.de> <470FE91D.5010001@ieee.org> Message-ID: <4710CA40.7080705@cheimes.de> Travis Oliphant wrote: > I think it would be feasible, but I'm not sure it is worth it at this > point. My suggestion right now (and what I've done) is to back out the > 'w' typecode for the array module and just leave it as 'u' as before. Thanks! I've seen that you've also checked in my typecodes addition to arraymodule.c. Do you think it's worth backporting to 2.6? The table http://www.python.org/dev/peps/pep-3118/#additions-to-the-struct-string-syntax isn't exactly clear to me. I *guess* 'u' means UCS-2 on all platforms and builds of Python - even UCS-4 builds - and 'w' is only available on wide builds. I suggest that you place emphasis on the size to make the table unambiguous. I know that I'm nitpicking but documentation should be crystal clear. ;) If I'm correct with my assumption about 'u' and 'w' your suggestion of a native 'U' could come in handy.
Christian From qrczak at knm.org.pl Sat Oct 13 16:19:36 2007 From: qrczak at knm.org.pl (Marcin =?UTF-8?Q?=E2=80=98Qrczak=E2=80=99?= Kowalczyk) Date: Sat, 13 Oct 2007 16:19:36 +0200 Subject: [Python-3000] Array typecode 'w' vs. 'u' and UCS4 builds In-Reply-To: <4710CA40.7080705@cheimes.de> References: <470F9775.4080405@cheimes.de> <470FB45B.6060004@enthought.com> <470FB991.7010403@cheimes.de> <470FE91D.5010001@ieee.org> <4710CA40.7080705@cheimes.de> Message-ID: <1192285176.3643.2.camel@qrnik> Dnia 13-10-2007, So o godzinie 15:38 +0200, Christian Heimes pisze: > If I'm correct with my assumption about 'u' and 'w' your suggestion of a > native 'U' could become in handy. Wouldn't it be nicer if 'u' and 'U' corresponded to \uxxxx and \Uxxxxxxxx, i.e. UCS-2 and UCS-4, and something else was used for the native width? -- __("< Marcin Kowalczyk \__/ qrczak at knm.org.pl ^^ http://qrnik.knm.org.pl/~qrczak/ From jimjjewett at gmail.com Mon Oct 15 15:57:20 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Mon, 15 Oct 2007 09:57:20 -0400 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: <52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com> <52dc1c820710121455x1d3a52d8gf30bfab102cc9df6@mail.gmail.com> Message-ID: On 10/12/07, Guido van Rossum wrote: > On 10/12/07, Gregory P. Smith wrote: > > > > - add missing methods to PyBytes (for list, see the PEP and compare to > > > > what's already there) > > As I work on these.. Should the mutable PyBytes_ (buffer) objects implement > > the following methods inplace and return an additional reference to self? > > .capitalize(), .center(), .expandtabs(), .rjust(), .swapcase(), .title(), > > .upper(), .zfill() > No... That would be a huge trap to fall in at all sorts of occasions. So would returning a different object. I expect a mutation operation on an explicitly mutable object to mutate the object, instead of creating something new. 
If it returns a new one, I can imagine doing something like: obj.inqueue=bytesbuffer(100) obj.inqueue.lower() # oh, wait, that didn't really do anything after all... if obj.inqueue[:4] == b"http": # works on my *regular* input... Maybe the answer is "don't do that", and to only do this sort of processing before it goes in the buffer or after it comes out, but ... it still looks like a major gotcha. -jJ From guido at python.org Mon Oct 15 16:49:31 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 15 Oct 2007 07:49:31 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: <52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com> <52dc1c820710121455x1d3a52d8gf30bfab102cc9df6@mail.gmail.com> Message-ID: On 10/15/07, Jim Jewett wrote: > On 10/12/07, Guido van Rossum wrote: > > On 10/12/07, Gregory P. Smith wrote: > > > > > - add missing methods to PyBytes (for list, see the PEP and compare to > > > > > what's already there) > > > > As I work on these.. Should the mutable PyBytes_ (buffer) objects implement > > > the following methods inplace and return an additional reference to self? > > > > .capitalize(), .center(), .expandtabs(), .rjust(), .swapcase(), .title(), > > > .upper(), .zfill() > > > No... That would be a huge trap to fall in at all sorts of occasions. > > So would returning a different object. I expect a mutation operation > on an explicitly mutable object to mutate the object, instead of > creating something new. > > If it returns a new one, I can imagine doing something like: > > obj.inqueue=bytesbuffer(100) > obj.inqueue.lower() # oh, wait, that didn't really do anything > after all... > if obj.inqueue[:4] == b"http": # works on my *regular* input... > > Maybe the answer is "don't do that", and to only do this sort of > processing before it goes in the buffer or after it comes out, but ... > it still looks like a major gotcha. 
Since these methods with these very names already exist for strings and return new values there, I don't see the gotcha unless you never use strings. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From jjb5 at cornell.edu Mon Oct 15 18:20:24 2007 From: jjb5 at cornell.edu (Joel Bender) Date: Mon, 15 Oct 2007 12:20:24 -0400 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: <52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com> <52dc1c820710121455x1d3a52d8gf30bfab102cc9df6@mail.gmail.com> Message-ID: <47139348.5090006@cornell.edu> Speaking from the protocol encoding/decoding view, and one where a buffer is very similar to a list of small integers... >> Also what about .replace() and .translate()? > >> If they are not done in place should they return a new buffer (PyBytes_) >> object or a bytes (PyString_) object? [i'd say a buffer (PyBytes_)] > > They should return the same type as 'self'. My preference would be to do the work in place and return None, just like sorting a list, reversing a list, appending to a list, etc. >> Alos if not, should we add additional .ireplace() .ilower() etc.. methods to >> the mutable buffer (PyBytes_)? There are speed advantages to doing many of >> those in place rather than a data copy. > > I'm not sure I see the use case where this matters all that much > though. Let's say not, if only because it's not in the PEP. ;-) I would appreciate it if these functions were list-like and not tuple-like. In extending buffers to support more structure encoding and decoding functions, it would be nice to carry the expectation that these extensions mutate the buffer and I can leverage the built-in functionality to do that. I am but a small voice in the chorus. 
Joel From guido at python.org Mon Oct 15 18:28:12 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 15 Oct 2007 09:28:12 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: <47139348.5090006@cornell.edu> References: <52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com> <52dc1c820710121455x1d3a52d8gf30bfab102cc9df6@mail.gmail.com> <47139348.5090006@cornell.edu> Message-ID: On 10/15/07, Joel Bender wrote: > Speaking from the protocol encoding/decoding view, and one where a > buffer is very similar to a list of small integers... > > >> Also what about .replace() and .translate()? > > > >> If they are not done in place should they return a new buffer (PyBytes_) > >> object or a bytes (PyString_) object? [i'd say a buffer (PyBytes_)] > > > > They should return the same type as 'self'. > > My preference would be to do the work in place and return None, just > like sorting a list, reversing a list, appending to a list, etc. Then propose new APIs that don't have the same names as the existing ones, which are amongst the most well-known APIs in all of Python. > >> Alos if not, should we add additional .ireplace() .ilower() etc.. methods to > >> the mutable buffer (PyBytes_)? There are speed advantages to doing many of > >> those in place rather than a data copy. > > > > I'm not sure I see the use case where this matters all that much > > though. Let's say not, if only because it's not in the PEP. ;-) > > I would appreciate it if these functions were list-like and not > tuple-like. In extending buffers to support more structure encoding and > decoding functions, it would be nice to carry the expectation that these > extensions mutate the buffer and I can leverage the built-in > functionality to do that. The existing mutable PyBytes type (which will be known as 'buffer' in 3.0a2 and beyond) *does* have a number of list-like methods: .append(), .insert(), .extend(). Also += will work in place. And of course slice assignment works. 
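The list-like behaviour described above can be sketched with `bytearray`, the name under which the mutable `buffer` type was eventually released. This is a minimal sketch assuming a current Python 3 interpreter, not the 3.0a2 tree under discussion:

```python
buf = bytearray(b"http")
alias = buf                  # second reference to the same object

buf.append(0x3A)             # list-like: one integer (":") at the end
buf.insert(0, ord(" "))      # list-like: one integer at an index
buf.extend(b"//")            # any bytes-like object
buf += b"x"                  # in-place concatenation
buf[0:1] = b""               # slice assignment can resize the object

# All of the above mutated the one object that both names refer to.
same_object = alias is buf

# The str-style methods do not mutate; they return a new object
# of the same type as self.
upper = buf.upper()
```

The last two lines show the split decided in the thread: the list-like operations work in place, while the methods that share names with str methods return a new object.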
For structure encoding/decoding, please have a look at the existing APIs in the struct module and let us know what's missing. > I am but a small voice in the chorus. There is no rule that PEPs need to be written by senior developers! All you need to be able to do in order to *write* a good PEP is to *listen* well. -- --Guido van Rossum (home page: http://www.python.org/~guido/)

From lists at cheimes.de Mon Oct 15 18:33:48 2007 From: lists at cheimes.de (Christian Heimes) Date: Mon, 15 Oct 2007 18:33:48 +0200 Subject: [Python-3000] Should PyString (new bytes type) accept strings with encoding? Message-ID: <4713966C.8080904@cheimes.de>

I'm working on the renaming of str8 -> bytes and bytes -> buffer. PyBytes (old bytes, new buffer) can take a string together with an encoding and an optional error argument:

>>> bytes(source="abc", encoding="ascii", errors="replace")
b'abc'
>>> str(b"abc", encoding="ascii")
'abc'

IMO this should work

>>> str8("abc", encoding="ascii")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'encoding' is an invalid keyword argument for this function

And this should break with a type error

>>> str8("abc")
b'abc'

PyString's constructor doesn't take strings (PyUnicode). I'd like to add support for strings to it. It makes the API of str, bytes and buffer consistent and fixes a *lot* of broken code and tests.

Are you confused by the name changes? I'm sometimes confused so I made a table:

c name    | old   | new    | repr
-------------------------------------------
PyUnicode | str   | -      | ''
PyString  | str8  | bytes  | b''
PyBytes   | bytes | buffer | buffer(b'')

Christian

From guido at python.org Mon Oct 15 18:49:17 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 15 Oct 2007 09:49:17 -0700 Subject: [Python-3000] Should PyString (new bytes type) accept strings with encoding?
In-Reply-To: <4713966C.8080904@cheimes.de> References: <4713966C.8080904@cheimes.de> Message-ID:

On 10/15/07, Christian Heimes wrote:
> I'm working on the renaming of str8 -> bytes and bytes -> buffer.
> PyBytes (old bytes, new buffer) can take a string together with an
> encoding and an optional error argument:
>
> >>> bytes(source="abc", encoding="ascii", errors="replace")
> b'abc'
> >>> str(b"abc", encoding="ascii")
> 'abc'

Correct.

> IMO this should work
> >>> str8("abc", encoding="ascii")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: 'encoding' is an invalid keyword argument for this function

Yes, this should work. (I thought it already did but was wrong. ;-)

> And this should break with a type error
> >>> str8("abc")
> b'abc'

Correct.

> PyString's constructor doesn't take strings (PyUnicode). I'd like to add
> support for strings to it. It makes the API of str, bytes and buffer
> consistent and fixes a *lot* of broken code and tests.

Right.

> Are you confused by the name changes? I'm sometimes confused so I made a
> table:
>
> c name    | old   | new    | repr
> -------------------------------------------
> PyUnicode | str   | -      | ''
> PyString  | str8  | bytes  | b''
> PyBytes   | bytes | buffer | buffer(b'')

I'd rewrite this as follows:

C name    | 2.x          | 3.0a1      | 3.0a2               |
----------+--------------+------------+---------------------+
PyUnicode | unicode u""  | str ""     | str ""              |
PyString  | str ""       | str8 s""   | bytes ""            |
PyBytes   | N/A          | bytes b""  | buffer buffer(b"")  |
----------+--------------+------------+---------------------+

Seems worth adding to the PEP. I'll do that.

-- --Guido van Rossum (home page: http://www.python.org/~guido/)

From steven.bethard at gmail.com Mon Oct 15 18:54:43 2007 From: steven.bethard at gmail.com (Steven Bethard) Date: Mon, 15 Oct 2007 10:54:43 -0600 Subject: [Python-3000] Should PyString (new bytes type) accept strings with encoding? In-Reply-To: References: <4713966C.8080904@cheimes.de> Message-ID:

On 10/15/07, Guido van Rossum wrote:
> C name    | 2.x          | 3.0a1      | 3.0a2               |
> ----------+--------------+------------+---------------------+
> PyUnicode | unicode u""  | str ""     | str ""              |
> PyString  | str ""       | str8 s""   | bytes ""            |
> PyBytes   | N/A          | bytes b""  | buffer buffer(b"")  |
> ----------+--------------+------------+---------------------+

That "" beside bytes in the 3.0a2 column should be b"" (that is, with a "b" prefix), right?

STeVe -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy

From guido at python.org Mon Oct 15 18:59:30 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 15 Oct 2007 09:59:30 -0700 Subject: [Python-3000] Should PyString (new bytes type) accept strings with encoding? In-Reply-To: References: <4713966C.8080904@cheimes.de> Message-ID:

Correct. Sorry.
Here's an improved table that I'm also adding to the PEP:

C name       | 2.x repr    | 3.0a1 repr | 3.0a2 repr
-------------+-------------+------------+-------------------
PyUnicode    | unicode u"" | str ""     | str ""
PyString     | str ""      | str8 s""   | bytes b""
PyBytes      | N/A         | bytes b""  | buffer buffer(b"")
PyBuffer     | buffer N/A  | buffer N/A | N/A
PyMemoryView | N/A         | N/A        | memoryview N/A
-------------+-------------+------------+-------------------

--Guido

On 10/15/07, Steven Bethard wrote:
> On 10/15/07, Guido van Rossum wrote:
> > C name    | 2.x          | 3.0a1      | 3.0a2               |
> > ----------+--------------+------------+---------------------+
> > PyUnicode | unicode u""  | str ""     | str ""              |
> > PyString  | str ""       | str8 s""   | bytes ""            |
> > PyBytes   | N/A          | bytes b""  | buffer buffer(b"")  |
> > ----------+--------------+------------+---------------------+
>
> That "" beside bytes in the 3.0a2 column should be b"" (that is, with
> a "b" prefix), right?
>
> STeVe
> --
> I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
> tiny blip on the distant coast of sanity.
> --- Bucky Katt, Get Fuzzy

-- --Guido van Rossum (home page: http://www.python.org/~guido/)

From christian at cheimes.de Mon Oct 15 18:39:23 2007 From: christian at cheimes.de (Christian Heimes) Date: Mon, 15 Oct 2007 18:39:23 +0200 Subject: [Python-3000] Should PyString (new bytes type) accept strings with encoding? In-Reply-To: <4713966C.8080904@cheimes.de> References: <4713966C.8080904@cheimes.de> Message-ID: <471397BB.5070806@cheimes.de>

Doh, the answer is in the PEP.
Please ignore the other mail :) http://www.python.org/dev/peps/pep-3137/#constructors Christian From tjreedy at udel.edu Mon Oct 15 18:41:26 2007 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 15 Oct 2007 12:41:26 -0400 Subject: [Python-3000] PEP 3137 plan of attack References: <52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com> <52dc1c820710121455x1d3a52d8gf30bfab102cc9df6@mail.gmail.com> Message-ID: "Guido van Rossum" wrote in message news:ca471dc20710150749y70ba12cfmadf1c59974c61926 at mail.gmail.com... | > > > As I work on these.. Should the mutable PyBytes_ (buffer) objects implement | > > > the following methods inplace and return an additional reference to self? | > | > > > .capitalize(), .center(), .expandtabs(), .rjust(), .swapcase(), .title(), | > > > .upper(), .zfill() | > | > > No... That would be a huge trap to fall in at all sorts of occasions. At this point, I thought your objection was to returning the buffer instead of None, as with list mutations, and for the same reason. But admittedly, some people do not like this feature of lists. | > So would returning a different object. I expect a mutation operation | > on an explicitly mutable object to mutate the object, instead of | > creating something new. So was I. | Since these methods with these very names already exist for strings | and return new values there, I don't see the gotcha unless you never | use strings. The real question is what is more useful? I would think that being able to edit in place would be a reason to use a buffer rather than (immutable) bytes. tjr From greg at krypto.org Mon Oct 15 19:55:43 2007 From: greg at krypto.org (Gregory P.
Smith) Date: Mon, 15 Oct 2007 10:55:43 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: <52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com> <52dc1c820710121455x1d3a52d8gf30bfab102cc9df6@mail.gmail.com> <47139348.5090006@cornell.edu> Message-ID: <52dc1c820710151055q6a462b87m4948e36aecb1f26e@mail.gmail.com> > > >> Also what about .replace() and .translate()? > > > > > >> If they are not done in place should they return a new buffer > (PyBytes_) > > >> object or a bytes (PyString_) object? [i'd say a buffer (PyBytes_)] > > > > > > They should return the same type as 'self'. > > > > My preference would be to do the work in place and return None, just > > like sorting a list, reversing a list, appending to a list, etc. > > Then propose new APIs that don't have the same names as the existing > ones, which are amongst the most well-known APIs in all of Python. Agreed, that's why I suggest new method names with an 'i' in front for inplace. Anyways I'll be done with my patch to add the copying versions of the methods later today. Stay tuned. From greg at krypto.org Mon Oct 15 19:58:15 2007 From: greg at krypto.org (Gregory P. Smith) Date: Mon, 15 Oct 2007 10:58:15 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: <52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com> <52dc1c820710121455x1d3a52d8gf30bfab102cc9df6@mail.gmail.com> Message-ID: <52dc1c820710151058y7b605579y82c3082146b3b220@mail.gmail.com> On 10/15/07, Terry Reedy wrote: > > > "Guido van Rossum" wrote in message > news:ca471dc20710150749y70ba12cfmadf1c59974c61926 at mail.gmail.com... > | > > > As I work on these.. Should the mutable PyBytes_ (buffer) objects > implement > | > > > the following methods inplace and return an additional reference > to > self?
> | > > | > > > .capitalize(), .center(), .expandtabs(), .rjust(), .swapcase(), > .title(), > | > > > .upper(), .zfill() > | > > | > > No... That would be a huge trap to fall in at all sorts of > occasions. > > At this point, I thought your objection was to returning the buffer instead > of None, as with list mutations, and for the same reason. But admittedly, > some people do not like this feature of lists. > > | > So would returning a different object. I expect a mutation operation > | > on an explicitly mutable object to mutate the object, instead of > | > creating something new. > > So was I. > > | Since these methods with these very names already exist for strings > | and return new values there, I don't see the gotcha unless you never > | use strings. > > The real question is what is more useful? I would think that being able > to > edit in place would be a reason to use a buffer rather than (immutable) > bytes. > > tjr I agree, that's a benefit of a mutable object. But I think the point about not reusing the names with a different behavior is valid so that some code can be written to operate on objects with duck type without having to know if it's mutable or not. -gps From jimjjewett at gmail.com Mon Oct 15 20:11:35 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Mon, 15 Oct 2007 14:11:35 -0400 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: <52dc1c820710151058y7b605579y82c3082146b3b220@mail.gmail.com> References: <52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com> <52dc1c820710121455x1d3a52d8gf30bfab102cc9df6@mail.gmail.com> <52dc1c820710151058y7b605579y82c3082146b3b220@mail.gmail.com> Message-ID: On 10/15/07, Gregory P.
Smith wrote: > On 10/15/07, Terry Reedy wrote: > > ...I would think that being able to edit in place would be a reason > > to use a buffer rather than (immutable) bytes. > I agree, that's a benefit of a mutable object. But I think the point about > not reusing the names with a different behavior is valid so that some > code can be written to operate on objects with duck type without > having to know if it's mutable or not. I thought that was the reason to return self instead of None. If returning the original (but mutated) buffer is a problem, then there is already a problem, because someone else could already mutate the original. (Also note that for duck-typing, it should be OK if the new result object is always immutable, since you have to handle that case anyhow.) -jJ From luke.stebbing at gmail.com Mon Oct 15 20:33:46 2007 From: luke.stebbing at gmail.com (Luke Stebbing) Date: Mon, 15 Oct 2007 11:33:46 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: <52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com> <52dc1c820710121455x1d3a52d8gf30bfab102cc9df6@mail.gmail.com> <52dc1c820710151058y7b605579y82c3082146b3b220@mail.gmail.com> Message-ID: On 10/15/07, Jim Jewett wrote: > If returning the original (but mutated) buffer is a problem, then > there is already a problem, because someone else could already mutate > the original. > > (Also note that for duck-typing, it should be OK if the new result > object is always immutable, since you have to handle that case > anyhow.) Changing the contract of a function can really mess with duck-typing. If you write a function that internally creates a lowered copy of a variable (for comparison, say), suddenly you're unintentionally lowering your argument in-place.
Even returning an immutable result object is a problem, because your contract changes from "I return a lowered, rjusted copy of my argument" to "I return a lowered rjusted copy of my argument that -- oops -- is immutable now if it wasn't before". Luke From rhamph at gmail.com Mon Oct 15 20:40:29 2007 From: rhamph at gmail.com (Adam Olsen) Date: Mon, 15 Oct 2007 12:40:29 -0600 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: <52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com> <52dc1c820710121455x1d3a52d8gf30bfab102cc9df6@mail.gmail.com> Message-ID: On 10/15/07, Guido van Rossum wrote: > On 10/15/07, Jim Jewett wrote: > > On 10/12/07, Guido van Rossum wrote: > > > On 10/12/07, Gregory P. Smith wrote: > > > > > > - add missing methods to PyBytes (for list, see the PEP and compare to > > > > > > what's already there) > > > > > > As I work on these.. Should the mutable PyBytes_ (buffer) objects implement > > > > the following methods inplace and return an additional reference to self? > > > > > > .capitalize(), .center(), .expandtabs(), .rjust(), .swapcase(), .title(), > > > > .upper(), .zfill() > > > > > No... That would be a huge trap to fall in at all sorts of occasions. > > > > So would returning a different object. I expect a mutation operation > > on an explicitly mutable object to mutate the object, instead of > > creating something new. > > > > If it returns a new one, I can imagine doing something like: > > > > obj.inqueue=bytesbuffer(100) > > obj.inqueue.lower() # oh, wait, that didn't really do anything > > after all... > > if obj.inqueue[:4] == b"http": # works on my *regular* input... > > > > Maybe the answer is "don't do that", and to only do this sort of > > processing before it goes in the buffer or after it comes out, but ... > > it still looks like a major gotcha. > > Since these methods with these very names already exist for strings > and return new values there, I don't see the gotcha unless you never > use strings. 
Maybe .lower() should return immutable bytes, rather than mutable buffer? For the use cases I can imagine this'd still work correctly, and fits better with why it makes a copy. buffer is all about operating in-place, so any copy immediately doesn't fit the buffer concept. obj.inqueue = bytesbuffer(100) # replaces existing buffer contents. Temp copy need not be mutable obj.inqueue[:] = obj.inqueue.lower() if obj.inqueue[:4] == b"http": obj.inqueue = bytesbuffer(100) if obj.inqueue[:4].lower() == b"http": # compares bytes with bytes -- Adam Olsen, aka Rhamphoryncus From luke.stebbing at gmail.com Mon Oct 15 20:42:48 2007 From: luke.stebbing at gmail.com (Luke Stebbing) Date: Mon, 15 Oct 2007 11:42:48 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: <52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com> <52dc1c820710121455x1d3a52d8gf30bfab102cc9df6@mail.gmail.com> Message-ID: On 10/15/07, Jim Jewett wrote: > So would returning a different object. I expect a mutation operation > on an explicitly mutable object to mutate the object, instead of > creating something new. > > If it returns a new one, I can imagine doing something like: > > obj.inqueue=bytesbuffer(100) > obj.inqueue.lower() # oh, wait, that didn't really do anything > after all... > if obj.inqueue[:4] == b"http": # works on my *regular* input... > > Maybe the answer is "don't do that", and to only do this sort of > processing before it goes in the buffer or after it comes out, but ... > it still looks like a major gotcha. I expect something spelled "lower" to try and transform an object in-place, period. Too bad changing it to "lowered" would be such a royal pain. 
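A minimal sketch of the two behaviours being contrasted in this thread, assuming a current Python 3 interpreter, where PEP 3137's mutable "buffer" type is spelled `bytearray`. The variable name `inqueue` is borrowed from the examples above; the sample data is made up for illustration.

```python
# Sample data is invented; bytearray stands in for the PEP's mutable "buffer".
inqueue = bytearray(b"HTTP/1.1 200 OK")

# The copying method leaves the original untouched and returns the same type.
lowered = inqueue.lower()
assert isinstance(lowered, bytearray)
assert inqueue == b"HTTP/1.1 200 OK"

# An in-place update has to be spelled explicitly, here with slice assignment.
inqueue[:] = inqueue.lower()
assert inqueue[:4] == b"http"
```

Under this arrangement code written against `str` or `bytes` keeps working on the mutable type, and mutation only happens where it is asked for.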
Luke From guido at python.org Mon Oct 15 21:32:16 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 15 Oct 2007 12:32:16 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: <52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com> <52dc1c820710121455x1d3a52d8gf30bfab102cc9df6@mail.gmail.com> <52dc1c820710151058y7b605579y82c3082146b3b220@mail.gmail.com> Message-ID: I am not going to explain this further if you still don't get it. These functions should not modify their argument, and should return a copy of the same type as the original. I'm fine with new APIs that perform similar things in-place. --Guido On 10/15/07, Jim Jewett wrote: > On 10/15/07, Gregory P. Smith wrote: > > On 10/15/07, Terry Reedy wrote: > > > > ...I would think that being able to edit in place would be a reason > > > to use a buffer rather than (immutable) bytes. > > > I agree, that's a benefit of a mutable object. But I think the point about > > not reusing the names with a different behavior is valid so that some > > code can be written to operate on objects with duck type without > > having to know if it's mutable or not. > > I thought that was the reason to return self instead of None. > > If returning the original (but mutated) buffer is a problem, then > there is already a problem, because someone else could already mutate > the original. > > (Also note that for duck-typing, it should be OK if the new result > object is always immutable, since you have to handle that case > anyhow.) > > -jJ > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From cjw at sympatico.ca Mon Oct 15 23:16:08 2007 From: cjw at sympatico.ca (Colin J.
Williams) Date: Mon, 15 Oct 2007 17:16:08 -0400 Subject: [Python-3000] bytes vs array.array vs numpy.array In-Reply-To: <8f01efd00710121237v576623adm9a4e36af37ffe6bc@mail.gmail.com> References: <8f01efd00710121237v576623adm9a4e36af37ffe6bc@mail.gmail.com> Message-ID: skip at pobox.com wrote: > Nick> I wouldn't mind seeing some iteration-in-C bit-bashing operations > Nick> in there eventually... > > Nick> data = bytes([x & 0x1F for x in orig_data]) > > This begins to make it look like what you want is array.array or numpy.array. > Python's arrays don't support bitwise operations either, but numpy's do. > How much overlap is there between the three types? Does it make sense to > consider that canonical underlying array type now (or in the near future, > sometime before the release of 3.0 final)? > > Skip I am a lurker here, rather than a contributor, but I hope that this idea will be explored further. A good canonical multi-dimensional array is needed. NumPy provides a class which, in addition to serving various numeric needs, also provides for a multi-dimensional array where the elements can be of some class or type. It would be good if array.Array could create a multidimensional array, where each element would be an instance of dtype, which could be any known type or class. The Array could have a signature something like: Array(shape, dtype, initializer) where: shape is a tuple, giving the dimensionality (or an integer for a single dimension) dtype is a Python type or class initializer is a Python expression which can be converted into an array of dtype, where dtype is any known type or class. Thus, Array(5, float, [0, 1, 2, 3, 4]) would have the same effect as the current array.array('f', [0., 1., 2., 3., 4.]) To allow for the full range of data types provided by array.array, it would be necessary to define a few additional Python data types. The aim here is to use meaningful mnemonics, rather than obscure letter codes. Colin W.
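A rough sketch of how the proposed call might be emulated for the one-dimensional case only. `make_array` and its type-to-typecode table are hypothetical names invented for illustration; only `array.array` is a real standard-library API. The sketch maps `float` to typecode `'d'` (C double), which matches the precision of Python's own float, rather than to the single-precision `'f'`.

```python
from array import array

# Hypothetical mapping from Python types to array typecodes (illustration only).
_TYPECODES = {int: "q", float: "d"}

def make_array(shape, dtype, initializer):
    """Build a 1-D array.array; multi-dimensional shapes are not sketched."""
    if isinstance(shape, tuple):
        if len(shape) != 1:
            raise NotImplementedError("only one dimension is sketched")
        (shape,) = shape
    # Convert every element to the requested type before storing it.
    values = [dtype(v) for v in initializer]
    if len(values) != shape:
        raise ValueError("initializer length does not match shape")
    return array(_TYPECODES[dtype], values)

a = make_array(5, float, [0, 1, 2, 3, 4])
```

A genuinely multi-dimensional container with arbitrary element classes would need its own storage and indexing scheme, which `array.array` does not provide.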
From guido at python.org Tue Oct 16 00:31:37 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 15 Oct 2007 15:31:37 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: Message-ID: There's one thing that I forgot to add to PEP 3137. It's the removal of the basestring type. I think this is a reasonable thing to do. Christian Heimes has a patch that does this cleanly. Anyone objecting, please speak up now! -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Oct 16 01:04:54 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 15 Oct 2007 16:04:54 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: Message-ID: On 10/15/07, Guido van Rossum wrote: > There's one thing that I forgot to add to PEP 3137. It's the removal > of the basestring type. I think this is a reasonable thing to do. > Christian Heimes has a patch that does this cleanly. Anyone objecting, > please speak up now! And, quite separately, we will need a common base type for bytes and buffer. I think that should be an ABC in the collections module though, which simply registers bytes and buffer. Any suggestions for a name? -- --Guido van Rossum (home page: http://www.python.org/~guido/) From greg at krypto.org Tue Oct 16 02:04:59 2007 From: greg at krypto.org (Gregory P. Smith) Date: Mon, 15 Oct 2007 17:04:59 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: Message-ID: <52dc1c820710151704ra5789b9q7fcb59760bdb7479@mail.gmail.com> while trying to figure out what to update the common method docstrings to say I've come up with terms such as 'byte string' or 'byte buffer' but none of those are extra appealing to me to turn into an ABC name. other thoughts? On 10/15/07, Guido van Rossum wrote: > > On 10/15/07, Guido van Rossum wrote: > > There's one thing that I forgot to add to PEP 3137. It's the removal > > of the basestring type. I think this is a reasonable thing to do. 
> > Christian Heimes has a patch that does this cleanly. Anyone objecting, > > please speak up now! > > And, quite separately, we will need a common base type for bytes and > buffer. I think that should be an ABC in the collections module > though, which simply registers bytes and buffer. Any suggestions for a > name? > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: > http://mail.python.org/mailman/options/python-3000/greg%40krypto.org > From greg at krypto.org Tue Oct 16 02:10:24 2007 From: greg at krypto.org (Gregory P. Smith) Date: Mon, 15 Oct 2007 17:10:24 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: <52dc1c820710151055q6a462b87m4948e36aecb1f26e@mail.gmail.com> References: <52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com> <52dc1c820710121455x1d3a52d8gf30bfab102cc9df6@mail.gmail.com> <47139348.5090006@cornell.edu> <52dc1c820710151055q6a462b87m4948e36aecb1f26e@mail.gmail.com> Message-ID: <52dc1c820710151710g3edf3b3cs61a010c85e3b35b1@mail.gmail.com> > Anyways I'll be done with my patch to add the copying versions of the > methods later today. Stay tuned. > The PyBytes methods from PEP3137 have been implemented. Review as desired. http://bugs.python.org/issue1261 If it's good as is, let me know and I can check that in if you don't want to yourself. I believe there are some more opportunities for code sharing between PyString and PyBytes both in methods already existing in stringobject and bytesobject and in some of the Objects/stringlib/transmogrify.h code that this patch adds.
I tried to share as much as possible to avoid both bloat and most importantly multiple copies of the same algorithms. That could be considered additional cleanup or optimization. -gps From greg.ewing at canterbury.ac.nz Tue Oct 16 02:26:27 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 16 Oct 2007 13:26:27 +1300 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: <52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com> <52dc1c820710121455x1d3a52d8gf30bfab102cc9df6@mail.gmail.com> Message-ID: <47140533.9040700@canterbury.ac.nz> Jim Jewett wrote: > On 10/12/07, Guido van Rossum wrote: > >>On 10/12/07, Gregory P. Smith wrote: >> >>>Should the mutable PyBytes_ (buffer) objects implement >>>the following methods inplace and return an additional reference to self? If they're to work in-place, they should return None. -- Greg From brett at python.org Tue Oct 16 03:45:30 2007 From: brett at python.org (Brett Cannon) Date: Mon, 15 Oct 2007 18:45:30 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: Message-ID: On 10/15/07, Guido van Rossum wrote: > On 10/15/07, Guido van Rossum wrote: > > There's one thing that I forgot to add to PEP 3137. It's the removal > > of the basestring type. I think this is a reasonable thing to do. > > Christian Heimes has a patch that does this cleanly. Anyone objecting, > > please speak up now! > > And, quite separately, we will need a common base type for bytes and > buffer. I think that should be an ABC in the collections module > though, which simply registers bytes and buffer. Any suggestions for a > name? BinaryData, RawData. I use both 'binary' and 'raw' in my variable names when I have used bytes so that's why those names pop into my head.
-Brett From guido at python.org Tue Oct 16 04:12:53 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 15 Oct 2007 19:12:53 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: Message-ID: ByteSequence. On 10/15/07, Brett Cannon wrote: > On 10/15/07, Guido van Rossum wrote: > > On 10/15/07, Guido van Rossum wrote: > > > There's one thing that I forgot to add to PEP 3137. It's the removal > > > of the basestring type. I think this is a reasonable thing to do. > > > Christian Heimes has a patch that does this cleanly. Anyone objecting, > > > please speak up now! > > > > And, quite separately, we will need a common base type for bytes and > > buffer. I think that should be an ABC in the collections module > > though, which simply registers bytes and buffer. Any suggestions for a > > name? > > BinaryData, RawData. I use both 'binary' and 'raw' in my variable > names when I have used bytes so that's why those names pop into my > head. > > -Brett > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From greg.ewing at canterbury.ac.nz Tue Oct 16 08:01:43 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 16 Oct 2007 19:01:43 +1300 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: <52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com> <52dc1c820710121455x1d3a52d8gf30bfab102cc9df6@mail.gmail.com> <52dc1c820710151058y7b605579y82c3082146b3b220@mail.gmail.com> Message-ID: <471453C7.1010509@canterbury.ac.nz> Jim Jewett wrote: > I thought that was the reason to return self instead of None. That would be even more misleading, because you would get no warning that you had called a mutating method when you thought you were calling a non-mutating one. This is the reason that all the existing mutating methods return None instead of self. It's safer that way. 
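The convention described here can be seen with the built-in list type. A short sketch, assuming any Python 3 interpreter; the sample values are arbitrary.

```python
data = [3, 1, 2]

# Non-mutating: a new list comes back and the original is unchanged.
copy = sorted(data)
assert copy == [1, 2, 3] and data == [3, 1, 2]

# Mutating: the method works in place and returns None, so mistakenly
# treating it as a copying call (data = data.sort()) shows up quickly.
result = data.sort()
assert result is None and data == [1, 2, 3]
```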
-- Greg From greg.ewing at canterbury.ac.nz Tue Oct 16 08:04:54 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 16 Oct 2007 19:04:54 +1300 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: <52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com> <52dc1c820710121455x1d3a52d8gf30bfab102cc9df6@mail.gmail.com> Message-ID: <47145486.4050505@canterbury.ac.nz> Luke Stebbing wrote: > I expect something spelled "lower" to try and transform an object > in-place, period. Too bad changing it to "lowered" would be such a > royal pain. Yes, those methods should probably have been called "lowered", "capitalized", etc. from the beginning, but the time machine would need an upgrade to make that big a history change. :-( -- Greg From greg at krypto.org Tue Oct 16 08:36:30 2007 From: greg at krypto.org (Gregory P. Smith) Date: Mon, 15 Oct 2007 23:36:30 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: <52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com> References: <52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com> Message-ID: <52dc1c820710152336p54d5375rf58c6b9863b1b16a@mail.gmail.com> On 10/8/07, Gregory P. Smith wrote: > > > - add missing methods to PyBytes (for list, see the PEP and compare to > > what's already there) > > Committed revision 58493. (closes issue1261). fwiw - On py3k head on the x86 ubuntu feisty box I used to do the commit the following tests on the py3k branch were failing both before and after this change. test_cProfile test_doctest test_email test_profile I didn't break them. :) -gps From brett at python.org Tue Oct 16 09:16:29 2007 From: brett at python.org (Brett Cannon) Date: Tue, 16 Oct 2007 00:16:29 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: <52dc1c820710152336p54d5375rf58c6b9863b1b16a@mail.gmail.com> References: <52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com> <52dc1c820710152336p54d5375rf58c6b9863b1b16a@mail.gmail.com> Message-ID: On 10/15/07, Gregory P. Smith wrote: > > On 10/8/07, Gregory P. Smith wrote: > > > > > > > > > - add missing methods to PyBytes (for list, see the PEP and compare to > > > what's already there) > > > > Committed revision 58493. (closes issue1261). > > fwiw - On py3k head on the x86 ubuntu feisty box I used to do the commit the > following tests on the py3k branch were failing both before and after this > change. > > test_cProfile test_doctest test_email test_profile > > I didn't break them. :) Running test_doctest really quickly (to make sure I didn't break it =) shows no breakage on a build from r58479. -Brett From goodger at python.org Tue Oct 16 15:30:50 2007 From: goodger at python.org (David Goodger) Date: Tue, 16 Oct 2007 09:30:50 -0400 Subject: [Python-3000] PyCon 2008: Call for Talk & Tutorial Proposals In-Reply-To: <47140763.30009@python.org> References: <47140763.30009@python.org> Message-ID: <4335d2c40710160630p1f94e67am11c504f15ddeff42@mail.gmail.com> Proposals for PyCon 2008 talks & tutorials are now being accepted. The deadline for proposals is November 16. PyCon 2008 will be held in Chicago, Illinois, USA, from March 13-20.
Please see the full announcement here: http://pycon.blogspot.com/2007/10/call-for-talk-tutorial-proposals.html -- David Goodger From lists at cheimes.de Tue Oct 16 15:45:56 2007 From: lists at cheimes.de (Christian Heimes) Date: Tue, 16 Oct 2007 15:45:56 +0200 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: <52dc1c820710152336p54d5375rf58c6b9863b1b16a@mail.gmail.com> References: <52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com> <52dc1c820710152336p54d5375rf58c6b9863b1b16a@mail.gmail.com> Message-ID: Gregory P. Smith wrote: > fwiw - On py3k head on the x86 ubuntu feisty box i used to do the commit the > following tests on the py3k branch were failing both before and after this > change. > > test_cProfile test_doctest test_email test_profile > > I didn't break them. :) They are broken on Ubuntu Linux, i386 and UCS-4 build for me, too. The failures in doctest, profile and cProfile are caused by additional calls to utf_8_decode. They were introduced by the patch from me and Alexandre but we don't know how to fix them. I've a fix for one of the two failures in test_email in one of my pending patches. Christian From guido at python.org Tue Oct 16 18:55:51 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 16 Oct 2007 09:55:51 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: Message-ID: On 10/15/07, Guido van Rossum wrote: > There's one thing that I forgot to add to PEP 3137. It's the removal > of the basestring type. I think this is a reasonable thing to do. > Christian Heimes has a patch that does this cleanly. Anyone objecting, > please speak up now! No-one spoke up. I'll check in Christian's patch now, and add this to the PEP. 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From lists at cheimes.de Tue Oct 16 19:06:59 2007 From: lists at cheimes.de (Christian Heimes) Date: Tue, 16 Oct 2007 19:06:59 +0200 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: Message-ID: <4714EFB3.303@cheimes.de> Guido van Rossum wrote: > No-one spoke up. I'll check in Christian's patch now, and add this to the PEP. Thanks! The fixer for basestr -> str is available at http://bugs.python.org/file8548 Christian From lists at cheimes.de Tue Oct 16 19:06:20 2007 From: lists at cheimes.de (Christian Heimes) Date: Tue, 16 Oct 2007 19:06:20 +0200 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: Message-ID: Guido van Rossum wrote: > No-one spoke up. I'll check in Christian's patch now, and add this to the PEP. Thanks! The fixer for basestr -> str is available at http://bugs.python.org/file8548 Christian From dwheeler at dwheeler.com Tue Oct 16 20:41:14 2007 From: dwheeler at dwheeler.com (David A. Wheeler) Date: Tue, 16 Oct 2007 14:41:14 -0400 (EDT) Subject: [Python-3000] Add "generalized boolean" as ABC to PEP 3119 Message-ID: Hi, I'm a Python user who likes much in the upcoming Python 3000. I wish you well! I have a few comments, though, that I hope are constructive. Guido asked me to repost them to this mailing list for discussion. I'll send my different comments as separate messages, so that they can be easily discussed separately. So... In PEP 3119 (Abstract Base Classes): I suggest adding an ABC for a "generalized bool" (perhaps name it Gbool?). Any class defining __bool__ (formerly __nonzero__), or one implementing Sized (which implement __len__), would be a generalized boolean. (Well, unless __len__ is no longer auto-called if there's no __bool__; if there's no auto-call, then I think just __bool__ would be checked, similar to how Sized works). 
All numbers and collections are generalized bools, obviously; many user-created classes will NOT be generalized bools. Many functions accept generalized bools, not strictly bools, and it'd be very nice to be able to explicitly _denote_ that in a standard way. --- David A. Wheeler From dwheeler at dwheeler.com Tue Oct 16 20:43:05 2007 From: dwheeler at dwheeler.com (David A. Wheeler) Date: Tue, 16 Oct 2007 14:43:05 -0400 (EDT) Subject: [Python-3000] Add python-3000-like print function to python 2.6 Message-ID: In Python 2.6, could some print FUNCTION be added to the builtins, using a different name than "print" but with the Python 3000 semantics? Call it printfunc or whatever. Python 3000 is undergoing much pain so that print can become a function. How about making those benefits available sooner than 3.0, so that we can use them earlier? Obviously people can create their own such function, but having a STANDARD name for it would mean that 2to3 could easily automate that translation. Plus, it'd help people get used to the idea of a printing _function_. --- David A. Wheeler From dwheeler at dwheeler.com Tue Oct 16 20:45:09 2007 From: dwheeler at dwheeler.com (David A. Wheeler) Date: Tue, 16 Oct 2007 14:45:09 -0400 (EDT) Subject: [Python-3000] Please re-add __cmp__ to python 3000 Message-ID: I agree with Collin Winter: losing __cmp__ is a loss (see http://oakwinter.com/code/). Yes, it's possible to write all the comparison operations, but I think it's _clearer_ to create a single low-level operator that handles ALL the comparison operators. It also avoids many mistakes; once you get that ONE operator right, ALL comparisons are right. I think the python 2 way is better: individual operations for the cases where you want to handle each case specially, and a single __cmp__ function that is a simple way to handle comparisons all at once. --- David A. 
Wheeler From lists at cheimes.de Tue Oct 16 21:27:21 2007 From: lists at cheimes.de (Christian Heimes) Date: Tue, 16 Oct 2007 21:27:21 +0200 Subject: [Python-3000] Add python-3000-like print function to python 2.6 In-Reply-To: References: Message-ID: David A. Wheeler wrote: > In Python 2.6, could some print FUNCTION be added to the builtins, using a different name than "print" but with the Python 3000 semantics? Call it printfunc or whatever. I like xprint(). It follows the example of range/xrange, it's short, fast to type and easy to remember. Neither google nor find -name \*.py | xargs grep xprint revealed a method xprint. Christian From steven.bethard at gmail.com Tue Oct 16 21:34:09 2007 From: steven.bethard at gmail.com (Steven Bethard) Date: Tue, 16 Oct 2007 13:34:09 -0600 Subject: [Python-3000] Please re-add __cmp__ to python 3000 In-Reply-To: References: Message-ID: On 10/16/07, David A. Wheeler wrote: > I agree with Collin Winter: losing __cmp__ is a loss (see http://oakwinter.com/code/). > > Yes, it's possible to write all the comparison operations, but I think > it's _clearer_ to create a single low-level operator that handles ALL > the comparison operators. It also avoids many mistakes; once you > get that ONE operator right, ALL comparisons are right. I think the > python 2 way is better: individual operations for the cases where you > want to handle each case specially, and a single __cmp__ function > that is a simple way to handle comparisons all at once. Why can't this just be supplied with a mixin? Here's a recipe providing the appropriate mixins if you want to define a __key__ function: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/510403 Presumably, you could do a very similar thing for __cmp__ if you wanted to use it. STeVe -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. 
--- Bucky Katt, Get Fuzzy From guido at python.org Tue Oct 16 22:27:22 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 16 Oct 2007 13:27:22 -0700 Subject: [Python-3000] Please re-add __cmp__ to python 3000 In-Reply-To: References: Message-ID: On 10/16/07, David A. Wheeler wrote: > I agree with Collin Winter: losing __cmp__ is a loss (see http://oakwinter.com/code/). > > Yes, it's possible to write all the comparison operations, but I think it's _clearer_ to create a single low-level operator that handles ALL the comparison operators. It also avoids many mistakes; once you get that ONE operator right, ALL comparisons are right. I think the python 2 way is better: individual operations for the cases where you want to handle each case specially, and a single __cmp__ function that is a simple way to handle comparisons all at once. Perhaps, but do note that __cmp__ is *higher* level than __eq__ etc., not lower level. I'd be okay with code that detects the presence of __cmp__ and then automatically defines __eq__ etc. accordingly. Whether this should be default behavior or a mixin that you explicitly have to request I'm not sure. I'd be willing to entertain a PEP that clearly explains the motivation and puts forward a specific solution. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Oct 16 22:29:07 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 16 Oct 2007 13:29:07 -0700 Subject: [Python-3000] Add python-3000-like print function to python 2.6 In-Reply-To: References: Message-ID: On 10/16/07, David A. Wheeler wrote: > In Python 2.6, could some print FUNCTION be added to the builtins, using a different name than "print" but with the Python 3000 semantics? Call it printfunc or whatever. > > Python 3000 is undergoing much pain so that print can become a function. How about making those benefits available sooner than 3.0, so that we can use them earlier?
Obviously people can create their own such function, but having a STANDARD name for it would mean that 2to3 could easily automate that translation. Plus, it'd help people get used to the idea of a printing _function_. I expect this will happen. At the very least, you'll be able to just use 'print' for that function's name if you include from __future__ import print_function at the top of your module. Whether it's worth it to make the same function available under a different name that doesn't require such an import I'm not sure. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake at acm.org Tue Oct 16 22:53:08 2007 From: fdrake at acm.org (Fred Drake) Date: Tue, 16 Oct 2007 16:53:08 -0400 Subject: [Python-3000] Add python-3000-like print function to python 2.6 In-Reply-To: References: Message-ID: On Oct 16, 2007, at 4:29 PM, Guido van Rossum wrote: > I expect this will happen. At the very least, you'll be able to just > use 'print' for that function's name if you include > > from __future__ import print_function > > at the top of your module. Whether it's worth it to make the same > function available under a different name that doesn't require such an > import I'm not sure. This makes sense to me. Creating a new name for the function doesn't add anything, IMO: to use it I need to "dirty" my code wherever I print, using the __future__ import only dirties an isolated spot in a module that prints. Much better, and probably useful during the transitional period. -Fred -- Fred Drake From nnorwitz at gmail.com Tue Oct 16 22:59:15 2007 From: nnorwitz at gmail.com (Neal Norwitz) Date: Tue, 16 Oct 2007 13:59:15 -0700 Subject: [Python-3000] Add python-3000-like print function to python 2.6 In-Reply-To: References: Message-ID: On 10/16/07, Fred Drake wrote: > On Oct 16, 2007, at 4:29 PM, Guido van Rossum wrote: > > I expect this will happen. 
At the very least, you'll be able to just > > use 'print' for that function's name if you include > > > > from __future__ import print_function > > > > at the top of your module. Whether it's worth it to make the same > > function available under a different name that doesn't require such an > > import I'm not sure. > > This makes sense to me. Creating a new name for the function doesn't > add anything, IMO: to use it I need to "dirty" my code wherever I > print, using the __future__ import only dirties an isolated spot in a > module that prints. Much better, and probably useful during the > transitional period. There's a patch for this too. http://bugs.python.org/issue1633807 n From brett at python.org Tue Oct 16 23:03:51 2007 From: brett at python.org (Brett Cannon) Date: Tue, 16 Oct 2007 14:03:51 -0700 Subject: [Python-3000] Add "generalized boolean" as ABC to PEP 3119 In-Reply-To: References: Message-ID: On 10/16/07, David A. Wheeler wrote: > Hi, I'm a Python user who likes much in the upcoming Python 3000. I wish you well! I have a few comments, though, that I hope are constructive. Guido asked me to repost them to this mailing list for discussion. I'll send my different comments as separate messages, so that they can be easily discussed separately. So... > > In PEP 3119 (Abstract Base Classes): I suggest adding an ABC for a "generalized bool" (perhaps name it Gbool?). That just makes me think it is a Google product. I would say Boolean is a fine name since the type is named bool, but that might be too close of a name. > > Any class defining __bool__ (formerly __nonzero__), or one implementing Sized (which implement __len__), would be a generalized boolean. (Well, unless __len__ is no longer auto-called if there's no __bool__; if there's no auto-call, then I think just __bool__ would be checked, similar to how Sized works). All numbers and collections are generalized bools, obviously; many user-created classes will NOT be generalized bools. 
Many functions accept generalized bools, not strictly bools, and it'd be very nice to be able to explicitly _denote_ that in a standard way. Seems fine by me. -Brett From guido at python.org Tue Oct 16 23:42:12 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 16 Oct 2007 14:42:12 -0700 Subject: [Python-3000] Add "generalized boolean" as ABC to PEP 3119 In-Reply-To: References: Message-ID: On 10/16/07, David A. Wheeler wrote: > Hi, I'm a Python user who likes much in the upcoming Python 3000. I wish you well! I have a few comments, though, that I hope are constructive. Guido asked me to repost them to this mailing list for discussion. I'll send my different comments as separate messages, so that they can be easily discussed separately. So... > > In PEP 3119 (Abstract Base Classes): I suggest adding an ABC for a "generalized bool" (perhaps name it Gbool?). > > Any class defining __bool__ (formerly __nonzero__), or one implementing Sized (which implement __len__), would be a generalized boolean. (Well, unless __len__ is no longer auto-called if there's no __bool__; if there's no auto-call, then I think just __bool__ would be checked, similar to how Sized works). All numbers and collections are generalized bools, obviously; many user-created classes will NOT be generalized bools. Many functions accept generalized bools, not strictly bools, and it'd be very nice to be able to explicitly _denote_ that in a standard way. This sounds misguided to me. While it is true that some types can never be false, they can still be useful as a truth value: e.g. a parameter could be either a Widget object (assuming Widgets are never false) or None. This is used pretty commonly. So there is absolutely nothing to test for in the type of an object -- *every* object is usable as a "generalized boolean". It therefore becomes purely a matter of argument annotation, an area which is explicitly left open for experimentation by PEP 3107. 
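A minimal sketch of the "Widget or None" idiom described above, written in Python 3 syntax. The names Widget and describe are hypothetical, invented only for illustration; the point is that a plain truth test works on any object, so there is no type-level property to check for.

```python
class Widget:
    """Hypothetical class: defines neither __bool__ nor __len__,
    so every instance is considered true."""
    def __init__(self, name):
        self.name = name


def describe(widget=None):
    # The parameter is used as a truth value: either a Widget or None.
    if widget:
        return "widget " + widget.name
    return "no widget"
```

Calling describe() with no argument takes the None branch, while any Widget instance takes the first branch, even though Widget never declares itself "boolean-like".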
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From greg.ewing at canterbury.ac.nz Wed Oct 17 00:06:01 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 17 Oct 2007 11:06:01 +1300 Subject: [Python-3000] Add "generalized boolean" as ABC to PEP 3119 In-Reply-To: References: Message-ID: <471535C9.7000500@canterbury.ac.nz> David A. Wheeler wrote: > Any class defining __bool__ (formerly __nonzero__), or one implementing > Sized (which implement __len__), would be a generalized boolean. Considering that *all* objects have at least an implicit implementation of __bool__ (that tests against None) I'm not sure that this would be a meaningful or useful concept. What use cases do you have in mind for this? -- Greg From jimjjewett at gmail.com Wed Oct 17 02:40:01 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Tue, 16 Oct 2007 20:40:01 -0400 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: Message-ID: On 10/15/07, Guido van Rossum wrote: > There's one thing that I forgot to add to PEP 3137. It's the removal > of the basestring type. I think this is a reasonable thing to do. > Christian Heimes has a patch that does this cleanly. Anyone objecting, > please speak up now! I don't like replacing the abstract basestring with a concrete type in isinstance checks. I agree that the right answer is something in ABC, which may not need to be a builtin. Does tearing out basestring before adding that "something" (String?) cause any problems? -jJ From dwheeler at dwheeler.com Wed Oct 17 05:47:41 2007 From: dwheeler at dwheeler.com (David A. Wheeler) Date: Tue, 16 Oct 2007 23:47:41 -0400 (EDT) Subject: [Python-3000] Add python-3000-like print function to python 2.6 Message-ID: Guido van Rossum wrote: > > I expect this will happen. At the very least, you'll be able to just > > use 'print' for that function's name if you include > > from __future__ import print_function Neal Norwitz wrote: > There's a patch for this too. 
http://bugs.python.org/issue1633807 Excellent! I like the "from __future__..." approach better than what I'd originally proposed. If that is the plan for Python 2.6 (and I hope it is), can I appeal to someone to modify PEP 3105 to specifically _note_ that this is a planned addition for 2.6? Just a sentence or two would do it, e.g.: "Python 2.6 will include a 'from __future__ import print_function', which enables use of print as a function with these semantics instead of the traditional Python 2 print statement.". A note in some other materials about Python 2->3 transition would be nice too. Also... will the 2to3 tool support this? What I mean is, if 2to3 sees "from __future__ import print_function", will it leave print function calls alone? If not, could that be changed? Thanks. --- David A. Wheeler From dwheeler at dwheeler.com Wed Oct 17 18:40:08 2007 From: dwheeler at dwheeler.com (David A. Wheeler) Date: Wed, 17 Oct 2007 12:40:08 -0400 (EDT) Subject: [Python-3000] Please re-add __cmp__ to python 3000 Message-ID: I said: > I agree with Collin Winter: losing __cmp__ is a loss (see http://oakwinter.com/code/). Steven Bethard said: >Why can't this just be supplied with a mixin? Here's a recipe >providing the appropriate mixins if you want to define a __key__ >function: > http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/510403 That _works_ from a functional perspective, and if Python3 fails to include direct support for __cmp__, then I think providing a built-in mixin is necessary. But mixins for comparison are a BIG LOSER for sort performance if your fundamental operator is a cmp-like function. Sorting is completely dominated by comparison time, and the mixin is a big lose for performance. Basically, sorts always involve an inner loop that does comparisons, so any time comparison is slow, you're potentially dooming the whole program to a slow inner loop. 
A mixin-calling-cmp model doubles the function call work; it has to find the mixin, call it, which eventually has to find and call the final cmp operator. I did a test (see below), and the mixin using a simulated cmp took 50% MORE time to sort a list using Python 2.5 (see code below) than when __cmp__ is used directly (as you CAN do in Python 2.5). A few tests with varying list size lead me to believe that this isn't linear - as the lists get longer the % performance hit increases. In other words, it's a LOT slower, and as the size increases it gets far worse. That kind of nasty performance hit will probably lead people to write lots of code that duplicates comparison functionality in each __lt__, __gt__, etc. When the comparisons THEMSELVES are nontrivial, that will result in lots of duplicate, potentially-buggy code. All of which is avoidable if __cmp__ is directly supported, as it ALREADY is in Python 1 and 2. In addition, even IF the performance wasn't a big deal (and I think it is), I believe __cmp__ is the better basic operator in most cases. As a style issue, I strongly prefer __cmp__ unless I have a specific need for comparisons which are atypical, e.g., where sometimes both __lt__ and __ge__ will return false given the same data (IEEE floats do this if you need exactly-IEEE-specified behavior of NaNs, etc.). By preferring __cmp__ I eliminate lots of duplicate code, and once it's right, it's always right for ALL comparisons. Sometimes __lt__ and friends are absolutely needed, e.g., when __lt__(x,y)==__gt__(x,y) for some values of x,y, but most of the time I find that they're an indicator of bad code and that __cmp__ should have been used instead. Direct support of __cmp__ is a GOOD thing, not a wart or obsolete feature. Adding a standard comparison mixin in a library is probably a good idea as well, but restoring __cmp__ is in my mind more important. I can write my own mixin, but working around a failure to call __cmp__ gives a big performance hit. 
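The "doubles the function call work" claim can be made visible by counting calls. This is only a sketch in Python 3 syntax, not the benchmark from this message: CountingCmpMixin, Number and the two counters are hypothetical names, and the sketch measures call counts rather than time.

```python
class CountingCmpMixin:
    """Hypothetical mixin: implements < by delegating to a cmp() method."""
    lt_calls = 0

    def __lt__(self, other):
        CountingCmpMixin.lt_calls += 1
        return self.cmp(other) < 0


class Number(CountingCmpMixin):
    cmp_calls = 0

    def __init__(self, x):
        self.x = x

    def cmp(self, other):
        Number.cmp_calls += 1
        # Three-way result: -1, 0 or 1.
        return (self.x > other.x) - (self.x < other.x)


items = [Number(v) for v in (3, 1, 2, 5, 4)]
ordered = sorted(items)  # sorting only asks for "<"
```

After the sort, lt_calls and cmp_calls are equal: every comparison the sort performs costs one call into the mixin plus one call into cmp(), which is the extra layer being objected to.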
--- David A. Wheeler
========================================
Here's my quick test code, in two files cmptest and cmptest2. The whitespace may be munged by my mailer or the list, sorry if it is.

==== cmptest2 ====
#!/usr/bin/env python2
# cmp-test2.py

import timeit

time1 = timeit.Timer("x = sorted(list)", """
import cmptest
import random
randomlist = [random.randrange(1,100000) for x in range(100000)]
list = [cmptest.NumberWithCmp(x) for x in randomlist]
""")

time2 = timeit.Timer("x = sorted(list)", """
import cmptest
import random
randomlist = [random.randrange(1,100000) for x in range(100000)]
list = [cmptest.NumberMixinCmp(x) for x in randomlist]
""")

finaltime1 = time1.timeit(3)
finaltime2 = time2.timeit(3)
print finaltime1
print finaltime2

====== cmptest ======
#!/usr/bin/env python2
# cmp-test.py

import random
import timeit

class NumberWithCmp(object):
    "Store 'x' for comparison"
    def __init__(self, data):
        self.x = data
    def __str__(self):
        return str(self.x)
    def __cmp__(self, other):
        if self.x == other.x: return 0
        return (-1 if self.x < other.x else 1)

# Mixin, similar to http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/510403
class ComparisonMixin(object):
    "Implement <, >, etc. by invoking a 'cmp' function."
    def __lt__(self, other):
        return self.cmp(other) < 0
    def __le__(self, other):
        return self.cmp(other) <= 0
    def __gt__(self, other):
        return self.cmp(other) > 0
    def __ge__(self, other):
        return self.cmp(other) >= 0

class NumberMixinCmp(ComparisonMixin):
    def __init__(self, data):
        self.x = data
    def __str__(self):
        return str(self.x)
    def cmp(self, other):
        if self.x == other.x: return 0
        return (-1 if self.x < other.x else 1)

From dwheeler at dwheeler.com Wed Oct 17 18:57:38 2007 From: dwheeler at dwheeler.com (David A.
Wheeler) Date: Wed, 17 Oct 2007 12:57:38 -0400 (EDT) Subject: [Python-3000] Please re-add __cmp__ to python 3000 In-Reply-To: References: Message-ID: I said: > I did a test (see below), and the mixin using a simulated cmp took > 50% MORE time to sort a list using Python 2.5 (see code below) than > when __cmp__ is used directly (as you CAN do in Python 2.5). Oops, I forgot to post the actual numbers. Here they are, on my box (your mileage will CERTAINLY vary):

$ ./cmptest2.py
7.34321498871
10.9759318829
$ ./cmptest2.py
7.30745196342
10.9110951424
$ ./cmptest2.py
7.25755906105
10.9108018875

In each run, the first number is the # of seconds to do the sort, using __cmp__; the second is the number of seconds, using a mixin. I ran it 3 times, and took the min of each. Using the min() of each number, we have a mixin performance overhead of (10.91-7.26)/7.26 = 50.3% --- David A. Wheeler From adam at hupp.org Wed Oct 17 19:21:18 2007 From: adam at hupp.org (Adam Hupp) Date: Wed, 17 Oct 2007 13:21:18 -0400 Subject: [Python-3000] Please re-add __cmp__ to python 3000 In-Reply-To: References: Message-ID: <766a29bd0710171021g7e5d67d0ic6f5ae1d944a1765@mail.gmail.com> On 10/17/07, David A. Wheeler wrote:
> class NumberMixinCmp(ComparisonMixin):
...
>     def cmp(self, other):
>         if self.x == other.x: return 0
>         return (-1 if self.x < other.x else 1)

In the common case the == test will be false. About half of the tests will be <, and half >. It's better, then, to do:

    if self.x < other.x:
        return -1
    elif self.x > other.x:
        return 1
    else:
        return 0

This almost halves the time difference, as you would expect. -- Adam Hupp | http://hupp.org/adam/ From guido at python.org Wed Oct 17 19:23:39 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 17 Oct 2007 10:23:39 -0700 Subject: [Python-3000] Please re-add __cmp__ to python 3000 In-Reply-To: References: Message-ID: On 10/17/07, David A.
Wheeler wrote: > I said: > > I agree with Collin Winter: losing __cmp__ is a loss (see http://oakwinter.com/code/). > > Steven Bethard said: > >Why can't this just be supplied with a mixin? Here's a recipe > >providing the appropriate mixins if you want to define a __key__ > >function: > > http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/510403 > > That _works_ from a functional perspective, and if Python3 fails to include direct support for __cmp__, then I think providing a built-in mixin is necessary. > > But mixins for comparison are a BIG LOSER for sort performance if your fundamental operator is a cmp-like function. However, note that Python's sort() and sorted() are guaranteed to only use '<'. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From steven.bethard at gmail.com Wed Oct 17 19:27:38 2007 From: steven.bethard at gmail.com (Steven Bethard) Date: Wed, 17 Oct 2007 11:27:38 -0600 Subject: [Python-3000] Please re-add __cmp__ to python 3000 In-Reply-To: References: Message-ID: On 10/17/07, David A. Wheeler wrote: > I said: > > I agree with Collin Winter: losing __cmp__ is a loss (see http://oakwinter.com/code/). > > Steven Bethard said: > >Why can't this just be supplied with a mixin? Here's a recipe > >providing the appropriate mixins if you want to define a __key__ > >function: > > http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/510403 > > That _works_ from a functional perspective, and if Python3 fails to > include direct support for __cmp__, then I think providing a built-in > mixin is necessary. > > But mixins for comparison are a BIG LOSER for sort performance > if your fundamental operator is a cmp-like function. [snip] > I did a test (see below), and the mixin using a simulated cmp took > 50% MORE time to sort a list using Python 2.5 Patient: When I move my arm, it hurts. Doctor: Well don't move your arm then. ;-) I'm having troubles coming up with things where the *basic* operator is really a cmp-like function. 
Even in your example, the cmp function was defined in terms of "less than". If the basic operator is really "less than", then why define a cmp() function at all? Particularly since, even in Python 2.5, sorting is faster when you define __lt__ instead of __cmp__::

    class NumberWithLessThan(object):
        def __init__(self, data):
            self.data = data
        def __lt__(self, other):
            return self.data < other.data

    class NumberWithCmp(object):
        def __init__(self, data):
            self.data = data
        def __cmp__(self, other):
            return cmp(self.data, other.data)

    $ python -m timeit -s "import script, random" "data = [script.NumberWithLessThan(i) for i in xrange(1000)]; random.shuffle(data); data.sort()"
    100 loops, best of 3: 7.93 msec per loop
    $ python -m timeit -s "import script, random" "data = [script.NumberWithCmp(i) for i in xrange(1000)]; random.shuffle(data); data.sort()"
    100 loops, best of 3: 10.5 msec per loop

STeVe -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy From aahz at pythoncraft.com Wed Oct 17 21:25:32 2007 From: aahz at pythoncraft.com (Aahz) Date: Wed, 17 Oct 2007 12:25:32 -0700 Subject: [Python-3000] Please re-add __cmp__ to python 3000 In-Reply-To: References: Message-ID: <20071017192532.GA23548@panix.com> On Wed, Oct 17, 2007, Steven Bethard wrote: > > I'm having troubles coming up with things where the *basic* operator > is really a cmp-like function. Even in your example, the cmp function > was defined in terms of "less than". If the basic operator is really > "less than", then why define a cmp() function at all? From my perspective, the real use case for cmp() is when you want to do a three-way comparison of a "large" object (for example, a Decimal instance). You can store the result of cmp() and then do a separate three-way branch.
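A small sketch of the "compare once, then branch three ways" pattern, in Python 3 syntax. Since the builtin cmp() is the very thing under discussion, this uses Decimal.compare(), which performs a single three-way comparison. The function name describe_balance and its messages are hypothetical, and NaN operands are assumed not to occur.

```python
from decimal import Decimal


def describe_balance(balance, limit):
    # One comparison of the (possibly large) operands; the result is
    # Decimal('-1'), Decimal('0') or Decimal('1') for non-NaN inputs.
    outcome = balance.compare(limit)

    # The stored result then drives a cheap three-way branch.
    if outcome < 0:
        return "under the limit"
    elif outcome == 0:
        return "exactly at the limit"
    else:
        return "over the limit"
```

With only == and < available, the middle and last branches would each need a second full comparison of the two Decimal values.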
-- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ The best way to get information on Usenet is not to ask a question, but to post the wrong information. From guido at python.org Thu Oct 18 01:00:23 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 17 Oct 2007 16:00:23 -0700 Subject: [Python-3000] Please re-add __cmp__ to python 3000 In-Reply-To: References: Message-ID: On 10/17/07, Steven Bethard wrote: > I'm having troubles coming up with things where the *basic* operator > is really a cmp-like function. Here's one. When implementing the '<' operator on lists or tuples, you really want to call the 'cmp' operator on the individual items, because otherwise (if all you have is == and <) the algorithm becomes something like "compare for equality until you've found the first pair of items that are unequal; then compare those items again using < to decide the final outcome". If you don't believe this, try to implement this operation using only == or < without comparing any two items more than once. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From greg.ewing at canterbury.ac.nz Thu Oct 18 01:36:48 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 18 Oct 2007 12:36:48 +1300 Subject: [Python-3000] Please re-add __cmp__ to python 3000 In-Reply-To: References: Message-ID: <47169C90.2070003@canterbury.ac.nz> David A. Wheeler wrote: > But mixins for comparison are a BIG LOSER for sort performance Why not provide a __richcmp__ method that directly connects with the corresponding type slot? All the comparisons eventually end up there anyway, so it seems like the right place to provide a one-stop comparison method in the 3.0 age. 
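One way to picture the one-stop method suggested above is a mixin that routes all six operators through a single hook. This is a pure-Python sketch under stated assumptions, not a proposal from the thread: the hook is spelled __richcmp__ only to echo the message, it receives a function from the operator module (the C-level slot takes an integer opcode instead), and RichCmpMixin and Version are hypothetical names.

```python
import operator


class RichCmpMixin:
    """Hypothetical mixin: forwards every comparison to __richcmp__(other, op)."""
    def __lt__(self, other): return self.__richcmp__(other, operator.lt)
    def __le__(self, other): return self.__richcmp__(other, operator.le)
    def __eq__(self, other): return self.__richcmp__(other, operator.eq)
    def __ne__(self, other): return self.__richcmp__(other, operator.ne)
    def __gt__(self, other): return self.__richcmp__(other, operator.gt)
    def __ge__(self, other): return self.__richcmp__(other, operator.ge)


class Version(RichCmpMixin):
    def __init__(self, major, minor):
        self.key = (major, minor)

    def __richcmp__(self, other, op):
        # Single place where the comparison logic lives; assumes
        # `other` is also a Version.
        return op(self.key, other.key)
```

Here Version(1, 2) < Version(1, 10) and Version(2, 0) >= Version(1, 9) both go through the same __richcmp__ body. Note that the mixin layer still adds a Python-level call per comparison; the suggestion in the message is about wiring such a method directly to the type slot.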
-- Greg From greg.ewing at canterbury.ac.nz Thu Oct 18 01:44:46 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 18 Oct 2007 12:44:46 +1300 Subject: [Python-3000] Please re-add __cmp__ to python 3000 In-Reply-To: References: Message-ID: <47169E6E.7000804@canterbury.ac.nz> Steven Bethard wrote: > I'm having troubles coming up with things where the *basic* operator > is really a cmp-like function. Think of things like comparing a tuple. You need to work your way along and recursively compare the elements. The decision about when to stop always involves ==, whatever comparison you're trying to do. So if e.g. you're doing <, then you have to test each element first for <, and if that's false, test it for ==. If the element is itself a tuple, it's doing this on its elements too, etc., and things get very inefficient. If you have a single cmp operation that you can apply to the elements, you only need to do it once for each element and it gives you all the information you need. -- Greg From guido at python.org Fri Oct 19 23:06:02 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 19 Oct 2007 14:06:02 -0700 Subject: [Python-3000] PEP 3137 plan of attack {stage 2] Message-ID: On 10/7/07, Guido van Rossum wrote: > I'd like to make complete implementation of PEP 3137 the goal for the > 3.0a2 release. It should be doable to do this release by the end of > October. I don't think anything else *needs* to be done to have a > successful a2 release. I'm still hopeful, though realistically we may not quite make it. Here's a status update on the issues I identified in my last message (plus some identified afterwards): > - remove locale support from PyString Done. > - remove compatibility with PyUnicode from PyString > - remove compatibility with PyString from PyUnicode Not done yet. > - add missing methods to PyBytes (for list, see the PEP and compare to > what's already there) Done (Gregory P. Smith) > - remove buffer API from PyUnicode Done. 
> - make == and != between PyBytes and PyUnicode return False instead of > raising TypeError Done. > - make == and != between PyString and Pyunicode return False instead > of converting A patch by Thomas Lee exists: http://bugs.python.org/issue1263 However it breaks some unit tests. > - make comparisons between PyString and PyBytes work (these are > properly ordered) Already works. > - change lots of places (e.g. encoders) to return PyString instead of PyBytes Not done. > - change indexing and iteration over PyString to return ints, not > 1-char PyStrings A patch by Alexandre Vassalotti exists but breaks some unit tests: http://bugs.python.org/issue1280 > - change PyString's repr() to return "b'...'" > - change PyBytes's repr() to return "buffer(b'...')" > - change parser so that b"..." returns PyString, not PyBytes > - rename bytes -> buffer, str8 -> bytes A patch by Alexandre Vassolotti and Christian Heimes exists for these 4 items: http://bugs.python.org/issue1247 However it breaks too many tests to be applied right now. > - change the constructor for PyString to match the one for PyBytes Not done. > - change PyBytes so that its str() is the same as its repr(). > - change PyString so that its str() is the same as its repr(). Not done. > - add an iteration view over PyBytes (optional) Not yet done (Christian Heimes offered). > - kill basestring. Done (Christian Heimes). > - move initialization of sys.std{in,out,err} into C code and do it earlier. A patch by Christian Heimes exists: http://bugs.python.org/issue1267 However it still breaks some unit tests... All, please provide updated information if I missed a contribution! I'm still hoping for more contributions. I will also try to guide the existing patches into completion and acceptance. There are also some issues that mainly crop up in non-English locales. 
We will try to get to the bottom of those before releasing 3.0a2, but I need help as I'm myself absolutely unable to work with locales (and I don't have access to a Windows box). -- --Guido van Rossum (home page: http://www.python.org/~guido/) From brett at python.org Fri Oct 19 23:24:17 2007 From: brett at python.org (Brett Cannon) Date: Fri, 19 Oct 2007 14:24:17 -0700 Subject: [Python-3000] PEP 3137 plan of attack {stage 2] In-Reply-To: References: Message-ID: On 10/19/07, Guido van Rossum wrote: > On 10/7/07, Guido van Rossum wrote: > > I'd like to make complete implementation of PEP 3137 the goal for the > > 3.0a2 release. It should be doable to do this release by the end of > > October. I don't think anything else *needs* to be done to have a > > successful a2 release. > > I'm still hopeful, though realistically we may not quite make it. > Here's a status update on the issues I identified in my last message > (plus some identified afterwards): [SNIP] > > > - make == and != between PyString and Pyunicode return False instead > > of converting > > A patch by Thomas Lee exists: http://bugs.python.org/issue1263 > However it breaks some unit tests. > [SNIP] > A patch by Alexandre Vassalotti exists but breaks some unit tests: > http://bugs.python.org/issue1280 > > > - change PyString's repr() to return "b'...'" > > - change PyBytes's repr() to return "buffer(b'...')" > > - change parser so that b"..." returns PyString, not PyBytes > > - rename bytes -> buffer, str8 -> bytes > > A patch by Alexandre Vassolotti and Christian Heimes exists for these 4 items: > http://bugs.python.org/issue1247 > However it breaks too many tests to be applied right now. [SNIP] > > - move initialization of sys.std{in,out,err} into C code and do it earlier. > > A patch by Christian Heimes exists: http://bugs.python.org/issue1267 > However it still breaks some unit tests... With so many patches now floating around, I figure getting some help with patch approval is probably the most useful. 
Is there a specific patch you would like to see get applied above the others? Or does it not matter and one should just grab any of them and just try to fix a test or two when one has the spare time? -Brett From guido at python.org Fri Oct 19 23:28:43 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 19 Oct 2007 14:28:43 -0700 Subject: [Python-3000] PEP 3137 plan of attack {stage 2] In-Reply-To: References: Message-ID: On 10/19/07, Brett Cannon wrote: > On 10/19/07, Guido van Rossum wrote: > > On 10/7/07, Guido van Rossum wrote: > > > I'd like to make complete implementation of PEP 3137 the goal for the > > > 3.0a2 release. It should be doable to do this release by the end of > > > October. I don't think anything else *needs* to be done to have a > > > successful a2 release. > > > > I'm still hopeful, though realistically we may not quite make it. > > Here's a status update on the issues I identified in my last message > > (plus some identified afterwards): > [SNIP] > > > > > - make == and != between PyString and Pyunicode return False instead > > > of converting > > > > A patch by Thomas Lee exists: http://bugs.python.org/issue1263 > > However it breaks some unit tests. > > > [SNIP] > > A patch by Alexandre Vassalotti exists but breaks some unit tests: > > http://bugs.python.org/issue1280 > > > > > - change PyString's repr() to return "b'...'" > > > - change PyBytes's repr() to return "buffer(b'...')" > > > - change parser so that b"..." returns PyString, not PyBytes > > > - rename bytes -> buffer, str8 -> bytes > > > > A patch by Alexandre Vassolotti and Christian Heimes exists for these 4 items: > > http://bugs.python.org/issue1247 > > However it breaks too many tests to be applied right now. > [SNIP] > > > - move initialization of sys.std{in,out,err} into C code and do it earlier. > > > > A patch by Christian Heimes exists: http://bugs.python.org/issue1267 > > However it still breaks some unit tests... 
> > With so many patches now floating around, I figure getting some help > with patch approval is probably the most useful. Is there a specific > patch you would like to see get applied above the others? Or does it > not matter and one should just grab any of them and just try to fix a > test or two when one has the spare time? Alas, al of them have problems where they break several unit tests in a fairly deep way. I've made several aborted attempts already at assessing how close each one is, but I got distracted each time (this has been an extra busy week at Google). I'm making a commitment now to doing nothing but this the rest of this afternoon. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From lists at cheimes.de Fri Oct 19 23:50:58 2007 From: lists at cheimes.de (Christian Heimes) Date: Fri, 19 Oct 2007 23:50:58 +0200 Subject: [Python-3000] PEP 3137 plan of attack {stage 2] In-Reply-To: References: Message-ID: <471926C2.3010103@cheimes.de> Guido van Rossum wrote: >> - change PyString's repr() to return "b'...'" >> - change PyBytes's repr() to return "buffer(b'...')" >> - change parser so that b"..." returns PyString, not PyBytes >> - rename bytes -> buffer, str8 -> bytes > > A patch by Alexandre Vassolotti and Christian Heimes exists for these 4 items: > http://bugs.python.org/issue1247 > However it breaks too many tests to be applied right now. Yes, it's breaking horrible. It doesn't make sense to work on the fixes until "change the constructor for PyString to match the one for PyBytes" is done. PyString needs to accept an optional encoding and error argument. >> - add an iteration view over PyBytes (optional) > > Not yet done (Christian Heimes offered). I only pointed out that it's missing. I didn't say that I would write it because I don't feel qualified and experienced enough for it. > A patch by Christian Heimes exists: http://bugs.python.org/issue1267 > However it still breaks some unit tests... Which unit tests are broken for you? 
test_cProfile test_doctest test_email test_profile are broken for me in a vanilla build of py3k. My patch doesn't break additional tests for me. By the way I may have figured out how to fix the profile tests. Christian From guido at python.org Fri Oct 19 23:57:06 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 19 Oct 2007 14:57:06 -0700 Subject: [Python-3000] PEP 3137 plan of attack {stage 2] In-Reply-To: <471926C2.3010103@cheimes.de> References: <471926C2.3010103@cheimes.de> Message-ID: On 10/19/07, Christian Heimes wrote: > Guido van Rossum wrote: > >> - change PyString's repr() to return "b'...'" > >> - change PyBytes's repr() to return "buffer(b'...')" > >> - change parser so that b"..." returns PyString, not PyBytes > >> - rename bytes -> buffer, str8 -> bytes > > > > A patch by Alexandre Vassolotti and Christian Heimes exists for these 4 items: > > http://bugs.python.org/issue1247 > > However it breaks too many tests to be applied right now. > > Yes, it's breaking horrible. It doesn't make sense to work on the fixes > until "change the constructor for PyString to match the one for PyBytes" > is done. PyString needs to accept an optional encoding and error argument. Of course. I didn't mean to imply there was a problem with the patch, sorry. > >> - add an iteration view over PyBytes (optional) > > > > Not yet done (Christian Heimes offered). > > I only pointed out that it's missing. I didn't say that I would write it > because I don't feel qualified and experienced enough for it. Oops, sorry again. > > A patch by Christian Heimes exists: http://bugs.python.org/issue1267 > > However it still breaks some unit tests... > > Which unit tests are broken for you? test_cProfile test_doctest > test_email test_profile are broken for me in a vanilla build of py3k. My > patch doesn't break additional tests for me. I'll look into it. Maybe I misremember. > By the way I may have figured out how to fix the profile tests. 
Cool, submit to the tracker and assign to me any time. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From lists at cheimes.de Sat Oct 20 00:26:27 2007 From: lists at cheimes.de (Christian Heimes) Date: Sat, 20 Oct 2007 00:26:27 +0200 Subject: [Python-3000] PEP 3137 plan of attack {stage 2] In-Reply-To: References: <471926C2.3010103@cheimes.de> Message-ID: <47192F13.1040305@cheimes.de> Guido van Rossum wrote: >> I only pointed out that it's missing. I didn't say that I would write it >> because I don't feel qualified and experienced enough for it. > > Oops, sorry again. I may take it as a challenge to write the view but I don't know if I can handle it. I'm still learning how to program C for Python. It may be a good opportunity to learn more. If you don't mind that it may take longer and if somebody could lend me a hand ... :] >> Which unit tests are broken for you? test_cProfile test_doctest >> test_email test_profile are broken for me in a vanilla build of py3k. My >> patch doesn't break additional tests for me. > > I'll look into it. Maybe I misremember. I don't see additional failing unit tests. My patch had an issue but I fixed it a couple of days ago. Maybe you can't remember the fix. >> By the way I may have figured out how to fix the profile tests. > > Cool, submit to the tracker and assign to me any time. I can't assign bugs with my current user level but I added you to the nosy list. http://bugs.python.org/issue1302 test_email fails because the file Lib/email/test/data/msg_15.txt contains an invalid UTF-8 character in "Da dit postl?sningsprogram".
The test looks like something should fail:

    def test_same_boundary_inner_outer(self):
        unless = self.failUnless
        msg = self._msgobj('msg_15.txt')
        # XXX We can probably eventually do better
        inner = msg.get_payload(0)
        unless(hasattr(inner, 'defects'))
        self.assertEqual(len(inner.defects), 1)
        unless(isinstance(inner.defects[0],
                          errors.StartBoundaryNotFoundDefect))

Christian From brett at python.org Sat Oct 20 00:30:09 2007 From: brett at python.org (Brett Cannon) Date: Fri, 19 Oct 2007 15:30:09 -0700 Subject: [Python-3000] PEP 3137 plan of attack {stage 2] In-Reply-To: <47192F13.1040305@cheimes.de> References: <471926C2.3010103@cheimes.de> <47192F13.1040305@cheimes.de> Message-ID: On 10/19/07, Christian Heimes wrote: > Guido van Rossum wrote: > >> I only pointed out that it's missing. I didn't say that I would write it > >> because I don't feel qualified and experienced enough for it. > > > > Oops, sorry again. > > I may take it as a challenge to write the view but I don't know if I can > handle it. I'm still learning how to program C for Python. It may be a > good opportunity to learn more. If you don't mind that it may take > longer and if somebody could lend me a hand ... :] > > >> Which unit tests are broken for you? test_cProfile test_doctest > >> test_email test_profile are broken for me in a vanilla build of py3k. My > >> patch doesn't break additional tests for me. > > > > I'll look into it. Maybe I misremember. > > I don't see additional failing unit tests. My patch had an issue but I > fixed it a couple of days ago. Maybe you can't remember the fix. > > >> By the way I may have figured out how to fix the profile tests. > > > > Cool, submit to the tracker and assign to me any time. > > I can't assign bugs with my current user level but I added you to the > nosy list. > > http://bugs.python.org/issue1302 I went ahead and did the assignment, but there is no patch.
=) -Brett From g.brandl at gmx.net Sat Oct 20 00:36:06 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Sat, 20 Oct 2007 00:36:06 +0200 Subject: [Python-3000] PEP 3137 plan of attack {stage 2] In-Reply-To: <471926C2.3010103@cheimes.de> References: <471926C2.3010103@cheimes.de> Message-ID: Christian Heimes schrieb: > Guido van Rossum wrote: >>> - change PyString's repr() to return "b'...'" >>> - change PyBytes's repr() to return "buffer(b'...')" >>> - change parser so that b"..." returns PyString, not PyBytes >>> - rename bytes -> buffer, str8 -> bytes >> >> A patch by Alexandre Vassolotti and Christian Heimes exists for these 4 items: >> http://bugs.python.org/issue1247 >> However it breaks too many tests to be applied right now. > > Yes, it's breaking horrible. It doesn't make sense to work on the fixes > until "change the constructor for PyString to match the one for PyBytes" > is done. PyString needs to accept an optional encoding and error argument. I do that, currently, patch should be up in a minute. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From nnorwitz at gmail.com Sat Oct 20 01:41:06 2007 From: nnorwitz at gmail.com (Neal Norwitz) Date: Fri, 19 Oct 2007 16:41:06 -0700 Subject: [Python-3000] PEP 3137 plan of attack {stage 2] In-Reply-To: <47192F13.1040305@cheimes.de> References: <471926C2.3010103@cheimes.de> <47192F13.1040305@cheimes.de> Message-ID: On 10/19/07, Christian Heimes wrote: > > I may take it as a challenge to write the view but I don't know if I can > handle it. I'm still learning how to program C for Python. It may be a > good opportunity to learn more. If you don't mind that it may take > longer and if somebody could lend me a hand ... 
:] I think questions related to how to make these sorts of changes are on-topic for this list. If you get stuck, feel free to ask here. If you prefer, you (or others) can mail me privately. I tend to answer the questions at my night time (US Pacific), so I may not always be fast with answering. n From tom at vector-seven.com Mon Oct 15 15:10:22 2007 From: tom at vector-seven.com (Thomas Lee) Date: Mon, 15 Oct 2007 23:10:22 +1000 Subject: [Python-3000] PEP3137: str8() and str() comparison Message-ID: <471366BE.50303@vector-seven.com> I just uploaded a patch with all my progress on str8/str comparisons here: http://bugs.python.org/issue1263 I would really like some help from anybody knowledgeable about the following tests: test_compile test_str test_struct test_sqlite As discussed in the issue tracker, these are all failing for various reasons: in all cases I'm not exactly sure how to proceed. The following are also failing for me, although this would appear to be unrelated to my patch: test_doctest test_email test_nis test_pty Are these failing for anybody else? Cheers, Tom From lists at cheimes.de Mon Oct 22 04:13:52 2007 From: lists at cheimes.de (Christian Heimes) Date: Mon, 22 Oct 2007 04:13:52 +0200 Subject: [Python-3000] Failing unit tests on WIndows Message-ID: Python 3000 needs some love from Windows developers. The tests were run on Windows XP SP2, X86, VS 2003, SDK 2003R2, rev58587 with a fixed pythoncore project file. My build environment has no devenv.exe so bsddb is missing. 252 tests OK.
20 tests failed: test_csv test_dumbdbm test_file test_fileinput test_gettext test_io test_mailbox test_netrc test_pep277 test_shutil test_sqlite test_strptime test_subprocess test_tarfile test_tempfile test_threaded_import test_threadedtempfile test_time test_urllib test_zipfile 48 tests skipped: test__locale test_aepack test_applesingle test_bsddb test_bsddb3 test_codecmaps_cn test_codecmaps_hk test_codecmaps_jp test_codecmaps_kr test_codecmaps_tw test_commands test_crypt test_curses test_dbm test_dl test_fcntl test_fork1 test_gdbm test_grp test_ioctl test_largefile test_macostools test_mhlib test_nis test_normalization test_openpty test_ossaudiodev test_pipes test_plistlib test_poll test_posix test_pty test_pwd test_resource test_scriptpackages test_signal test_socket_ssl test_socketserver test_ssl test_syslog test_threadsignals test_timeout test_urllib2net test_urllibnet test_wait3 test_wait4 test_xmlrpc_net test_zipfile64 3 skips unexpected on win32: test_ssl test_syslog test_bsddb Christian From lists at cheimes.de Mon Oct 22 04:27:26 2007 From: lists at cheimes.de (Christian Heimes) Date: Mon, 22 Oct 2007 04:27:26 +0200 Subject: [Python-3000] Failing unit tests on WIndows In-Reply-To: References: Message-ID: Fix for tempfile bug on Windows: http://bugs.python.org/issue1310 Fix for project file: http://bugs.python.org/issue1309 By the way, the build repository contains an old version of OpenSSL, 0.9.8a, while OpenSSL 0.9.8g is out. 0.9.8a is more than 2 years old and doesn't build cleanly with VS 2005. Could somebody please update it to http://openssl.org/source/openssl-0.9.8g.tar.gz ? Christian From guido at python.org Mon Oct 22 05:33:57 2007 From: guido at python.org (Guido van Rossum) Date: Sun, 21 Oct 2007 20:33:57 -0700 Subject: [Python-3000] Failing unit tests on WIndows In-Reply-To: References: Message-ID: Thanks for taking the time to do this, Chris! I'm sure the fixes you posted separately will be checked in soon.
Hopefully others will jump in with fixes for more of the issues below. --Guido 2007/10/21, Christian Heimes : > Python 3000 needs some love from Windows developers. The test were run > on Windows XP SP2, X86, VS 2003, SDK 2003R2, rev58587 with a fixed > pythoncore project file. My build environment has no devenv.exe so bsddb > is missing. > > 252 tests OK. > 20 tests failed: > test_csv test_dumbdbm test_file test_fileinput test_gettext > test_io test_mailbox test_netrc test_pep277 test_shutil > test_sqlite test_strptime test_subprocess test_tarfile > test_tempfile test_threaded_import test_threadedtempfile test_time > test_urllib test_zipfile > 48 tests skipped: > test__locale test_aepack test_applesingle test_bsddb test_bsddb3 > test_codecmaps_cn test_codecmaps_hk test_codecmaps_jp > test_codecmaps_kr test_codecmaps_tw test_commands test_crypt > test_curses test_dbm test_dl test_fcntl test_fork1 test_gdbm > test_grp test_ioctl test_largefile test_macostools test_mhlib > test_nis test_normalization test_openpty test_ossaudiodev > test_pipes test_plistlib test_poll test_posix test_pty test_pwd > test_resource test_scriptpackages test_signal test_socket_ssl > test_socketserver test_ssl test_syslog test_threadsignals > test_timeout test_urllib2net test_urllibnet test_wait3 test_wait4 > test_xmlrpc_net test_zipfile64 > 3 skips unexpected on win32: > test_ssl test_syslog test_bsddb > > Christian > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From brett at python.org Mon Oct 22 22:27:44 2007 From: brett at python.org (Brett Cannon) Date: Mon, 22 Oct 2007 13:27:44 -0700 Subject: [Python-3000] PEP 3137 plan of attack {stage 2] In-Reply-To: References: Message-ID: On 10/19/07, Guido van Rossum wrote: [SNIP] > > - 
make == and != between PyString and Pyunicode return False instead > > of converting > > A patch by Thomas Lee exists: http://bugs.python.org/issue1263 > However it breaks some unit tests. This is now done. -Brett From lists at cheimes.de Tue Oct 23 03:15:25 2007 From: lists at cheimes.de (Christian Heimes) Date: Tue, 23 Oct 2007 03:15:25 +0200 Subject: [Python-3000] PEP 3137 plan of attack {stage 2] In-Reply-To: References: <471926C2.3010103@cheimes.de> Message-ID: <471D4B2D.1090905@cheimes.de> Georg Brandl wrote: > I do that, currently, patch should be up in a minute. How is your patch? It's not in the svn repository yet. Christian From g.brandl at gmx.net Tue Oct 23 07:59:51 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Tue, 23 Oct 2007 07:59:51 +0200 Subject: [Python-3000] PEP 3137 plan of attack {stage 2] In-Reply-To: <471D4B2D.1090905@cheimes.de> References: <471926C2.3010103@cheimes.de> <471D4B2D.1090905@cheimes.de> Message-ID: Christian Heimes schrieb: > Georg Brandl wrote: >> I do that, currently, patch should be up in a minute. > > How is your patch? It's not in the svn repository yet. It's in issue 1303. Georg From gnewsg at gmail.com Mon Oct 22 15:07:50 2007 From: gnewsg at gmail.com (Giampaolo Rodola') Date: Mon, 22 Oct 2007 06:07:50 -0700 Subject: [Python-3000] Failing unit tests on WIndows In-Reply-To: References: Message-ID: <1193058470.684200.189820@q3g2000prf.googlegroups.com> On 22 Ott, 04:13, Christian Heimes wrote: > Python 3000 needs some love from Windows developers. The test were run > on Windows XP SP2, X86, VS 2003, SDK 2003R2, rev58587 with a fixed > pythoncore project file. My build environment has no devenv.exe so bsddb > is missing. > > 252 tests OK. 
> 20 tests failed: > test_csv test_dumbdbm test_file test_fileinput test_gettext > test_io test_mailbox test_netrc test_pep277 test_shutil > test_sqlite test_strptime test_subprocess test_tarfile > test_tempfile test_threaded_import test_threadedtempfile test_time > test_urllib test_zipfile > 48 tests skipped: > test__locale test_aepack test_applesingle test_bsddb test_bsddb3 > test_codecmaps_cn test_codecmaps_hk test_codecmaps_jp > test_codecmaps_kr test_codecmaps_tw test_commands test_crypt > test_curses test_dbm test_dl test_fcntl test_fork1 test_gdbm > test_grp test_ioctl test_largefile test_macostools test_mhlib > test_nis test_normalization test_openpty test_ossaudiodev > test_pipes test_plistlib test_poll test_posix test_pty test_pwd > test_resource test_scriptpackages test_signal test_socket_ssl > test_socketserver test_ssl test_syslog test_threadsignals > test_timeout test_urllib2net test_urllibnet test_wait3 test_wait4 > test_xmlrpc_net test_zipfile64 > 3 skips unexpected on win32: > test_ssl test_syslog test_bsddb > > Christian > > _______________________________________________ > Python-3000 mailing list > Python-3... at python.orghttp://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe:http://mail.python.org/mailman/options/python-3000/python-3000-garchi... Most error seems to be attributable to Unicode-related problems: UnicodeDecodeError: 'utf8' codec can't decode bytes in position xx-yy: invalid data The following tests DO NOT fail on my Windows XP prof sp2 box: test_sqlite, test_strptime, test_tarfile, test_threaded_import, test_threadedtempfile, test_time, test_urllib, test_zipfile. 
From lists at cheimes.de Tue Oct 23 10:08:12 2007 From: lists at cheimes.de (Christian Heimes) Date: Tue, 23 Oct 2007 10:08:12 +0200 Subject: [Python-3000] Failing unit tests on WIndows In-Reply-To: <1193058470.684200.189820@q3g2000prf.googlegroups.com> References: <1193058470.684200.189820@q3g2000prf.googlegroups.com> Message-ID: Giampaolo Rodola' wrote: > Most error seems to be attributable to Unicode-related problems: > > UnicodeDecodeError: 'utf8' codec can't decode bytes in position xx-yy: > invalid data > > The following tests DO NOT fail on my Windows XP prof sp2 box: > test_sqlite, test_strptime, test_tarfile, test_threaded_import, > test_threadedtempfile, test_time, test_urllib, test_zipfile. A bunch of tests are already fixed (r58590 and r58593). Some of the failing tests depend on the locale and time zone. They don't break when I "set TZ=GMT" on the console before I run the test suite. 257 tests OK. 15 tests failed: test_codeccallbacks test_csv test_ctypes test_dumbdbm test_file test_fileinput test_gettext test_io test_mailbox test_netrc test_pep277 test_strptime test_subprocess test_tempfile test_time 48 tests skipped: test__locale test_aepack test_applesingle test_bsddb test_bsddb3 test_codecmaps_cn test_codecmaps_hk test_codecmaps_jp test_codecmaps_kr test_codecmaps_tw test_commands test_crypt test_curses test_dbm test_dl test_fcntl test_fork1 test_gdbm test_grp test_ioctl test_largefile test_macostools test_mhlib test_nis test_normalization test_openpty test_ossaudiodev test_pipes test_plistlib test_poll test_posix test_pty test_pwd test_resource test_scriptpackages test_signal test_socket_ssl test_socketserver test_ssl test_syslog test_threadsignals test_timeout test_urllib2net test_urllibnet test_wait3 test_wait4 test_xmlrpc_net test_zipfile64 3 skips unexpected on win32: test_ssl test_syslog test_bsddb With set TZ=GMT test_time and test_strptime pass. 
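To illustrate the TZ workaround just described, here is a minimal sketch, assuming a modern Python 3 (3.7 or later for capture_output), not the 2007 py3k branch. It starts a child interpreter with TZ set in its environment, which mirrors typing "set TZ=GMT" on the console before launching the test suite. The helper name run_with_tz is made up for this example.

```python
import os
import subprocess
import sys


def run_with_tz(code, tz="GMT"):
    """Run a Python snippet in a child process whose environment sets TZ."""
    # Copy the current environment and override only TZ, so the child
    # interpreter reads the time zone at startup.
    env = dict(os.environ, TZ=tz)
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # time.timezone is the local (non-DST) offset from UTC in seconds.
    print(run_with_tz("import time; print(time.timezone)"))
```

Setting the variable in the child's environment, rather than inside an already running interpreter, avoids relying on time.tzset(), which is not available on Windows.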
Christian From g.brandl at gmx.net Tue Oct 23 20:47:45 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Tue, 23 Oct 2007 20:47:45 +0200 Subject: [Python-3000] PyInt_AS_LONG error checking Message-ID: PyInt_AS_LONG is #defined as PyLong_AsLong since the int/long unification. However, most places that use this macro (and also places that use PyInt_AsLong) assume it cannot fail, which means that an exception won't be properly propagated in that case. If I don't overlook something here, all these places have to be fixed... Georg From martin at v.loewis.de Tue Oct 23 21:03:54 2007 From: martin at v.loewis.de ("Martin v. Löwis") Date: Tue, 23 Oct 2007 21:03:54 +0200 Subject: [Python-3000] PyInt_AS_LONG error checking In-Reply-To: References: Message-ID: <471E459A.6060605@v.loewis.de> > PyInt_AS_LONG is #defined as PyLong_AsLong since the int/long unification. > > However, most places that use this macro (and also places that > use PyInt_AsLong) assume it cannot fail which means that an exception > won't be properly propagated in that case. > > If I don't overlook something here, all these places have to be fixed... I think you do overlook something. Many of these places do PyInt_CheckExact before invoking the macro. PyInt_CheckExact includes _PyLong_FitsInLong, so if that test returns true, then PyInt_AS_LONG cannot fail. So the only places that need to be fixed are those where PyInt_AS_LONG isn't protected by PyInt_CheckExact. HTH, Martin From g.brandl at gmx.net Tue Oct 23 21:19:45 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Tue, 23 Oct 2007 21:19:45 +0200 Subject: [Python-3000] PyInt_AS_LONG error checking In-Reply-To: <471E459A.6060605@v.loewis.de> References: <471E459A.6060605@v.loewis.de> Message-ID: Martin v. Löwis wrote: >> PyInt_AS_LONG is #defined as PyLong_AsLong since the int/long unification.
>> >> However, most places that use this macro (and also places that >> use PyInt_AsLong) assume it cannot fail which means that an exception >> won't be properly propagated in that case. >> >> If I don't overlook something here, all these places have to be fixed... > > I think you do overlook something. Many of these places do > PyInt_CheckExact before invoking the macro. PyInt_CheckExact includes > _PyLong_FitsInLong, so if that test returns true, then PyInt_AS_LONG > cannot fail. > > So the only places that need to be fixed are those where PyInt_AS_LONG > isn't protected by PyInt_CheckExact. Ok, thanks, that explains it. Georg BTW, _PyLong_FitsInLong says "/* conservative estimate */" -- it doesn't really allow the whole range of C long... From guido at python.org Tue Oct 23 21:30:07 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 23 Oct 2007 12:30:07 -0700 Subject: [Python-3000] Three new failing tests? Message-ID: I've got three tests failing in my py3k branch, on Linux: $ ./python Lib/test/regrtest.py test_codeccallbacks test_ctypes test_locale test_codeccallbacks test test_codeccallbacks failed -- Traceback (most recent call last): File "/usr/local/google/home/guido/python/py3kd/Lib/test/test_codeccallbacks.py", line 795, in test_translatehelper self.assertRaises(ValueError, "\xff".translate, D()) AssertionError: ValueError not raised by translate test_ctypes test test_ctypes failed -- errors occurred; run in verbose mode for details test_locale test test_locale produced unexpected output: ********************************************************************** *** lines 2-5 of actual output doesn't appear in expected output after line 1: + s'\xc3\xac\xc2\xa0\xc2\xbc'.split() == [s'\xc3\xac\xc2\xa0\xc2\xbc'] != ['\xec\xa0\xbc'] + s'\xc3\xad\xc2\x95\xc2\xa0'.strip() == s'\xc3\xad\xc2\x95\xc2\xa0' != '\xed\x95\xa0' + s'\xc3\x8c\xc2\x85'.lower() == s'\xc3\x8c\xc2\x85' != '\xcc\x85' + s'\xc3\xad\xc2\x95\xc2\xa0'.upper() == s'\xc3\xad\xc2\x95\xc2\xa0' 
!= '\xed\x95\xa0' ********************************************************************** 3 tests failed: test_codeccallbacks test_ctypes test_locale [94873 refs] $ I don't think Georg's latest checkin (PyInt_Check/PyLong_Check issues) broke these. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Oct 23 21:36:16 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 23 Oct 2007 12:36:16 -0700 Subject: [Python-3000] Question about email/generator.py In-Reply-To: References: Message-ID: There's an issue in the email package that I can't resolve by myself. I described it to Barry like this: > > So in generator.py on line 291, I read: > > > > print(part.get_payload(decode=True), file=self) > > > > It turns out that part.get_payload(decode=True) returns a bytes > > object, and printing a bytes object to a text file is not the right > > thing to do -- in 3.0a1 it silently just prints those bytes, in 3.0a2 > > it will probably print the repr() of the bytes object. Right now, it > > errors out because I'm removing the encode() method on PyString > > objects, and print() converts PyBytes to PyString; then the > > TextIOWrapper.write() method tries to encode its argument. > > > > If I change this to (decode=False), all tests in the email package > > pass. But is this the right fix??? I should note that this was checked in by the time Barry replied, even though it clearly was the wrong thing to do. Barry replied: > Maybe. ;) The problem is that this API is either being too smart for > its own good, or not smart enough. The intent of decode=True is to > return the original object encoded in the payload. So for example, > if MIMEImage was used to encode some jpeg, then decode=True should > return that jpeg. 
> > The problem is that what you really want is something that's content- > type aware, such that if your main type is some non-text type like > image/* or audio/* or even application/octet-stream, you will almost > always want a bytes object back. But text can also be encoded via > charset and/or transfer-encoding, and (at least in Py2.x), you'd use > the same method to get the original, unencoded text back. In that > case, you definitely want the string, since that's the most natural > API (i.e. you fed it a string object when you created the MIMEText, > so you want a string on the way back out). > > This is yet another corner case where the old API doesn't really fit > the new bytes/string model correctly, and of course you can > (rightly!) argue we were sloppy in Py2.x but were able to (mostly) > get away with it. > > In this /specific/ situation, generator.py:291 can only be called > when the main type is text, so I think it is clearly expecting a > string, even though .get_payload() will return a bytes there. > > Short of redesigning the API, I can think of two options. First, we > can change .get_payload() to specific return a string when the main > type is text and decode=True. This is ugly because the return type > will depend on the content type of the message. OTOH, get_payload() > is already fairly ugly here because its return type differs based on > its argument, although I'd like to split this into a > separate .get_decoded_payload() method. > > The other option is to let .get_payload() return bytes in all cases, > but in generator.py:291, explicitly convert it to a string, probably > using raw-unicode-escape. Because we know the main type is text > here, we know that the payload must contain a string. get_payload() > will return the bytes of the decoded unicode string, so raw-unicode- > escape should do the right thing. That's ugly too for obvious reasons. 
> > The one thing that doesn't seem right is for decode=False to be used > because should the payload be an encoded string, it won't get > correctly decoded. This is part of the DecodedGenerator, which > honestly is probably not much used outside the test cases. but the > intent of that generator is clearly to print the decoded text parts > with the non-text parts stripped and replaced by a placeholder. So I > think it definitely wants decoded text payloads, otherwise there's > not much point in the class. > > I hope that explains the situation. I'm open to any other idea -- it > doesn't even have to be better. ;) I see that you made the > decode=False change in svn, but that's the one solution that doesn't > seem right. At this point I (Guido) am really hoping someone will want to "own" this issue and redesign the API properly... -- --Guido van Rossum (home page: http://www.python.org/~guido/) From barry at python.org Tue Oct 23 21:50:05 2007 From: barry at python.org (Barry Warsaw) Date: Tue, 23 Oct 2007 15:50:05 -0400 Subject: [Python-3000] Question about email/generator.py In-Reply-To: References: Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Oct 23, 2007, at 3:36 PM, Guido van Rossum wrote: > There's an issue in the email package that I can't resolve by myself. > I described it to Barry like this: > >>> So in generator.py on line 291, I read: >>> >>> print(part.get_payload(decode=True), file=self) >>> >>> It turns out that part.get_payload(decode=True) returns a bytes >>> object, and printing a bytes object to a text file is not the right >>> thing to do -- in 3.0a1 it silently just prints those bytes, in >>> 3.0a2 >>> it will probably print the repr() of the bytes object. Right now, it >>> errors out because I'm removing the encode() method on PyString >>> objects, and print() converts PyBytes to PyString; then the >>> TextIOWrapper.write() method tries to encode its argument. 
>>> >>> If I change this to (decode=False), all tests in the email package >>> pass. But is this the right fix??? > > I should note that this was checked in by the time Barry replied, even > though it clearly was the wrong thing to do. Barry replied: > >> Maybe. ;) The problem is that this API is either being too smart for >> its own good, or not smart enough. The intent of decode=True is to >> return the original object encoded in the payload. So for example, >> if MIMEImage was used to encode some jpeg, then decode=True should >> return that jpeg. >> >> The problem is that what you really want is something that's content- >> type aware, such that if your main type is some non-text type like >> image/* or audio/* or even application/octet-stream, you will almost >> always want a bytes object back. But text can also be encoded via >> charset and/or transfer-encoding, and (at least in Py2.x), you'd use >> the same method to get the original, unencoded text back. In that >> case, you definitely want the string, since that's the most natural >> API (i.e. you fed it a string object when you created the MIMEText, >> so you want a string on the way back out). >> >> This is yet another corner case where the old API doesn't really fit >> the new bytes/string model correctly, and of course you can >> (rightly!) argue we were sloppy in Py2.x but were able to (mostly) >> get away with it. >> >> In this /specific/ situation, generator.py:291 can only be called >> when the main type is text, so I think it is clearly expecting a >> string, even though .get_payload() will return a bytes there. >> >> Short of redesigning the API, I can think of two options. First, we >> can change .get_payload() to specific return a string when the main >> type is text and decode=True. This is ugly because the return type >> will depend on the content type of the message. 
OTOH, get_payload() >> is already fairly ugly here because its return type differs based on >> its argument, although I'd like to split this into a >> separate .get_decoded_payload() method. >> >> The other option is to let .get_payload() return bytes in all cases, >> but in generator.py:291, explicitly convert it to a string, probably >> using raw-unicode-escape. Because we know the main type is text >> here, we know that the payload must contain a string. get_payload() >> will return the bytes of the decoded unicode string, so raw-unicode- >> escape should do the right thing. That's ugly too for obvious >> reasons. >> >> The one thing that doesn't seem right is for decode=False to be used >> because should the payload be an encoded string, it won't get >> correctly decoded. This is part of the DecodedGenerator, which >> honestly is probably not much used outside the test cases. but the >> intent of that generator is clearly to print the decoded text parts >> with the non-text parts stripped and replaced by a placeholder. So I >> think it definitely wants decoded text payloads, otherwise there's >> not much point in the class. >> >> I hope that explains the situation. I'm open to any other idea -- it >> doesn't even have to be better. ;) I see that you made the >> decode=False change in svn, but that's the one solution that doesn't >> seem right. > > At this point I (Guido) am really hoping someone will want to "own" > this issue and redesign the API properly... I'm really bummed that I've had no time to work on this. Life and work have imposed. I'd be willing to chat with someone about what I think should happen. At this point irc or im might be best. 
:( - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (Darwin) iQCVAwUBRx5Qb3EjvBPtnXfVAQIcbwP9FPa/IJpIg+D2y/FJJp0LRqXctGhXUssi aDX8M07pHu9aMPXKvDYZw50NFcyx87mMjWNVf2gX1KjM+U5XUns3WwtU+C60ZBSn gEUmzAaYJVhDWguRiOpCX/bR1F2U8dudDR0UC8wrV9Mylk/C4b/q7bUdrGeT8riK +oSTcaKTatY= =98W1 -----END PGP SIGNATURE----- From lists at cheimes.de Tue Oct 23 21:56:57 2007 From: lists at cheimes.de (Christian Heimes) Date: Tue, 23 Oct 2007 21:56:57 +0200 Subject: [Python-3000] Three new failing tests? In-Reply-To: References: Message-ID: Guido van Rossum wrote: > I don't think Georg's latest checkin (PyInt_Check/PyLong_Check issues) > broke these. You are right. They were already broken at 10am CEST. From luke.stebbing at gmail.com Tue Oct 23 23:31:01 2007 From: luke.stebbing at gmail.com (Luke Stebbing) Date: Tue, 23 Oct 2007 14:31:01 -0700 Subject: [Python-3000] Question about email/generator.py In-Reply-To: References: Message-ID: On 10/23/07, Barry Warsaw wrote: > On Oct 23, 2007, at 3:36 PM, Guido van Rossum wrote: > > At this point I (Guido) am really hoping someone will want to "own" > > this issue and redesign the API properly... > > I'm really bummed that I've had no time to work on this. Life and > work have imposed. I'd be willing to chat with someone about what I > think should happen. At this point irc or im might be best. :( In py2k, you determine whether a payload is 'list of Message' or 'str' by calling .is_multipart(). Maybe .is_str() and .is_bytes() methods (or properties) could be added. Alternatively, there could be a .payload_type property to test against. Whatever it does, I think it should parallel the polymorphic structure used by the new I/O [1]. Does that mean Message ABCs? Would that be overkill? I've been using the email package pretty heavily this year, and I'd be up for talking about this on any of the im services or on freenode or whatever. 
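For readers following the payload-type question, here is a rough sketch of the distinction under discussion, written against the email package as it ships in current Python 3 releases (an assumption; this is not the 2007 branch and not the redesign proposed in this thread). There, get_payload(decode=True) returns bytes for leaf parts, and the caller decodes text parts using the declared charset. The function describe_payload is a hypothetical helper, not part of the email API.

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def describe_payload(msg):
    """Return a (kind, value) pair for a message, recursing into multiparts."""
    if msg.is_multipart():
        # For multipart messages the payload is a list of sub-messages.
        return "multipart", [describe_payload(part) for part in msg.get_payload()]
    # decode=True undoes the Content-Transfer-Encoding and yields bytes.
    raw = msg.get_payload(decode=True)
    if msg.get_content_maintype() == "text":
        # Text parts: turn the bytes back into str using the declared charset.
        charset = msg.get_content_charset() or "ascii"
        return "text", raw.decode(charset, errors="replace")
    # Everything else (image/*, application/*, ...) stays as bytes.
    return "bytes", raw


outer = MIMEMultipart()
outer.attach(MIMEText("héllo", "plain", "utf-8"))
outer.attach(MIMEApplication(b"\x00\x01"))

if __name__ == "__main__":
    print(describe_payload(outer))
```

The branch on the main content type is the same "content-type aware" decision Barry describes: text parts come back as strings, everything else as bytes.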
Luke [1] http://www.python.org/dev/peps/pep-3116 From guido at python.org Fri Oct 26 01:33:13 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 25 Oct 2007 16:33:13 -0700 Subject: [Python-3000] Need help with Windows failures Message-ID: Hi Christian and Amaury (and anyone else with a Windows setup who would like to help!), I noticed that both of you are contributing fixes for Windows-specific issues. Could I get your help with some other Windows issues? See e.g. the failures on http://www.python.org/dev/buildbot/3.0/x86%20XP-3%203.0/builds/182/step-test/0 I'd be happy to give you some pointers on specific failures if you can't figure out what might cause them. (To find the errors, scroll to the end and then scroll up; or search for "Re-running failed tests in verbose mode".) Please CC Neal Norwitz as well, he may have some suggestions as well. Some random notes: - It looks like there are some CRLF issues. Quite a few things complain about mysterious syntax errors; I see some problems where \n characters seem to appear or disappear. - Most of the mailbox test failures seem due to a failed cleanup in the second failing test (note how it prints FAIL and then ERROR -- that suggests the ERROR happened in the tearDown()). - In general, whenever you see other failures mentioning things like "The process cannot access the file because it is being used by another process: '@test'" it's probably a failed test not properly cleaning up; a lot of tests either don't always close files (could use try/finally: f.close()) or don't remove them. (The best way to remove files btw is typically test_support.remove().) Thanks in advance to anyone who fixes a Windows bug in Py3k! 
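The cleanup advice above can be sketched as follows. This is a minimal illustration assuming a modern Python 3 and the standard unittest module; the class name CleanupExample and the helper run_example are invented for the example, and plain os.remove() stands in for the test-support helper mentioned in the note.

```python
import os
import tempfile
import unittest


class CleanupExample(unittest.TestCase):
    # Paths handed out by setUp, recorded only so the sketch can be checked.
    created = []

    def setUp(self):
        fd, self.path = tempfile.mkstemp()
        os.close(fd)
        self.created.append(self.path)

    def tearDown(self):
        # Runs whether the test body passes or fails, so the file never
        # lingers and trips up a later test.
        if os.path.exists(self.path):
            os.remove(self.path)

    def test_roundtrip(self):
        f = open(self.path, "w")
        try:
            f.write("data\n")
        finally:
            # Always close: on Windows an open handle blocks removal.
            f.close()
        f = open(self.path)
        try:
            self.assertEqual(f.read(), "data\n")
        finally:
            f.close()


def run_example():
    """Run the example test case and report whether it passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(CleanupExample)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()


if __name__ == "__main__":
    print(run_example())
```

Closing in a finally clause matters most on Windows, where a file that is still open cannot be deleted; that matches the "being used by another process" errors quoted above.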
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From barry at python.org Fri Oct 26 05:31:28 2007 From: barry at python.org (Barry Warsaw) Date: Thu, 25 Oct 2007 23:31:28 -0400 Subject: [Python-3000] Question about email/generator.py In-Reply-To: References: Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Oct 23, 2007, at 5:31 PM, Luke Stebbing wrote: > On 10/23/07, Barry Warsaw wrote: >> On Oct 23, 2007, at 3:36 PM, Guido van Rossum wrote: >>> At this point I (Guido) am really hoping someone will want to "own" >>> this issue and redesign the API properly... >> >> I'm really bummed that I've had no time to work on this. Life and >> work have imposed. I'd be willing to chat with someone about what I >> think should happen. At this point irc or im might be best. :( > > In py2k, you determine whether a payload is 'list of Message' or 'str' > by calling .is_multipart(). Maybe .is_str() and .is_bytes() methods > (or properties) could be added. Alternatively, there could be a > .payload_type property to test against. > > Whatever it does, I think it should parallel the polymorphic structure > used by the new I/O [1]. Does that mean Message ABCs? Would that be > overkill? > > I've been using the email package pretty heavily this year, and I'd be > up for talking about this on any of the im services or on freenode or > whatever. Hi Luke, I'm actually thinking something along the lines of changing .get_payload() to only return the raw payload when the content type is scalar. For non-scalar types (i.e. multiparts), you'd get an exception if you tried to use .get_payload(). I'd also separate out getting the raw payload and getting the decoded payload into separate APIs, either by adding a new .get_decoded_payload() or having .get_payload() return a Payload object that knows how to decode itself (and return its content type). Can you talk more about how you think the polymorphism would work? 
I don't immediately see a parallel, and yeah, I kind of do think that message ABCs are overkill (I'd love for whatever we come up with to be backward compatible with Python 2.x if at all possible). The fact that .get_decoded_payload() could return bytes or strings is bothersome though, so if you have some thoughts about how to do that more cleanly, I'm all ears. Definitely ping me on freenode (nick: barry) any time during working hours EST if you want to chat. I'm almost always on, hanging out in #mailman and #launchpad, though that shouldn't matter if you want to pvtmsg me. Cheers, - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (Darwin) iQCVAwUBRyFfkHEjvBPtnXfVAQJA0gP+PsEBOhInn5ReEwGgD9BeDg12VFkVrDET UTZlbPpBhDISNvByfvHJXSJMnO1XCmUniA4a7sQ0PHEjdEMHSFY6NKT3BtVRg4yh WoDIEVs8WIOn2k+tHb2E0SDPQTNtnyA2FbG8CGq27wvGxbd3C61ytylgKofP+0A8 oJHW6atRW7g= =fNGw -----END PGP SIGNATURE----- From christian at cheimes.de Fri Oct 26 01:43:12 2007 From: christian at cheimes.de (Christian Heimes) Date: Fri, 26 Oct 2007 01:43:12 +0200 Subject: [Python-3000] Need help with Windows failures In-Reply-To: References: Message-ID: <47212A10.50506@cheimes.de> Guido van Rossum wrote: > Hi Christian and Amaury (and anyone else with a Windows setup who > would like to help!), > > I noticed that both of you are contributing fixes for Windows-specific > issues. Could I get your help with some other Windows issues? Yes, I've set up a VMWare Win XP instance on my Linux box for Python 3.0 and PythonDotNet. > - Most of the mailbox test failures seem due to a failed cleanup in > the second failing test (note how it prints FAIL and then ERROR -- > that suggests the ERROR happened in the tearDown()). 
> > - In general, whenever you see other failures mentioning things like > "The process cannot access the file because it is being used by > another process: '@test'" it's probably a failed test not properly > cleaning up; a lot of tests either don't always close files (could use > try/finally: f.close()) or don't remove them. (The best way to remove > files btw is typically test_support.remove().) I'm going to look into the mailbox and @test problems. Christian From shiblon at gmail.com Fri Oct 26 15:48:28 2007 From: shiblon at gmail.com (Chris Monson) Date: Fri, 26 Oct 2007 09:48:28 -0400 Subject: [Python-3000] PEP 3101 suggested corrections In-Reply-To: <200710260855.52649.mark@qtrac.eu> References: <200710260855.52649.mark@qtrac.eu> Message-ID: Forwarding to the group for discussion. On 10/26/07, Mark Summerfield wrote: There is one thing about this PEP I don't like: The available integer presentation types are: 'd' - Decimal Integer. Outputs the number in base 10. I think this is confusing (since this will not print a decimal.Decimal object), and is a throwback to early versions of C. Modern C now has 'i' as an alternative to 'd' and I wish this PEP would use 'i' for integer rather than the contrived 'd' for "decimal" integer (which sounds like a contradiction because most people expect decimals to have fractional parts). I guess if it is too late to change 'd' then one "solution" would be: 'd' - Denary Integer. Outputs the number in base 10. because at least that fits with octal and hex. -- Mark Summerfield, Qtrac Ltd., www.qtrac.eu -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.python.org/pipermail/python-3000/attachments/20071026/1ad1ab4e/attachment.htm From phd at phd.pp.ru Fri Oct 26 16:20:36 2007 From: phd at phd.pp.ru (Oleg Broytmann) Date: Fri, 26 Oct 2007 18:20:36 +0400 Subject: [Python-3000] PEP 3101 suggested corrections In-Reply-To: References: <200710260855.52649.mark@qtrac.eu> Message-ID: <20071026142036.GB3365@phd.pp.ru> On Fri, Oct 26, 2007 at 09:48:28AM -0400, Chris Monson wrote: > 'd' - Decimal Integer. Outputs the number in base 10. [skip] > 'd' - Denary Integer. Outputs the number in base 10. -1. I know what "decimal integers" are, but have never heard of "denary" (my spellchecker complains, too). Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. From guido at python.org Fri Oct 26 16:24:43 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 26 Oct 2007 07:24:43 -0700 Subject: [Python-3000] PEP 3101 suggested corrections In-Reply-To: <20071026142036.GB3365@phd.pp.ru> References: <200710260855.52649.mark@qtrac.eu> <20071026142036.GB3365@phd.pp.ru> Message-ID: 2007/10/26, Oleg Broytmann : > On Fri, Oct 26, 2007 at 09:48:28AM -0400, Chris Monson wrote: [quoting Mark Summerfield] > > 'd' - Decimal Integer. Outputs the number in base 10. > [skip] > > 'd' - Denary Integer. Outputs the number in base 10. > > -1. I know what "decimal integers" are, but have never heard of "denary" > (my spellchecker complains, too). -1 indeed. What's wrong with binary, octal, decimal, hexadecimal? 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From mark at qtrac.eu Fri Oct 26 16:40:55 2007 From: mark at qtrac.eu (Mark Summerfield) Date: Fri, 26 Oct 2007 15:40:55 +0100 Subject: [Python-3000] PEP 3101 suggested corrections In-Reply-To: References: <20071026142036.GB3365@phd.pp.ru> Message-ID: <200710261540.55302.mark@qtrac.eu> On 2007-10-26, Guido van Rossum wrote: > 2007/10/26, Oleg Broytmann : > > On Fri, Oct 26, 2007 at 09:48:28AM -0400, Chris Monson wrote: > > [quoting Mark Summerfield] > > > > 'd' - Decimal Integer. Outputs the number in base 10. > > > > [skip] > > > > > 'd' - Denary Integer. Outputs the number in base 10. > > > > -1. I know what "decimal integers" are, but never heard about "denary" > > (my spellchecker complains, too). http://www.thefreedictionary.com/denary > -1 indeed. What's wrong with binary, octal, decimal, hexadecimal? If it were logical it would be 'b', 'o', 'd', 'h', not 'b', 'o', 'd', 'x'. Why use x rather than h for hexadecimal? Because it is an established convention. Of course 'd' is an established convention too, but in the end the C standard adopted 'i' as an alternative because people _expect_ an 'i' to be there and to mean integer. (Surely it is only old C programmers who learnt C before 'i' was available who use 'd' these days.) And decimal may lead people new to Python to think decimal.Decimal is intended, or at least that a decimal number (i.e., one with a fractional part) is expected. I think the right solution is to use 'i' - Integer. Outputs the number in base 10. because I think people assume base 10 for integers unless told otherwise, whereas "decimal" is ambiguous: is it a base 10 integer or a decimal floating point number? 
Both C and C++ accept both 'i' and 'd' (and I think accepting both is fine although that goes against TOOWTDI), but having to use 'd' somehow seems like a retrograde step reminding me of when I started programming in C many years ago---something I thought I'd escaped:-) -- Mark Summerfield, Qtrac Ltd., www.qtrac.eu From phd at phd.pp.ru Fri Oct 26 16:53:36 2007 From: phd at phd.pp.ru (Oleg Broytmann) Date: Fri, 26 Oct 2007 18:53:36 +0400 Subject: [Python-3000] PEP 3101 suggested corrections In-Reply-To: <200710261540.55302.mark@qtrac.eu> References: <20071026142036.GB3365@phd.pp.ru> <200710261540.55302.mark@qtrac.eu> Message-ID: <20071026145336.GA5139@phd.pp.ru> On Fri, Oct 26, 2007 at 03:40:55PM +0100, Mark Summerfield wrote: > http://www.thefreedictionary.com/denary No need to use a word I have to lookup in a dictionary when "decimal" is so widely used. The article says "decimal" is a synonym. What is the point to use an unknown synonym instead of a well-known word? Still -1 from me. Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. From larry at hastings.org Fri Oct 26 19:11:45 2007 From: larry at hastings.org (Larry Hastings) Date: Fri, 26 Oct 2007 10:11:45 -0700 Subject: [Python-3000] PEP 3101 suggested corrections In-Reply-To: <20071026145336.GA5139@phd.pp.ru> References: <20071026142036.GB3365@phd.pp.ru> <200710261540.55302.mark@qtrac.eu> <20071026145336.GA5139@phd.pp.ru> Message-ID: <47221FD1.3080802@hastings.org> Oleg Broytmann wrote: > The article says "decimal" is a synonym. What is the point to use an > unknown synonym instead of a well-known word? His point is that Python has a fixed-point number type called "Decimal", and that this will lead to confusion. I can see his point, but we all know from years of C programming that "%d" takes an int and formats it in base 10--there is no confusion about this. 
Indeed, I suspect describing this as "denary" would lead to far more confusion, and using the format character "d" to take a Decimal object instead of an int would lead to widespread panic and mayhem. So -0.5 from me. /larry/ -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-3000/attachments/20071026/128fdeec/attachment.htm From jimjjewett at gmail.com Fri Oct 26 21:14:56 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Fri, 26 Oct 2007 15:14:56 -0400 Subject: [Python-3000] PEP 3101 suggested corrections In-Reply-To: <47221FD1.3080802@hastings.org> References: <20071026142036.GB3365@phd.pp.ru> <200710261540.55302.mark@qtrac.eu> <20071026145336.GA5139@phd.pp.ru> <47221FD1.3080802@hastings.org> Message-ID: On 10/26/07, Larry Hastings wrote: > His point is that Python has a fixed-point number type called "Decimal", > and that this will lead to confusion. I can see his point, but we all know > from years of C programming that "%d" takes an int and formats it in base > 10--there is no confusion about this. Sure there is. C isn't the only language where I've used it, but I still sometimes have to look up whether 'd' is "decimal" or "double". I've found bugs in C where someone else just assumed it was "double". If it weren't for backwards compatibility, 'i' would be a much better option, and saving 'd' for an actual Decimal (which might have a decimal point) would be good. http://docs.python.org/lib/typesseq-strings.html already allows both. The question is whether repurposing 'd' would break too much. That said, I think a Decimal that happens to be an integer probably *should* print differently from an integer, because the precision is an important part of a Decimal, and won't always fall conveniently at the decimal point. 
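To make the precision point concrete, here is a small sketch using the decimal module as it exists in released Python, not anything proposed in this thread; the variable names are illustrative only.

```python
from decimal import Decimal

whole_int = 3
whole_decimal = Decimal("3.00")  # same numeric value, but carries two decimal places

int_text = format(whole_int, "d")  # base-10 digits of an int
decimal_text = str(whole_decimal)  # the trailing zeros are part of the value's precision

values_equal = (whole_decimal == whole_int)
```

The two objects compare equal, yet their printed forms differ, because the Decimal remembers how many places it was given.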
-jJ From guido at python.org Fri Oct 26 21:18:13 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 26 Oct 2007 12:18:13 -0700 Subject: [Python-3000] PEP 3137 plan of attack (stage 3) Message-ID: 2007/10/19, Guido van Rossum : > On 10/7/07, Guido van Rossum wrote: > > I'd like to make complete implementation of PEP 3137 the goal for the > > 3.0a2 release. It should be doable to do this release by the end of > > October. I don't think anything else *needs* to be done to have a > > successful a2 release. > > I'm still hopeful, though realistically we may not quite make it. Mid November sounds more like it. Below is a full updated status update; here's a short list of the tasks that remain to be done: - remove compatibility with PyString from PyUnicode - change lots of places (e.g. encoders) to return PyString instead of PyBytes - change PyString's repr() to return "b'...'" (1) - change PyBytes's repr() to return "buffer(b'...')" (1) - change parser so that b"..." returns PyString, not PyBytes (1) - rename bytes -> buffer, str8 -> bytes (1) - change PyBytes so that its str() is the same as its repr(). - change PyString so that its str() is the same as its repr(). (1) see http://bugs.python.org/issue1247 I'll be working on all of these together; they're hard to separate out. Here's the full list: > > - remove locale support from PyString > > Done. > > > - remove compatibility with PyUnicode from PyString Done. > > - remove compatibility with PyString from PyUnicode > > Not done yet. > > > - add missing methods to PyBytes (for list, see the PEP and compare to > > what's already there) > > Done (Gregory P. Smith) > > > - remove buffer API from PyUnicode > > Done. > > > - make == and != between PyBytes and PyUnicode return False instead of > > raising TypeError > > Done. 
> > > - make == and != between PyString and Pyunicode return False instead > > of converting > > A patch by Thomas Lee exists: http://bugs.python.org/issue1263 > However it breaks some unit tests. Done. > > - make comparisons between PyString and PyBytes work (these are > > properly ordered) > > Already works. > > > - change lots of places (e.g. encoders) to return PyString instead of PyBytes > > Not done. > > > - change indexing and iteration over PyString to return ints, not > > 1-char PyStrings > > A patch by Alexandre Vassalotti exists but breaks some unit tests: > http://bugs.python.org/issue1280 Done. > > - change PyString's repr() to return "b'...'" > > - change PyBytes's repr() to return "buffer(b'...')" > > - change parser so that b"..." returns PyString, not PyBytes > > - rename bytes -> buffer, str8 -> bytes > > A patch by Alexandre Vassolotti and Christian Heimes exists for these 4 items: > http://bugs.python.org/issue1247 > However it breaks too many tests to be applied right now. Still pending. > > - change the constructor for PyString to match the one for PyBytes > > Not done. Done. > > - change PyBytes so that its str() is the same as its repr(). > > - change PyString so that its str() is the same as its repr(). > > Not done. > > > - add an iteration view over PyBytes (optional) > > Not yet done (Christian Heimes offered). Done. > > - kill basestring. > > Done (Christian Heimes). > > > - move initialization of sys.std{in,out,err} into C code and do it earlier. > > A patch by Christian Heimes exists: http://bugs.python.org/issue1267 > However it still breaks some unit tests... Done. > There are also some issues that mainly crop up in non-English locales. > We will try to get to the bottom of those before releasing 3.0a2, but > I need help as I'm myself absolutely unable to work with locales (and > I don't have access to a Windows box). I think Christian and a few others are making progress here. 
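For readers following the checklist, here is a sketch of the Python-level behaviour these items aim at, written against released Python 3. It assumes the final spelling of the mutable type, which ended up as bytearray rather than buffer; everything else is just the items above exercised one by one.

```python
data = b"abc"              # immutable bytes literal
mutable = bytearray(data)  # mutable counterpart

first = data[0]                 # indexing yields an int, not a 1-char string
items = list(data)              # iteration yields ints as well
same_as_text = (data == "abc")  # bytes and text never compare equal

data_repr = repr(data)
mutable_repr = repr(mutable)

mutable[0] = ord("x")  # in-place assignment works only on the mutable type
```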
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From janssen at parc.com Fri Oct 26 21:18:19 2007 From: janssen at parc.com (Bill Janssen) Date: Fri, 26 Oct 2007 12:18:19 PDT Subject: [Python-3000] 3K bytes I/O? Message-ID: <07Oct26.121820pdt."57996"@synergy1.parc.xerox.com> I'm looking at the Py3K SSL code, and have a question: What's the upshot of the bytes/string decisions in the C world? Is PyString_* now all about immutable bytes, and PyUnicode_* about strings? There still seem to be a lot of encode/decode methods in stringobject.h, operations which I'd expect to be in unicodeobject.h. Bill From guido at python.org Fri Oct 26 21:26:17 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 26 Oct 2007 12:26:17 -0700 Subject: [Python-3000] 3K bytes I/O? In-Reply-To: <-912240280709553237@unknownmsgid> References: <-912240280709553237@unknownmsgid> Message-ID: 2007/10/26, Bill Janssen : > I'm looking at the Py3K SSL code, and have a question: > > What's the upshot of the bytes/string decisions in the C world? Is > PyString_* now all about immutable bytes, and PyUnicode_* about > strings? There still seem to be a lot of encode/decode methods in > stringobject.h, operations which I'd expect to be in unicodeobject.h. I think the PyString encode/decode APIs should all go; use the corresponding PyUnicode ones. I recommend that you write your code to assume PyBytes for encoded/binary data, and PyUnicode for text; at some point we'll substitute PyString for most cases where PyBytes is currently used: that will happen once PyString is called bytes at the Python level, and PyBytes is called buffer. But that's still a while off. 
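Seen from the Python level rather than the C API, the same split between text and encoded data looks roughly like this in released Python 3. This is only a sketch with an arbitrary sample string, not a description of the C functions discussed above.

```python
text = "na\u00efve"                # text: a str of Unicode code points
encoded = text.encode("utf-8")     # encoding text produces bytes
decoded = encoded.decode("utf-8")  # decoding bytes produces text again

text_length = len(text)        # counts code points
encoded_length = len(encoded)  # counts bytes; the accented letter takes two
```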
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From lists at cheimes.de Fri Oct 26 23:26:19 2007 From: lists at cheimes.de (Christian Heimes) Date: Fri, 26 Oct 2007 23:26:19 +0200 Subject: [Python-3000] PEP 3137 plan of attack (stage 3) In-Reply-To: References: Message-ID: <47225B7B.8020206@cheimes.de> Guido van Rossum wrote: > Mid November sounds more like it. > > Below is a full updated status update; here's a short list of the > tasks that remain to be done: > > - remove compatibility with PyString from PyUnicode > - change lots of places (e.g. encoders) to return PyString instead of PyBytes > - change PyString's repr() to return "b'...'" (1) > - change PyBytes's repr() to return "buffer(b'...')" (1) > - change parser so that b"..." returns PyString, not PyBytes (1) > - rename bytes -> buffer, str8 -> bytes (1) > - change PyBytes so that its str() is the same as its repr(). > - change PyString so that its str() is the same as its repr(). > > (1) see http://bugs.python.org/issue1247 > > I'll be working on all of these together; they're hard to separate out. I suggest that you create a branch for the transition period. It will take at least several days to kick and drag everything in place. We can work on the transition while the rest can play with a working py3k branch. >> There are also some issues that mainly crop up in non-English locales. >> We will try to get to the bottom of those before releasing 3.0a2, but >> I need help as I'm myself absolutely unable to work with locales (and >> I don't have access to a Windows box). > > I think Christian and a few others are making progress here. I think that I have found and fixed the last bit of problematic code in the time module several days ago. I don't get any locales related errors on my German Windows installation anymore. I would like to have people with other locales to test py3k on Windows. 
In particular I'm interested in tests with more "exotic" locales like Cyrillic alphabet (Greek, Russian), Arabian and Asian locales. Christian From guido at python.org Fri Oct 26 23:52:58 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 26 Oct 2007 14:52:58 -0700 Subject: [Python-3000] PEP 3137 plan of attack (stage 3) In-Reply-To: <47225B7B.8020206@cheimes.de> References: <47225B7B.8020206@cheimes.de> Message-ID: 2007/10/26, Christian Heimes : > I suggest that you create a branch for the transition period. It will > take at least several days to kick and drag everything in place. We can > work on the transition while the rest can play with a working py3k branch. Thanks for the suggestion -- I'm now working in a new branch, py3k-pep3137. > >> There are also some issues that mainly crop up in non-English locales. > >> We will try to get to the bottom of those before releasing 3.0a2, but > >> I need help as I'm myself absolutely unable to work with locales (and > >> I don't have access to a Windows box). > > > > I think Christian and a few others are making progress here. > > I think that I have found and fixed the last bit of problematic code in > the time module several days ago. I don't get any locales related errors > on my German Windows installation anymore. I would like to have people > with other locales to test py3k on Windows. In particular I'm interested > in tests with more "exotic" locales like Cyrillic alphabet (Greek, > Russian), Arabian and Asian locales. Also check the 3.0 buildbots: http://www.python.org/dev/buildbot/3.0/ I still see a lot of failing tests there... E.g. 
http://www.python.org/dev/buildbot/3.0/x86%20XP-3%203.0/builds/191/step-test/0 -- --Guido van Rossum (home page: http://www.python.org/~guido/) From greg.ewing at canterbury.ac.nz Sat Oct 27 00:06:32 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 27 Oct 2007 11:06:32 +1300 Subject: [Python-3000] PEP 3101 suggested corrections In-Reply-To: References: <200710260855.52649.mark@qtrac.eu> Message-ID: <472264E8.9060205@canterbury.ac.nz> Chris Monson wrote: > 'd' - Decimal Integer. Outputs the number in base 10. > > Modern C now has > 'i' as an alternative to 'd' Considering that in printf formats the alternatives to 'd' or 'i' are 'x' for hexadecimal and 'o' for octal, then 'd' for decimal makes a lot more sense to me than 'i', which says nothing about the base in which it will be displayed. It makes even more sense in Python, where the format codes are clearly all about the display format and nothing to do with the type of data (whereas the distinction is somewhat blurred in C). > most people expect decimals to have fractional parts). Then their expectations require adjustment. "Decimal" means "base 10". On its own it doesn't imply anything about fractions. -- Greg From greg.ewing at canterbury.ac.nz Sat Oct 27 00:42:01 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 27 Oct 2007 11:42:01 +1300 Subject: [Python-3000] PEP 3101 suggested corrections In-Reply-To: References: <20071026142036.GB3365@phd.pp.ru> <200710261540.55302.mark@qtrac.eu> <20071026145336.GA5139@phd.pp.ru> <47221FD1.3080802@hastings.org> Message-ID: <47226D39.2020002@canterbury.ac.nz> Jim Jewett wrote: > If it weren't for backwards compatibility, 'i' would be a much better > option, No, it wouldn't, because 'integer' is a data type, not a display format. The Python format codes specify display formats, not data types. 
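A quick illustration of that distinction, using the format() builtin from released Python 3: one int, rendered with each of the presentation codes mentioned in this thread. The data type stays the same throughout; only the display base changes.

```python
value = 255

as_decimal = format(value, "d")  # base 10
as_hex = format(value, "x")      # base 16
as_octal = format(value, "o")    # base 8
as_binary = format(value, "b")   # base 2
```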
-- Greg From janssen at parc.com Sat Oct 27 01:07:26 2007 From: janssen at parc.com (Bill Janssen) Date: Fri, 26 Oct 2007 16:07:26 PDT Subject: [Python-3000] base64.{decode,encode}string Message-ID: <07Oct26.160735pdt."57996"@synergy1.parc.xerox.com> I think encodestring() should return a string, not bytes, and decodestring() should take either a string, or bytes containing an ASCII-encoded string. Otherwise, every place they'll ever be used has to wrap an additional unicode/encode step around their use. Bill From guido at python.org Sat Oct 27 01:33:02 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 26 Oct 2007 16:33:02 -0700 Subject: [Python-3000] base64.{decode,encode}string In-Reply-To: <-1648744909719026234@unknownmsgid> References: <-1648744909719026234@unknownmsgid> Message-ID: 2007/10/26, Bill Janssen : > I think encodestring() should return a string, not bytes, and > decodestring() should take either a string, or bytes containing an > ASCII-encoded string. Otherwise, every place they'll ever be > used has to wrap an additional unicode/encode step around their > use. I'm okay with being flexible on input. I think there ought to be separate functions returning bytes and str. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From janssen at parc.com Sat Oct 27 02:24:47 2007 From: janssen at parc.com (Bill Janssen) Date: Fri, 26 Oct 2007 17:24:47 PDT Subject: [Python-3000] base64.{decode,encode}string In-Reply-To: References: <-1648744909719026234@unknownmsgid> Message-ID: <07Oct26.172456pdt."57996"@synergy1.parc.xerox.com> > 2007/10/26, Bill Janssen : > > I think encodestring() should return a string, not bytes, and > > decodestring() should take either a string, or bytes containing an > > ASCII-encoded string. Otherwise, every place they'll ever be > > used has to wrap an additional unicode/encode step around their > > use. > > I'm okay with being flexible on input. 
I think there ought to be > separate functions returning bytes and str. I'm fine with that, too. I just think that the purpose of standard_b64encode() is to take bytes and produce text, and the purpose of standard_b64decode() is to take text and produce bytes. But if we want to add encodestring_to_ascii(), to take bytes and produce ASCII base64-encoded bytes, and decodestring_from_ascii(), to take an ASCII-encoded string as bytes, and produce bytes, that's OK with me. But it seems odd. Bill From stephen at xemacs.org Sat Oct 27 02:34:32 2007 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 27 Oct 2007 09:34:32 +0900 Subject: [Python-3000] PEP 3101 suggested corrections In-Reply-To: <472264E8.9060205@canterbury.ac.nz> References: <200710260855.52649.mark@qtrac.eu> <472264E8.9060205@canterbury.ac.nz> Message-ID: <87sl3x4cpz.fsf@uwakimon.sk.tsukuba.ac.jp> Greg Ewing writes: > > most people expect decimals to have fractional parts). > > Then their expectations require adjustment. "Decimal" > means "base 10". On its own it doesn't imply anything > about fractions. "Decimal point" notwithstanding, I guess. Getting "them" to change their expectations is a losing battle. From janssen at parc.com Sat Oct 27 02:37:15 2007 From: janssen at parc.com (Bill Janssen) Date: Fri, 26 Oct 2007 17:37:15 PDT Subject: [Python-3000] passing bytes buffers to C with NUL characters in them? Message-ID: <07Oct26.173721pdt."57996"@synergy1.parc.xerox.com> I'm not sure what to use in PyArg_ParseTuple in 3K. I'm passing in bytes which may contain NUL characters. Using 's#' doesn't really work, because it erroneously accepts Unicode strings. Bill From guido at python.org Sat Oct 27 02:42:42 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 26 Oct 2007 17:42:42 -0700 Subject: [Python-3000] passing bytes buffers to C with NUL characters in them? 
In-Reply-To: <-6189415837657270969@unknownmsgid> References: <-6189415837657270969@unknownmsgid> Message-ID: 2007/10/26, Bill Janssen : > I'm not sure what to use in PyArg_ParseTuple in 3K. I'm passing in > bytes which may contain NUL characters. Using 's#' doesn't really > work, because it erroneously accepts Unicode strings. Use "y#", I think. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From janssen at parc.com Sat Oct 27 02:42:29 2007 From: janssen at parc.com (Bill Janssen) Date: Fri, 26 Oct 2007 17:42:29 PDT Subject: [Python-3000] passing bytes buffers to C with NUL characters in them? In-Reply-To: <07Oct26.173721pdt."57996"@synergy1.parc.xerox.com> References: <07Oct26.173721pdt."57996"@synergy1.parc.xerox.com> Message-ID: <07Oct26.174236pdt."57996"@synergy1.parc.xerox.com> > I'm not sure what to use in PyArg_ParseTuple in 3K. I'm passing in > bytes which may contain NUL characters. Using 's#' doesn't really > work, because it erroneously accepts Unicode strings. Ah, sorry, found it. "y#". Bill From guido at python.org Sat Oct 27 02:44:49 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 26 Oct 2007 17:44:49 -0700 Subject: [Python-3000] PEP 3101 suggested corrections In-Reply-To: <87sl3x4cpz.fsf@uwakimon.sk.tsukuba.ac.jp> References: <200710260855.52649.mark@qtrac.eu> <472264E8.9060205@canterbury.ac.nz> <87sl3x4cpz.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: 2007/10/26, Stephen J. Turnbull : > Greg Ewing writes: > > > > most people expect decimals to have fractional parts). > > > > Then their expectations require adjustment. "Decimal" > > means "base 10". On its own it doesn't imply anything > > about fractions. > > "Decimal point" notwithstanding, I guess. > > Getting "them" to change their expectations is a losing battle. However, none of the participants in this discussion are "most people" and I can't recall ever hearing about Python and programming newbies who had trouble with %d. 
So what "their" expectations are is a matter of pure speculation. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From janssen at parc.com Sat Oct 27 02:45:02 2007 From: janssen at parc.com (Bill Janssen) Date: Fri, 26 Oct 2007 17:45:02 PDT Subject: [Python-3000] plat-mac seriously broken? Message-ID: <07Oct26.174511pdt."57996"@synergy1.parc.xerox.com> I found that an SSL test was failing on 3K because of the following: Traceback (most recent call last): File "/local/python/3k/src/Lib/test/test_ssl.py", line 818, in testAsyncore f = urllib.urlopen(url) File "/local/python/3k/src/Lib/urllib.py", line 75, in urlopen opener = FancyURLopener() File "/local/python/3k/src/Lib/urllib.py", line 553, in __init__ URLopener.__init__(self, *args, **kwargs) File "/local/python/3k/src/Lib/urllib.py", line 124, in __init__ proxies = getproxies() File "/local/python/3k/src/Lib/urllib.py", line 1278, in getproxies return getproxies_environment() or getproxies_internetconfig() File "/local/python/3k/src/Lib/urllib.py", line 1263, in getproxies_internetconfig if 'UseHTTPProxy' in config and config['UseHTTPProxy']: File "/local/python/3k/src/Lib/plat-mac/ic.py", line 187, in __getitem__ return _decode(self.h.data, key) File "/local/python/3k/src/Lib/plat-mac/ic.py", line 144, in _decode return decoder(data, key) File "/local/python/3k/src/Lib/plat-mac/ic.py", line 68, in _decode_boolean return ord(data[0]) TypeError: ord() expected string of length 1, but int found All of the modules in plat-mac are full of this kind of stuff. Someone needs to run 2to3 over them, I think. Or maybe ord(int) should just return the int. Bill From janssen at parc.com Sat Oct 27 04:33:32 2007 From: janssen at parc.com (Bill Janssen) Date: Fri, 26 Oct 2007 19:33:32 PDT Subject: [Python-3000] plat-mac seriously broken? 
In-Reply-To: <07Oct26.174511pdt."57996"@synergy1.parc.xerox.com> References: <07Oct26.174511pdt."57996"@synergy1.parc.xerox.com> Message-ID: <07Oct26.193341pdt."57996"@synergy1.parc.xerox.com> > All of the modules in plat-mac are full of this kind of stuff. > Someone needs to run 2to3 over them, I think. Actually, after looking at the code a bit more, I think 1to3 would be more appropriate. :-) Bill From stephen at xemacs.org Sat Oct 27 06:44:02 2007 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 27 Oct 2007 13:44:02 +0900 Subject: [Python-3000] PEP 3101 suggested corrections In-Reply-To: References: <200710260855.52649.mark@qtrac.eu> <472264E8.9060205@canterbury.ac.nz> <87sl3x4cpz.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87d4v12mlp.fsf@uwakimon.sk.tsukuba.ac.jp> Guido van Rossum writes: > I can't recall ever hearing about Python and programming newbies > who had trouble with %d. OK. I think Greg's basic point is correct, I just (over?)reacted to the suggestion that *if* people do have trouble, telling them to change expectations will have a useful effect. Emacs advocates do that *far* too much, and the only effect it has that I know of is to increase the ranks of vi users. (Does perl have %i?) From greg.ewing at canterbury.ac.nz Sat Oct 27 08:33:10 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 27 Oct 2007 19:33:10 +1300 Subject: [Python-3000] PEP 3101 suggested corrections In-Reply-To: <87sl3x4cpz.fsf@uwakimon.sk.tsukuba.ac.jp> References: <200710260855.52649.mark@qtrac.eu> <472264E8.9060205@canterbury.ac.nz> <87sl3x4cpz.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4722DBA6.4010005@canterbury.ac.nz> Stephen J. Turnbull wrote: > Greg Ewing writes: > > "Decimal" > > means "base 10". On its own it doesn't imply anything > > about fractions. > > "Decimal point" notwithstanding, I guess. That's not "decimal" on its own -- it includes the word "point", which is what tells you that you're (potentially) dealing with fractions. 
--
Greg

From greg.ewing at canterbury.ac.nz Sat Oct 27 08:49:01 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sat, 27 Oct 2007 19:49:01 +1300
Subject: [Python-3000] PEP 3101 suggested corrections
In-Reply-To: <87d4v12mlp.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <200710260855.52649.mark@qtrac.eu> <472264E8.9060205@canterbury.ac.nz> <87sl3x4cpz.fsf@uwakimon.sk.tsukuba.ac.jp> <87d4v12mlp.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <4722DF5D.80402@canterbury.ac.nz>

Stephen J. Turnbull wrote:
> I just (over?)reacted to
> the suggestion that *if* people do have trouble, telling them to
> change expectations will have a useful effect.

I wasn't really suggesting that they change their expectations,
only that we shouldn't use such expectations as a basis for
deciding what to do.

--
Greg

From lists at cheimes.de Sat Oct 27 16:23:08 2007
From: lists at cheimes.de (Christian Heimes)
Date: Sat, 27 Oct 2007 16:23:08 +0200
Subject: [Python-3000] PEP 3137 plan of attack (stage 3)
In-Reply-To:
References:
Message-ID: <472349CC.4050103@cheimes.de>

Guido van Rossum wrote:
>> There are also some issues that mainly crop up in non-English locales.
>> We will try to get to the bottom of those before releasing 3.0a2, but
>> I need help as I'm myself absolutely unable to work with locales (and
>> I don't have access to a Windows box).
>
> I think Christian and a few others are making progress here.

I've hit another wall of bricks on Windows. It's not possible to run
Python from a directory with non ASCII characters:
http://bugs.python.org/issue1342. I've a patch that reduces the problem
from a segfault to an unrecoverable import error.

The remaining problem seems to lie deep in PC/getpathp.c:Py_GetPath().
It seems that it can't handle non ASCII chars correctly.

The second line is a fprintf(stderr, "%s\n", char *path). Do you see the
difference between "test???" and "test???"?
c:\test???\PCBuild8\win32release>python
c:\test???\PCBuild8\win32release\python30.zip;c:\test???\DLLs;c:\
test???\lib;c:\test???\lib\plat-win;c:\test???\lib\lib-tk;c:\test???\PCBuild8\wi
n32release
Fatal Python error: Py_Initialize: can't initialize sys standard streams
object  : ImportError('No module named encodings.utf_8',)
type    : ImportError
refcount: 4
address : 00A43540
lost sys.stderr

Christian

From skip at pobox.com Sat Oct 27 20:25:46 2007
From: skip at pobox.com (skip at pobox.com)
Date: Sat, 27 Oct 2007 13:25:46 -0500
Subject: [Python-3000] plat-mac seriously broken?
In-Reply-To: <07Oct26.174511pdt."57996"@synergy1.parc.xerox.com>
References: <07Oct26.174511pdt."57996"@synergy1.parc.xerox.com>
Message-ID: <18211.33450.332197.304601@montanaro.dyndns.org>

    Bill> I found that an SSL test was failing on 3K because of the following:
    ...
    Bill> All of the modules in plat-mac are full of this kind of stuff.

ISTR much of the plat-mac stuff was generated by Tools/bgen. If so, that
would be the place to fix things.

Skip

From janssen at parc.com Sun Oct 28 00:03:08 2007
From: janssen at parc.com (Bill Janssen)
Date: Sat, 27 Oct 2007 15:03:08 PDT
Subject: [Python-3000] plat-mac seriously broken?
In-Reply-To: <18211.33450.332197.304601@montanaro.dyndns.org>
References: <07Oct26.174511pdt."57996"@synergy1.parc.xerox.com> <18211.33450.332197.304601@montanaro.dyndns.org>
Message-ID: <07Oct27.150317pdt."57996"@synergy1.parc.xerox.com>

> ISTR much of the plat-mac stuff was generated by Tools/bgen. If so, that
> would be the place to fix things.

Sure looks like generated code. Be nice if that generator was run
during the build process, on OS X. That way you'd be sure to get code
that matches the platform and codebase.

Bill

From janssen at parc.com Sun Oct 28 00:27:08 2007
From: janssen at parc.com (Bill Janssen)
Date: Sat, 27 Oct 2007 15:27:08 PDT
Subject: [Python-3000] Odd output from test -- buffering bug?
Message-ID: <07Oct27.152712pdt."57996"@synergy1.parc.xerox.com> I'm seeing a sort of odd thing going on when running one of my tests. I'm seeing two lines of output, from two different threads, being duplicated when I run with "regrtest -u all -v test_ssl". This is with the latest 3K sources on PPC OS X 10.4.10. testSTARTTLS (test.test_ssl.ThreadedTests) ... client: sending b'msg 1'... ^@client: sending b'msg 1'... server: new connection from ('127.0.0.1', 52371) server: new connection from ('127.0.0.1', 52371) This is output to an Emacs shell buffer, so it shows control characters in the output, and I'm seeing a NUL character being output there at the beginning of the third line. Both of the duplicated lines are being output with code like this: if test_support.verbose: sys.stdout.write( " client: sending %s...\n" % repr(msg)) This looks like some kind of buffering bug. Is it in the test harness, or the standard I/O library? Bill From janssen at parc.com Sun Oct 28 02:11:06 2007 From: janssen at parc.com (Bill Janssen) Date: Sat, 27 Oct 2007 18:11:06 PDT Subject: [Python-3000] bug in i/o module buffering? Message-ID: <07Oct27.181107pdt."57996"@synergy1.parc.xerox.com> In the following, 'n' is equal to 0 (read from a non-blocking socket). Is this a bug in the I/O module buffering? 
Bill Traceback (most recent call last): File "/local/python/3k/src/Lib/SocketServer.py", line 222, in handle_request self.process_request(request, client_address) File "/local/python/3k/src/Lib/SocketServer.py", line 241, in process_request self.finish_request(request, client_address) File "/local/python/3k/src/Lib/SocketServer.py", line 254, in finish_request self.RequestHandlerClass(request, client_address, self) File "/local/python/3k/src/Lib/SocketServer.py", line 522, in __init__ self.handle() File "/local/python/3k/src/Lib/BaseHTTPServer.py", line 330, in handle self.handle_one_request() File "/local/python/3k/src/Lib/BaseHTTPServer.py", line 313, in handle_one_request self.raw_requestline = self.rfile.readline() File "/local/python/3k/src/Lib/io.py", line 391, in readline b = self.read(nreadahead()) File "/local/python/3k/src/Lib/io.py", line 377, in nreadahead readahead = self.peek(1, unsafe=True) File "/local/python/3k/src/Lib/io.py", line 778, in peek current = self.raw.read(to_read) File "/local/python/3k/src/Lib/io.py", line 455, in read del b[n:] TypeError: 'slice' object does not support item deletion ---------------------------------------- From guido at python.org Sun Oct 28 02:21:21 2007 From: guido at python.org (Guido van Rossum) Date: Sat, 27 Oct 2007 18:21:21 -0700 Subject: [Python-3000] bug in i/o module buffering? In-Reply-To: <8566285166171308234@unknownmsgid> References: <8566285166171308234@unknownmsgid> Message-ID: More interesting is, what's b? 2007/10/27, Bill Janssen : > In the following, 'n' is equal to 0 (read from a non-blocking socket). > Is this a bug in the I/O module buffering? 
> > Bill > > Traceback (most recent call last): > File "/local/python/3k/src/Lib/SocketServer.py", line 222, in handle_request > self.process_request(request, client_address) > File "/local/python/3k/src/Lib/SocketServer.py", line 241, in process_request > self.finish_request(request, client_address) > File "/local/python/3k/src/Lib/SocketServer.py", line 254, in finish_request > self.RequestHandlerClass(request, client_address, self) > File "/local/python/3k/src/Lib/SocketServer.py", line 522, in __init__ > self.handle() > File "/local/python/3k/src/Lib/BaseHTTPServer.py", line 330, in handle > self.handle_one_request() > File "/local/python/3k/src/Lib/BaseHTTPServer.py", line 313, in handle_one_request > self.raw_requestline = self.rfile.readline() > File "/local/python/3k/src/Lib/io.py", line 391, in readline > b = self.read(nreadahead()) > File "/local/python/3k/src/Lib/io.py", line 377, in nreadahead > readahead = self.peek(1, unsafe=True) > File "/local/python/3k/src/Lib/io.py", line 778, in peek > current = self.raw.read(to_read) > File "/local/python/3k/src/Lib/io.py", line 455, in read > del b[n:] > TypeError: 'slice' object does not support item deletion > ---------------------------------------- > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Sun Oct 28 02:22:32 2007 From: guido at python.org (Guido van Rossum) Date: Sat, 27 Oct 2007 18:22:32 -0700 Subject: [Python-3000] Odd output from test -- buffering bug? In-Reply-To: <7173684078305279630@unknownmsgid> References: <7173684078305279630@unknownmsgid> Message-ID: Hard to say. Never seen this before. Are you using fork() *anywhere* in your tests (not necessarily the affected test)? 
2007/10/27, Bill Janssen : > I'm seeing a sort of odd thing going on when running one of my tests. > I'm seeing two lines of output, from two different threads, being > duplicated when I run with "regrtest -u all -v test_ssl". This is > with the latest 3K sources on PPC OS X 10.4.10. > > testSTARTTLS (test.test_ssl.ThreadedTests) ... > client: sending b'msg 1'... > ^@client: sending b'msg 1'... > server: new connection from ('127.0.0.1', 52371) > server: new connection from ('127.0.0.1', 52371) > > This is output to an Emacs shell buffer, so it shows control > characters in the output, and I'm seeing a NUL character being output > there at the beginning of the third line. Both of the duplicated lines > are being output with code like this: > > if test_support.verbose: > sys.stdout.write( > " client: sending %s...\n" % repr(msg)) > > This looks like some kind of buffering bug. Is it in the test > harness, or the standard I/O library? > > Bill > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From janssen at parc.com Sun Oct 28 02:24:04 2007 From: janssen at parc.com (Bill Janssen) Date: Sat, 27 Oct 2007 18:24:04 PDT Subject: [Python-3000] bad socket close in httplib.py Message-ID: <07Oct27.182413pdt."57996"@synergy1.parc.xerox.com> I think the socket close in HTTPConnection.close() is incorrect, but is being hidden by the delayed closing implemented in socket.py. See issue 1348. Bill From janssen at parc.com Sun Oct 28 02:27:29 2007 From: janssen at parc.com (Bill Janssen) Date: Sat, 27 Oct 2007 18:27:29 PDT Subject: [Python-3000] Odd output from test -- buffering bug? 
In-Reply-To: References: <7173684078305279630@unknownmsgid> Message-ID: <07Oct27.182738pdt."57996"@synergy1.parc.xerox.com> No, not unless the test harness uses it. But there are two threads. > Hard to say. Never seen this before. Are you using fork() *anywhere* > in your tests (not necessarily the affected test)? > > 2007/10/27, Bill Janssen : > > I'm seeing a sort of odd thing going on when running one of my tests. > > I'm seeing two lines of output, from two different threads, being > > duplicated when I run with "regrtest -u all -v test_ssl". This is > > with the latest 3K sources on PPC OS X 10.4.10. > > > > testSTARTTLS (test.test_ssl.ThreadedTests) ... > > client: sending b'msg 1'... > > ^@client: sending b'msg 1'... > > server: new connection from ('127.0.0.1', 52371) > > server: new connection from ('127.0.0.1', 52371) > > > > This is output to an Emacs shell buffer, so it shows control > > characters in the output, and I'm seeing a NUL character being output > > there at the beginning of the third line. Both of the duplicated lines > > are being output with code like this: > > > > if test_support.verbose: > > sys.stdout.write( > > " client: sending %s...\n" % repr(msg)) > > > > This looks like some kind of buffering bug. Is it in the test > > harness, or the standard I/O library? > > > > Bill > > > > _______________________________________________ > > Python-3000 mailing list > > Python-3000 at python.org > > http://mail.python.org/mailman/listinfo/python-3000 > > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > > > > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) From janssen at parc.com Sun Oct 28 02:30:30 2007 From: janssen at parc.com (Bill Janssen) Date: Sat, 27 Oct 2007 18:30:30 PDT Subject: [Python-3000] bug in i/o module buffering? In-Reply-To: References: <8566285166171308234@unknownmsgid> Message-ID: <07Oct27.183033pdt."57996"@synergy1.parc.xerox.com> >From RawIOBase.read(). 
What's __index__() do? b = bytes(n.__index__()) > More interesting is, what's b? > > 2007/10/27, Bill Janssen : > > In the following, 'n' is equal to 0 (read from a non-blocking socket). > > Is this a bug in the I/O module buffering? > > > > Bill > > > > Traceback (most recent call last): > > File "/local/python/3k/src/Lib/SocketServer.py", line 222, in handle_request > > self.process_request(request, client_address) > > File "/local/python/3k/src/Lib/SocketServer.py", line 241, in process_request > > self.finish_request(request, client_address) > > File "/local/python/3k/src/Lib/SocketServer.py", line 254, in finish_request > > self.RequestHandlerClass(request, client_address, self) > > File "/local/python/3k/src/Lib/SocketServer.py", line 522, in __init__ > > self.handle() > > File "/local/python/3k/src/Lib/BaseHTTPServer.py", line 330, in handle > > self.handle_one_request() > > File "/local/python/3k/src/Lib/BaseHTTPServer.py", line 313, in handle_one_request > > self.raw_requestline = self.rfile.readline() > > File "/local/python/3k/src/Lib/io.py", line 391, in readline > > b = self.read(nreadahead()) > > File "/local/python/3k/src/Lib/io.py", line 377, in nreadahead > > readahead = self.peek(1, unsafe=True) > > File "/local/python/3k/src/Lib/io.py", line 778, in peek > > current = self.raw.read(to_read) > > File "/local/python/3k/src/Lib/io.py", line 455, in read > > del b[n:] > > TypeError: 'slice' object does not support item deletion > > ---------------------------------------- > > > > _______________________________________________ > > Python-3000 mailing list > > Python-3000 at python.org > > http://mail.python.org/mailman/listinfo/python-3000 > > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > > > > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) From jimjjewett at gmail.com Sun Oct 28 18:27:57 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Sun, 28 Oct 2007 13:27:57 -0400 Subject: 
[Python-3000] PEP 3101 suggested corrections
In-Reply-To: <47226D39.2020002@canterbury.ac.nz>
References: <20071026142036.GB3365@phd.pp.ru> <200710261540.55302.mark@qtrac.eu> <20071026145336.GA5139@phd.pp.ru> <47221FD1.3080802@hastings.org> <47226D39.2020002@canterbury.ac.nz>
Message-ID:

On 10/26/07, Greg Ewing wrote:
> Jim Jewett wrote:
> > If it weren't for backwards compatibility, 'i' would be a much better
> > option,

> No, it wouldn't, because 'integer' is a data type, not
> a display format. The Python format codes specify display
> formats, not data types.

I think that distinction is splitting hairs.

(1) Even to a programmer, there may not be much difference between

    "%f" prints it as a float

and

    "%f" means to convert it to a float and print that

(If anything, the docs support the second definition.)

(2) To most people, all numbers are base-10, and using another base
is just a silly affectation, like pig-latin. Decimal doesn't mean
"base 10", it means "has a decimal point", and contrasts with both
fractions and integers.

Programmers have typically been exceptions, but I'm not sure how true
that will remain in the future. Octal is already a wart that causes
more bugs than it prevents. Hex is still useful. In another
half-generation, I'm not so sure.

It is *probably* too early to drop support for %d as "Signed integer
decimal" rather than "Decimal". But I believe the docs would already
be improved by changing the definition table at
http://docs.python.org/lib/typesseq-strings.html from

    d   Signed integer decimal.
    i   Signed integer decimal.

to

    d   Signed integer decimal. Currently an alias for i.
    i   Signed integer decimal.
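
As an illustrative aside (not part of the proposed doc change), a minimal sketch showing that the two codes really are interchangeable in %-formatting. The helper name fmt_pair is made up for this example:

```python
def fmt_pair(value, flags=""):
    """Render value with %d and with %i, using the same flags and width."""
    return ("%" + flags + "d") % value, ("%" + flags + "i") % value

# Try a few flag/width combinations; both codes should always agree.
for flags in ("", "5", "-5", "+", "05"):
    as_d, as_i = fmt_pair(-42, flags)
    print(repr(flags), repr(as_d), repr(as_i), as_d == as_i)
```

Both conversions take the same flags and widths, so documenting one as an alias of the other would not change behaviour.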
-jJ From jimjjewett at gmail.com Sun Oct 28 18:36:21 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Sun, 28 Oct 2007 13:36:21 -0400 Subject: [Python-3000] PEP 3137 plan of attack (stage 3) In-Reply-To: <472349CC.4050103@cheimes.de> References: <472349CC.4050103@cheimes.de> Message-ID: On 10/27/07, Christian Heimes wrote: > Guido van Rossum wrote: > The second line is a fprintf(stderr, "%s\n", char *path). > Do you see the > difference between "test???" and "test???"? One likely difference is that test??? should be a legitimate (unicode) Python name, but test??? probably isn't, because the division sign isn't alphanumeric. Also, there is a chance that test??? was already in the appropriate normalized form, but test??? probably isn't, because of the superscript. Whether either of these *should* matter in this case, I couldn't tell from your post. -jJ -jJ From jimjjewett at gmail.com Sun Oct 28 18:45:36 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Sun, 28 Oct 2007 13:45:36 -0400 Subject: [Python-3000] bug in i/o module buffering? In-Reply-To: <8035548431694532893@unknownmsgid> References: <8566285166171308234@unknownmsgid> <8035548431694532893@unknownmsgid> Message-ID: On 10/27/07, Bill Janssen wrote: > > > File "/local/python/3k/src/Lib/io.py", line 455, in read > > > del b[n:] > > > TypeError: 'slice' object does not support item deletion > b = bytes(n.__index__()) Isn't bytes the *im*mutable bytestring, so that you would need a buffer (rather than a bytes) if you plan to clear it out? -jJ From lists at cheimes.de Sun Oct 28 18:54:14 2007 From: lists at cheimes.de (Christian Heimes) Date: Sun, 28 Oct 2007 18:54:14 +0100 Subject: [Python-3000] PEP 3137 plan of attack (stage 3) In-Reply-To: References: <472349CC.4050103@cheimes.de> Message-ID: <4724CCC6.3080705@cheimes.de> Jim Jewett wrote: > One likely difference is that test??? should be a legitimate (unicode) > Python name, but test??? probably isn't, because the division sign > isn't alphanumeric. 
> > Also, there is a chance that test??? was already in the appropriate > normalized form, but test??? probably isn't, because of the > superscript. > > Whether either of these *should* matter in this case, I couldn't tell > from your post. I'm neither a Windows expert nor an experienced Windows developer. I'm just guessing here. Could it be that Python is using the char* NameA API methods instead of the wide wchar_t * NameW methods? Christian From lists at cheimes.de Sun Oct 28 19:27:05 2007 From: lists at cheimes.de (Christian Heimes) Date: Sun, 28 Oct 2007 19:27:05 +0100 Subject: [Python-3000] bug in i/o module buffering? In-Reply-To: References: <8566285166171308234@unknownmsgid> <8035548431694532893@unknownmsgid> Message-ID: Jim Jewett wrote: > Isn't bytes the *im*mutable bytestring, so that you would need a > buffer (rather than a bytes) if you plan to clear it out? The types aren't renamed yet. Bytes is still the mutable bytestring and str8 the immutable. Christian From janssen at parc.com Sun Oct 28 19:28:14 2007 From: janssen at parc.com (Bill Janssen) Date: Sun, 28 Oct 2007 11:28:14 PDT Subject: [Python-3000] bug in i/o module buffering? In-Reply-To: References: <8566285166171308234@unknownmsgid> <8035548431694532893@unknownmsgid> Message-ID: <07Oct28.102824pst."57996"@synergy1.parc.xerox.com> Jim Jewett wrote: > On 10/27/07, Bill Janssen wrote: > > > > > File "/local/python/3k/src/Lib/io.py", line 455, in read > > > > del b[n:] > > > > TypeError: 'slice' object does not support item deletion > > > b = bytes(n.__index__()) > > Isn't bytes the *im*mutable bytestring, so that you would need a > buffer (rather than a bytes) if you plan to clear it out? I think when this code was written, "bytes" was mutable (that's why it couldn't be a key in a dict). If I understand the grand plan correctly, "bytes" will become "buffer" (mutable), and "str8" will become "bytes" (immutable). 
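
To make the mutable/immutable distinction concrete, here is a small sketch using the names that released Python 3 ended up with: the immutable type is bytes, and the mutable one (the "buffer" of PEP 3137) is spelled bytearray. The helper truncate is hypothetical and only mirrors the `del b[n:]` step from the traceback:

```python
def truncate(b, n):
    """Drop everything after the first n bytes, in place (mirrors `del b[n:]`)."""
    del b[n:]
    return b

mutable = bytearray(8)      # eight zero bytes, mutable
truncate(mutable, 3)
print(mutable)              # three zero bytes remain

immutable = bytes(8)        # eight zero bytes, immutable
try:
    truncate(immutable, 3)
except TypeError as exc:
    # Slice deletion is only supported by the mutable type.
    print("TypeError:", exc)
```

Code such as RawIOBase.read() that fills a scratch array and then trims it therefore needs the mutable type.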
Bill From janssen at parc.com Sun Oct 28 20:09:56 2007 From: janssen at parc.com (Bill Janssen) Date: Sun, 28 Oct 2007 12:09:56 PDT Subject: [Python-3000] socket GC worries Message-ID: <07Oct28.111004pst."57996"@synergy1.parc.xerox.com> I've now got a working SSL patch for Py3K (assuming that the patches for #1347 and #1349 are committed), but I'm a bit worried about the lazy GC of sockets. I find that simply dropping an SSLSocket on the floor doesn't GC the C structures. This implies that the instance in the SSLSocket._sslobj slot never gets decref'ed. I think it's due to the fact that "socket.makefile()" creates a circular reference with an instance of "socket.SocketCloser", which points to the socket, and the socket has a slot which points to the "_closer". If "socket.close()" is never explicitly called, the underlying system socket never gets closed. Since sockets are bound to a scarce system resource, this could be problematic. I think that the SocketCloser (new in Py3K) was developed to address another issue, which is that there's a lot of library code which assumes that the Python socket instance is just window dressing over an underlying system file descriptor, and isn't important. In fact, that whole mess of code is a good argument for *not* exposing the fileno in Python (perhaps only for special cases, like "select"). Take httplib and urllib, for instance. HTTPConnection creates a "file" from the socket, by calling socket.makefile(), then in some cases *closes* the socket (thereby reasonably rendering the socket *dead*), *then* returns the "file" to the caller as part of the response. urllib then takes the response, pulls the "file" out of it, and discards the rest, returning the "file" as part of an instance of addinfourl. Somewhere along the way some code should call "close()" on that HTTPConnection socket, but not till the caller is finished using the bytes of the response (and those bytes are kept queued up in the real OS socket). 
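
For readers following along, the makefile-then-close pattern described above can be sketched as below. This is only an illustration under stated assumptions: it reflects how the socket module in present-day CPython 3 behaves as far as I know (the real close is deferred while a file object from makefile() is still open), not the 2007 py3k branch under discussion, and the function name read_after_close is made up:

```python
import socket

def read_after_close():
    """Follow the httplib-style pattern: makefile(), close the socket, keep reading."""
    # socketpair() gives two connected sockets without needing a network.
    server_side, client_side = socket.socketpair()
    try:
        reader = client_side.makefile("rb")
        server_side.sendall(b"HTTP/1.0 200 OK\r\n")

        # Closing the socket object here does not release the OS socket yet,
        # because the file object from makefile() still refers to it.
        client_side.close()
        still_open = client_side.fileno() != -1

        line = reader.readline()

        # Closing the last file object is what finally releases the descriptor.
        reader.close()
        released = client_side.fileno() == -1
        return line, still_open, released
    finally:
        server_side.close()

print(read_after_close())
```

The point of the sketch is that whoever ends up holding the file object is also the one who decides when the OS socket actually goes away.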
Ideally, GC of the response instance should call close() on the socket instance, which means that the instance should be passed along as part of the response, IMO. Bill From python3now at gmail.com Sun Oct 28 21:05:43 2007 From: python3now at gmail.com (James Thiele) Date: Sun, 28 Oct 2007 13:05:43 -0700 Subject: [Python-3000] __bool__ in 2.6? Message-ID: <8f01efd00710281305u61af6754p84a7cb01ec84469c@mail.gmail.com> PEP 361 lists __bool__ support as being possible for 2.6 backporting. As of today the trunk build uses __nonzero__ like 2.5 but 3.0 alpha uses __bool__. Has a decision been made on whether this will make the cut for 2.6? In a more general vein, is there a cutoff date for producing a list of 3.0 features which will be backported to 2.6? Thanks, James From qrczak at knm.org.pl Sun Oct 28 21:19:40 2007 From: qrczak at knm.org.pl (Marcin =?UTF-8?Q?=E2=80=98Qrczak=E2=80=99?= Kowalczyk) Date: Sun, 28 Oct 2007 21:19:40 +0100 Subject: [Python-3000] PEP 3137 plan of attack (stage 3) In-Reply-To: <472349CC.4050103@cheimes.de> References: <472349CC.4050103@cheimes.de> Message-ID: <1193602780.17694.2.camel@qrnik> Dnia 27-10-2007, So o godzinie 16:23 +0200, Christian Heimes pisze: > The second line is a fprintf(stderr, "%s\n", char *path). Do you see the > difference between "test???" and "test???"? "???" in CP-1252 has the same bytes as "???" in CP-850, so this is some confusion between ANSI and OEM codepages. -- __("< Marcin Kowalczyk \__/ qrczak at knm.org.pl ^^ http://qrnik.knm.org.pl/~qrczak/ From brett at python.org Sun Oct 28 22:23:54 2007 From: brett at python.org (Brett Cannon) Date: Sun, 28 Oct 2007 14:23:54 -0700 Subject: [Python-3000] __bool__ in 2.6? In-Reply-To: <8f01efd00710281305u61af6754p84a7cb01ec84469c@mail.gmail.com> References: <8f01efd00710281305u61af6754p84a7cb01ec84469c@mail.gmail.com> Message-ID: On 10/28/07, James Thiele wrote: > PEP 361 lists __bool__ support as being possible for 2.6 backporting. 
> As of today the trunk build uses __nonzero__ like 2.5 but 3.0 alpha > uses __bool__. Has a decision been made on whether this will make the > cut for 2.6? > > In a more general vein, is there a cutoff date for producing a list of > 3.0 features which will be backported to 2.6? Backporting decisions have not been made as the feature set of 3.0 is still a moving target. Once we nail down the features (I am going to guess not until b1 at the earliest) then backporting will probably start. -Brett From greg.ewing at canterbury.ac.nz Sun Oct 28 22:49:18 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 29 Oct 2007 10:49:18 +1300 Subject: [Python-3000] PEP 3101 suggested corrections In-Reply-To: References: <20071026142036.GB3365@phd.pp.ru> <200710261540.55302.mark@qtrac.eu> <20071026145336.GA5139@phd.pp.ru> <47221FD1.3080802@hastings.org> <47226D39.2020002@canterbury.ac.nz> Message-ID: <472503DE.5080608@canterbury.ac.nz> Jim Jewett wrote: > Decimal doesn't mean "base 10", it means "has a decimal point" According to dictionary.com, it means 1. pertaining to tenths or to the number 10. 2. proceeding by tens: a decimal system. -- Greg From greg.ewing at canterbury.ac.nz Sun Oct 28 22:56:55 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 29 Oct 2007 10:56:55 +1300 Subject: [Python-3000] socket GC worries In-Reply-To: <07Oct28.111004pst.57996@synergy1.parc.xerox.com> References: <07Oct28.111004pst.57996@synergy1.parc.xerox.com> Message-ID: <472505A7.108@canterbury.ac.nz> Bill Janssen wrote: > that whole mess of code is a good argument for *not* exposing the > fileno in Python Seems to me that a socket should already *be* a file, so it shouldn't need a makefile() method and you shouldn't have to mess around with filenos. 
-- Greg From janssen at parc.com Mon Oct 29 00:36:42 2007 From: janssen at parc.com (Bill Janssen) Date: Sun, 28 Oct 2007 16:36:42 PDT Subject: [Python-3000] socket GC worries In-Reply-To: <472505A7.108@canterbury.ac.nz> References: <07Oct28.111004pst.57996@synergy1.parc.xerox.com> <472505A7.108@canterbury.ac.nz> Message-ID: <07Oct28.153644pst."57996"@synergy1.parc.xerox.com> > Bill Janssen wrote: > > that whole mess of code is a good argument for *not* exposing the > > fileno in Python > > Seems to me that a socket should already *be* a file, > so it shouldn't need a makefile() method and you > shouldn't have to mess around with filenos. I like that model, too. I also wish the classes in io.py were sort of inverted; that is, I'd like to have an IOStream base class with read() and write() methods (and maybe close()), which things like Socket could inherit from. FileIO would inherit from IOStream and from Seekable, and add a fileno() method and "name" property. And so forth. But apparently that's out; maybe in Python 4000. Right now the socket is very much like an OS socket; with "send" and "recv" being the star players, not "read" and "write". socket.makefile wraps a buffered file-like interface around it. Bill From jimjjewett at gmail.com Mon Oct 29 00:51:48 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Sun, 28 Oct 2007 19:51:48 -0400 Subject: [Python-3000] PEP 3101 suggested corrections In-Reply-To: <472503DE.5080608@canterbury.ac.nz> References: <20071026142036.GB3365@phd.pp.ru> <200710261540.55302.mark@qtrac.eu> <20071026145336.GA5139@phd.pp.ru> <47221FD1.3080802@hastings.org> <47226D39.2020002@canterbury.ac.nz> <472503DE.5080608@canterbury.ac.nz> Message-ID: On 10/28/07, Greg Ewing wrote: > Jim Jewett wrote: > > Decimal doesn't mean "base 10", it means "has a decimal point" > According to dictionary.com, it means I see that I wasn't clear about this still being within the scope of "To most people ..." 
The dictionary gives a correct definition -- but realistically, that definition is jargon, rather than the way most people I've talked to actually use it. When I asked my kids what they were studying in math, the answer was sometimes "decimals" -- and this was always after plenty of work with multiple-digit arithmetic, but years before they learned about alternate bases. -jJ From guido at python.org Mon Oct 29 18:41:31 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 29 Oct 2007 10:41:31 -0700 Subject: [Python-3000] plat-mac seriously broken? In-Reply-To: <1209807056282906541@unknownmsgid> References: <18211.33450.332197.304601@montanaro.dyndns.org> <1209807056282906541@unknownmsgid> Message-ID: 2007/10/27, Bill Janssen : > > ISTR much of the plat-mac stuff was generated by Tools/bgen. If so, that > > would be the place to fix things. > > Sure looks like generated code. Be nice if that generator was run > during the build process, on OS X. That way you'd be sure to get code > that matches the platform and codebase. ISTR that the generator needs a lot of hand-holding. Fixing it would be A Project. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon Oct 29 18:48:05 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 29 Oct 2007 10:48:05 -0700 Subject: [Python-3000] Odd output from test -- buffering bug? In-Reply-To: <9003268132624447768@unknownmsgid> References: <7173684078305279630@unknownmsgid> <9003268132624447768@unknownmsgid> Message-ID: Thinking about this some more, the io module isn't thread-safe. It probably should be (the old file objects were more-or-less thread-safe, although I believe there might've been corner cases if one thread were to close a file). --Guido 2007/10/27, Bill Janssen : > No, not unless the test harness uses it. But there are two threads. > > > Hard to say. Never seen this before. Are you using fork() *anywhere* > > in your tests (not necessarily the affected test)? 
> > > > 2007/10/27, Bill Janssen : > > > I'm seeing a sort of odd thing going on when running one of my tests. > > > I'm seeing two lines of output, from two different threads, being > > > duplicated when I run with "regrtest -u all -v test_ssl". This is > > > with the latest 3K sources on PPC OS X 10.4.10. > > > > > > testSTARTTLS (test.test_ssl.ThreadedTests) ... > > > client: sending b'msg 1'... > > > ^@client: sending b'msg 1'... > > > server: new connection from ('127.0.0.1', 52371) > > > server: new connection from ('127.0.0.1', 52371) > > > > > > This is output to an Emacs shell buffer, so it shows control > > > characters in the output, and I'm seeing a NUL character being output > > > there at the beginning of the third line. Both of the duplicated lines > > > are being output with code like this: > > > > > > if test_support.verbose: > > > sys.stdout.write( > > > " client: sending %s...\n" % repr(msg)) > > > > > > This looks like some kind of buffering bug. Is it in the test > > > harness, or the standard I/O library? > > > > > > Bill > > > > > > _______________________________________________ > > > Python-3000 mailing list > > > Python-3000 at python.org > > > http://mail.python.org/mailman/listinfo/python-3000 > > > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > > > > > > > > > -- > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon Oct 29 19:07:21 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 29 Oct 2007 11:07:21 -0700 Subject: [Python-3000] bug in i/o module buffering? In-Reply-To: <-2917254322633780664@unknownmsgid> References: <8566285166171308234@unknownmsgid> <-2917254322633780664@unknownmsgid> Message-ID: __index__() converts an "int-like" object to an int. This is needed to make sure that e.g. numpy integral scalars can be used for indexing. 
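
A minimal illustration of that protocol, assuming a made-up class Tiny that stands in for something like a numpy integer scalar:

```python
import operator

class Tiny:
    """Hypothetical int-like wrapper; stands in for e.g. a numpy integer scalar."""

    def __init__(self, value):
        self._value = value

    def __index__(self):
        # Called whenever Python needs a real int for indexing or slicing.
        return self._value

data = list("abcdef")
n = Tiny(2)

print(data[n])                   # indexing goes through __index__
print(data[n:])                  # so does slicing
print(operator.index(n))         # explicit conversion to a plain int
print(bytes(operator.index(n)))  # a zero-filled bytes object of that length
```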
For a regular int it doesn't matter, so here it's a red herring. I'm asking about b because the error message "TypeError: 'slice' object does not support item deletion" would suggest that b is a slice object. I agree that doesn't sound very likely given the code though... :-( Could you step through this using pdb and investigate some more? Perhaps there's a refcount error somewhere in the C code? --Guido 2007/10/27, Bill Janssen : > From RawIOBase.read(). What's __index__() do? > > b = bytes(n.__index__()) > > > More interesting is, what's b? > > > > 2007/10/27, Bill Janssen : > > > In the following, 'n' is equal to 0 (read from a non-blocking socket). > > > Is this a bug in the I/O module buffering? > > > > > > Bill > > > > > > Traceback (most recent call last): > > > File "/local/python/3k/src/Lib/SocketServer.py", line 222, in handle_request > > > self.process_request(request, client_address) > > > File "/local/python/3k/src/Lib/SocketServer.py", line 241, in process_request > > > self.finish_request(request, client_address) > > > File "/local/python/3k/src/Lib/SocketServer.py", line 254, in finish_request > > > self.RequestHandlerClass(request, client_address, self) > > > File "/local/python/3k/src/Lib/SocketServer.py", line 522, in __init__ > > > self.handle() > > > File "/local/python/3k/src/Lib/BaseHTTPServer.py", line 330, in handle > > > self.handle_one_request() > > > File "/local/python/3k/src/Lib/BaseHTTPServer.py", line 313, in handle_one_request > > > self.raw_requestline = self.rfile.readline() > > > File "/local/python/3k/src/Lib/io.py", line 391, in readline > > > b = self.read(nreadahead()) > > > File "/local/python/3k/src/Lib/io.py", line 377, in nreadahead > > > readahead = self.peek(1, unsafe=True) > > > File "/local/python/3k/src/Lib/io.py", line 778, in peek > > > current = self.raw.read(to_read) > > > File "/local/python/3k/src/Lib/io.py", line 455, in read > > > del b[n:] > > > TypeError: 'slice' object does not support item deletion > > > 
---------------------------------------- > > > > > > > > > -- > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon Oct 29 19:10:31 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 29 Oct 2007 11:10:31 -0700 Subject: [Python-3000] __bool__ in 2.6? In-Reply-To: References: <8f01efd00710281305u61af6754p84a7cb01ec84469c@mail.gmail.com> Message-ID: 2007/10/28, Brett Cannon : > On 10/28/07, James Thiele wrote: > > PEP 361 lists __bool__ support as being possible for 2.6 backporting. > > As of today the trunk build uses __nonzero__ like 2.5 but 3.0 alpha > > uses __bool__. Has a decision been made on whether this will make the > > cut for 2.6? > > > > In a more general vein, is there a cutoff date for producing a list of > > 3.0 features which will be backported to 2.6? > > Backporting decisions have not been made as the feature set of 3.0 is > still a moving target. Once we nail down the features (I am going to > guess not until b1 at the earliest) then backporting will probably > start. In this case, like many, the backport can't be an exact copy of the 3.0 code: 2.6 *must* support __nonzero__. But it should also support __bool__ as a fallback. I think it would be great if someone submitted a patch to implement this (though it isn't necessarily the highest backporting priority). -- --Guido van Rossum (home page: http://www.python.org/~guido/) From dwheeler at dwheeler.com Mon Oct 29 19:34:03 2007 From: dwheeler at dwheeler.com (David A.
Wheeler) Date: Mon, 29 Oct 2007 14:34:03 -0400 (EDT) Subject: [Python-3000] Please re-add __cmp__ to python 3000 In-Reply-To: <47169E6E.7000804@canterbury.ac.nz> References: <47169E6E.7000804@canterbury.ac.nz> Message-ID: I think several postings have explained better than I have on why __cmp__ is still very valuable. (See below.) Guido van Rossum posted earlier that he was willing to entertain a PEP to restore __cmp__, so I've attempted to create a draft PEP, posted here: http://www.dwheeler.com/misc/pep-cmp.txt Please let me know if it makes sense. Thanks. Greg Ewing stated "Why not provide a __richcmp__ method that directly connects with the corresponding type slot? All the comparisons eventually end up there anyway, so it seems like the right place to provide a one-stop comparison method in the 3.0 age." It _seems_ to me that this is the same as "__cmp__", and if so, let's just keep using the same name (there's nothing wrong with the name!). But maybe I just don't understand the comment, so explanation welcome. --- David A. Wheeler ======================================== Aahz: >From my perspective, the real use case for cmp() is when you want to do >a three-way comparison of a "large" object (for example, a Decimal >instance). You can store the result of cmp() and then do a separate >three-way branch. and reply to the note "I'm having troubles coming up with things where the *basic* operator is really a cmp-like function.", there were two replies.. Guido van Rossum: >Here's one. When implementing the '<' operator on lists or tuples, you > really want to call the 'cmp' operator on the individual items, > because otherwise (if all you have is == and <) the algorithm becomes > something like "compare for equality until you've found the first pair > of items that are unequal; then compare those items again using < to > decide the final outcome". 
If you don't believe this, try to implement > this operation using only == or < without comparing any two items more > than once. and Greg Ewing: > Think of things like comparing a tuple. You need to work your > way along and recursively compare the elements. The decision > about when to stop always involves ==, whatever comparison > you're trying to do. So if e.g. you're doing <, then you have > to test each element first for <, and if that's false, test > it for ==. If the element is itself a tuple, it's doing this > on its elements too, etc., and things get very inefficient. > > If you have a single cmp operation that you can apply to the > elements, you only need to do it once for each element and it > gives you all the information you need. From python3now at gmail.com Mon Oct 29 19:40:19 2007 From: python3now at gmail.com (James Thiele) Date: Mon, 29 Oct 2007 11:40:19 -0700 Subject: [Python-3000] __bool__ in 2.6? In-Reply-To: References: <8f01efd00710281305u61af6754p84a7cb01ec84469c@mail.gmail.com> Message-ID: <8f01efd00710291140j5ba095f8v144aeac14141227d@mail.gmail.com> So just to clarify: 2.5 __nonzero__ only 2.6 __nonzero__ first, then __bool__ (if patch submitted) 3.x __bool__ first, then __nonzero__ Is this correct? On 10/29/07, Guido van Rossum wrote: > 2007/10/28, Brett Cannon : > > On 10/28/07, James Thiele wrote: > > > PEP 361 lists __bool__ support as being possible for 2.6 backporting. > > > As of today the trunk build uses __nonzero__ like 2.5 but 3.0 alpha > > > uses __bool__. Has a decision been made on whether this will make the > > > cut for 2.6? > > > > > > In a more general vein, is there a cutoff date for producing a list of > > > 3.0 features which will be backported to 2.6? > > > > Backporting decisions have not been made as the feature set of 3.0 is > > still a moving target. Once we nail down the features (I am going to > > guess not until b1 at the earliest) then backporting will probably > > start. 
> > In this case, like many, the backport can't be an exact copy of the > 3.0 code: 2.6 *must* support __nonzero__. But it should also support > __bool__ as a fallback. I think it would be great if someone submited > a patch to implement this (though it isn't necessarily the highest > backporting priority). > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > From fdrake at acm.org Mon Oct 29 19:44:55 2007 From: fdrake at acm.org (Fred Drake) Date: Mon, 29 Oct 2007 14:44:55 -0400 Subject: [Python-3000] __bool__ in 2.6? In-Reply-To: <8f01efd00710291140j5ba095f8v144aeac14141227d@mail.gmail.com> References: <8f01efd00710281305u61af6754p84a7cb01ec84469c@mail.gmail.com> <8f01efd00710291140j5ba095f8v144aeac14141227d@mail.gmail.com> Message-ID: <0AF8BC56-2C4B-47BF-AE48-49140805FE97@acm.org> On Oct 29, 2007, at 2:40 PM, James Thiele wrote: > So just to clarify: > 2.6 __nonzero__ first, then __bool__ (if patch submitted) > 3.x __bool__ first, then __nonzero__ I'd expect switching the order for this to be a bug magnet. I'd much rather see: 2.5 __nonzero__ only 2.6 __bool__ first, then __nonzero__ (if patch submitted) 3.x __bool__ first, then __nonzero__ The fewer variations there are in the algorithm, the better. -Fred -- Fred Drake From guido at python.org Mon Oct 29 19:45:55 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 29 Oct 2007 11:45:55 -0700 Subject: [Python-3000] socket GC worries In-Reply-To: <6186646035112263762@unknownmsgid> References: <07Oct28.111004pst.57996@synergy1.parc.xerox.com> <472505A7.108@canterbury.ac.nz> <6186646035112263762@unknownmsgid> Message-ID: 2007/10/28, Bill Janssen : > > Bill Janssen wrote: > > > that whole mess of code is a good argument for *not* exposing the > > > fileno in Python > > > > Seems to me that a socket should already *be* a file, > > so it shouldn't need a makefile() method and you > > shouldn't have to mess around with filenos. 
That model fits TCP/IP streams just fine, but doesn't work so well for UDP and other odd socket types. The assumption that "s.write(a); s.write(b) is equivalent to s.write(a+b)", which is fundamental for any "stream" abstraction, just doesn't work for UDP. Ditto for reading: AFAIK recv() truncates the rest of a UDP packet. > I like that model, too. I also wish the classes in io.py were sort of > inverted; that is, I'd like to have an IOStream base class with read() > and write() methods (and maybe close()), which things like Socket > could inherit from. FileIO would inherit from IOStream and from > Seekable, and add a fileno() method and "name" property. And so > forth. But apparently that's out; maybe in Python 4000. Actually, I'm still up for tweaks to the I/O model if they solve a real problem, as long as most of the high-level APIs stay the same (there simply is too much code that expects those to behave a certain way). I don't quite understand what you mean by inverted though. > Right now the socket is very much like an OS socket; with "send" and > "recv" being the star players, not "read" and "write". socket.makefile > wraps a buffered file-like interface around it. I was going to say "we can just replace SocketIO with a non-seekable _fileio.FileIO instance" until I realized that on Windows, socket fds and filesystem fds live in different spaces and are managed using different calls. That may also explain why the inversion you're looking for doesn't quite work (IIUC what you meant). The real issue seems to be file descriptor GC. Maybe we haven't written down the rules clearly enough for when the fd is supposed to be GC'ed, when there are both a socket and a SocketIO (or more) referencing it; and whether a close() call means something beyond dropping the last reference to the object. Or maybe we haven't implemented the rules right? ISTM that the SocketCloser class is *intended* to solve these issues.
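
One possible way to write such rules down is a shared close counter: the underlying descriptor is released only when the last object referencing it has been closed (or has gone away). The following is a minimal sketch under that assumption. It is not the SocketCloser code from socket.py, and the names SharedHandle, Wrapper, acquire and release are hypothetical, chosen only for illustration.

```python
class SharedHandle:
    """Hypothetical stand-in for a low-level resource such as a file descriptor."""

    def __init__(self, close_fn):
        self._close_fn = close_fn  # called exactly once, when the last user lets go
        self._users = 0
        self.closed = False

    def acquire(self):
        if self.closed:
            raise ValueError("handle already closed")
        self._users += 1
        return self

    def release(self):
        self._users -= 1
        if self._users == 0 and not self.closed:
            self.closed = True
            self._close_fn()


class Wrapper:
    """Hypothetical higher-level object (a socket, or a file-like view of one)."""

    def __init__(self, handle):
        self._closed = True  # stay "closed" if acquire() raises
        self._handle = handle.acquire()
        self._closed = False

    def close(self):
        # Idempotent: closing twice must not release the handle twice.
        if not self._closed:
            self._closed = True
            self._handle.release()

    def __del__(self):
        # Dropping the last reference is treated the same as an explicit close().
        self.close()


events = []
fd = SharedHandle(lambda: events.append("fd closed"))
sock = Wrapper(fd)
stream = Wrapper(fd)

sock.close()
print(events)  # [] because stream still references the handle
stream.close()
print(events)  # ['fd closed']
```

In this sketch an explicit close() and the disappearance of the last reference are deliberately given the same meaning, which is one answer to the question raised above; other rules are equally conceivable.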
Back to your initial mail (which is more relevant than Greg Ewing's snipe!): > I think that the SocketCloser (new in Py3K) was developed to address > another issue, which is that there's a lot of library code which > assumes that the Python socket instance is just window dressing over > an underlying system file descriptor, and isn't important. In fact, > that whole mess of code is a good argument for *not* exposing the > fileno in Python (perhaps only for special cases, like "select"). > Take httplib and urllib, for instance. HTTPConnection creates a > "file" from the socket, by calling socket.makefile(), then in some > cases *closes* the socket (thereby reasonably rendering the socket > *dead*), *then* returns the "file" to the caller as part of the > response. urllib then takes the response, pulls the "file" out of it, > and discards the rest, returning the "file" as part of an instance of > addinfourl. Somewhere along the way some code should call "close()" > on that HTTPConnection socket, but not till the caller is finished > using the bytes of the response (and those bytes are kept queued up in > the real OS socket). Ideally, GC of the response instance should call > close() on the socket instance, which means that the instance should > be passed along as part of the response, IMO. Hm, I think you're right. The SocketCloser class wasn't written with the SSL use case in mind. :-( I wonder if one key to solving the problem isn't to make the socket *wrap* a low-level _socket instance instead of *being* one (i.e. containment instead of subclassing). Then the SSL code could be passed the low-level _socket instance and the high(er)-level socket class could wrap either a _socket or an SSL instance. The SocketCloser would then be responsible for closing whatever the socket instance wraps, i.e. either the _socket or the SSL instance. 
Then we could have any number of SocketIO instances *plus* at most one socket instance, and the wrapped thing would be closed when the last of the higher-level things was either GC'ed or explicitly closed. If you wanted to reuse the _socket after closing the SSL instance, you'd have to wrap it in a fresh socket instance. Does that make sense? (Please do note the difference throughout between _socket and socket, the former being defined in socketmodule.c and the latter in socket.py.) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon Oct 29 19:50:50 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 29 Oct 2007 11:50:50 -0700 Subject: [Python-3000] __bool__ in 2.6? In-Reply-To: <8f01efd00710291140j5ba095f8v144aeac14141227d@mail.gmail.com> References: <8f01efd00710281305u61af6754p84a7cb01ec84469c@mail.gmail.com> <8f01efd00710291140j5ba095f8v144aeac14141227d@mail.gmail.com> Message-ID: 2007/10/29, James Thiele : > So just to clarify: > 2.5 __nonzero__ only > 2.6 __nonzero__ first, then __bool__ (if patch submitted) > 3.x __bool__ first, then __nonzero__ > > Is this correct? No. 3.x tests __bool__ only. > On 10/29/07, Guido van Rossum wrote: > > 2007/10/28, Brett Cannon : > > > On 10/28/07, James Thiele wrote: > > > > PEP 361 lists __bool__ support as being possible for 2.6 backporting. > > > > As of today the trunk build uses __nonzero__ like 2.5 but 3.0 alpha > > > > uses __bool__. Has a decision been made on whether this will make the > > > > cut for 2.6? > > > > > > > > In a more general vein, is there a cutoff date for producing a list of > > > > 3.0 features which will be backported to 2.6? > > > > > > Backporting decisions have not been made as the feature set of 3.0 is > > > still a moving target. Once we nail down the features (I am going to > > > guess not until b1 at the earliest) then backporting will probably > > > start. 
> > > > In this case, like many, the backport can't be an exact copy of the > > 3.0 code: 2.6 *must* support __nonzero__. But it should also support > > __bool__ as a fallback. I think it would be great if someone submited > > a patch to implement this (though it isn't necessarily the highest > > backporting priority). > > > > -- > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon Oct 29 19:51:23 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 29 Oct 2007 11:51:23 -0700 Subject: [Python-3000] __bool__ in 2.6? In-Reply-To: <0AF8BC56-2C4B-47BF-AE48-49140805FE97@acm.org> References: <8f01efd00710281305u61af6754p84a7cb01ec84469c@mail.gmail.com> <8f01efd00710291140j5ba095f8v144aeac14141227d@mail.gmail.com> <0AF8BC56-2C4B-47BF-AE48-49140805FE97@acm.org> Message-ID: 2007/10/29, Fred Drake : > On Oct 29, 2007, at 2:40 PM, James Thiele wrote: > > So just to clarify: > > 2.6 __nonzero__ first, then __bool__ (if patch submitted) > > 3.x __bool__ first, then __nonzero__ > > I'd expect switching the order for this to be a bug magnet. I'd much > rather see: > > 2.5 __nonzero__ only > 2.6 __bool__ first, then __nonzero__ (if patch submitted) > 3.x __bool__ first, then __nonzero__ > > The fewer variations there are in the algorithm, the better. Makes sense, if you change the 3.x rule to 3.x __bool__ only. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake at acm.org Mon Oct 29 19:57:53 2007 From: fdrake at acm.org (Fred Drake) Date: Mon, 29 Oct 2007 14:57:53 -0400 Subject: [Python-3000] __bool__ in 2.6? 
In-Reply-To: References: <8f01efd00710281305u61af6754p84a7cb01ec84469c@mail.gmail.com> <8f01efd00710291140j5ba095f8v144aeac14141227d@mail.gmail.com> <0AF8BC56-2C4B-47BF-AE48-49140805FE97@acm.org> Message-ID: <6242C601-C31D-4347-975E-DE18AF3D9062@acm.org> On Oct 29, 2007, at 2:51 PM, Guido van Rossum wrote: > Makes sense, if you change the 3.x rule to > > 3.x __bool__ only. Even better! I think I'm going to like 3.0 if I ever get a chance to use it. ;-) -Fred -- Fred Drake From janssen at parc.com Mon Oct 29 19:59:26 2007 From: janssen at parc.com (Bill Janssen) Date: Mon, 29 Oct 2007 11:59:26 PDT Subject: [Python-3000] bug in i/o module buffering? In-Reply-To: References: <8566285166171308234@unknownmsgid> <-2917254322633780664@unknownmsgid> Message-ID: <07Oct29.105930pst."57996"@synergy1.parc.xerox.com> > I'm asking about b because the error message "TypeError: 'slice' > object does not support item deletion" would suggest that b is a slice > object. I agree that doesn't sound very likely given the code > though... :-( Could you step through this using pdb and investigate > some more? Perhaps there's a refcount error somewhere in the C code? I'll see if I can unfix my test code to reproduce the failure :-). Bill From janssen at parc.com Mon Oct 29 20:24:36 2007 From: janssen at parc.com (Bill Janssen) Date: Mon, 29 Oct 2007 12:24:36 PDT Subject: [Python-3000] socket GC worries In-Reply-To: References: <07Oct28.111004pst.57996@synergy1.parc.xerox.com> <472505A7.108@canterbury.ac.nz> <6186646035112263762@unknownmsgid> Message-ID: <07Oct29.112445pst."57996"@synergy1.parc.xerox.com> > The SocketCloser class wasn't written with > the SSL use case in mind. I don't think it's just SSL. The problem is that it explicitly counts calls to "close()". So if you let the GC sweep up after you, that close() just doesn't get called, the circular refs persist, and the resource doesn't get collected till the backup GC runs (if it does). 
Waiting for that to happen, you might run out of a scarce system resource (file descriptors). A nasty timing-dependent bug, there. Hmmm, does real_close even get called in that case? In the C module, perhaps? > If you wanted to reuse the _socket after closing > the SSL instance, you'd have to wrap it in a fresh socket instance. > > Does that make sense? (Please do note the difference throughout > between _socket and socket, the former being defined in socketmodule.c > and the latter in socket.py.) That's what I do with SSLSocket, pretty much. I worry that doing it with socket.socket might break a lot of non-TCP code, though. And perhaps it's overkill. Why not move the count of how many SocketIO instances are pointing to it into the socket.socket class again, as it was in 2.x? I don't think you're gaining anything with the circular data structure of SocketCloser. Add a "_closed" property, and "__del__" method to socket.socket (which calls "close()"). Remove SocketCloser. You're finished, and there's one less class to maintain. And, ref your other comments, why not call SocketIO "TCPStream"? It would make things much clearer. Also, is it too late to rename socket.socket to "socket.Socket"? There are only a handful of references to "socket.socket" outside of the socket and ssl modules. Bill From rhamph at gmail.com Mon Oct 29 20:49:53 2007 From: rhamph at gmail.com (Adam Olsen) Date: Mon, 29 Oct 2007 13:49:53 -0600 Subject: [Python-3000] Please re-add __cmp__ to python 3000 In-Reply-To: References: <47169E6E.7000804@canterbury.ac.nz> Message-ID: On 10/29/07, David A. Wheeler wrote: > I think several postings have explained better than I have on why __cmp__ is still very valuable. (See below.) > > Guido van Rossum posted earlier that he was willing to entertain a PEP to restore __cmp__, so I've attempted to create a draft PEP, posted here: > http://www.dwheeler.com/misc/pep-cmp.txt > Please let me know if it makes sense. Thanks. 
> > Greg Ewing stated "Why not provide a __richcmp__ method that directly connects > with the corresponding type slot? All the comparisons eventually end up there anyway, so it seems like the right place to provide a one-stop comparison method in the 3.0 age." > It _seems_ to me that this is the same as "__cmp__", and if so, let's just keep using the same name (there's nothing wrong with the name!). But maybe I just don't understand the comment, so explanation welcome. I believe the intent was for __richcmp__ to take an argument indicating what sort of comparison is to be done (as tp_richcompare does in C), i.e., you'd write code like this:

    def __richcmp__(self, other, op):
        if not isinstance(other, MyType):
            return NotImplemented
        return richcmp(self.foo, other.foo, op)

Short-circuiting of equality checks (due to identity or interning) would work right. Likewise, there's no odd behaviour with comparable-but-unorderable types. It's not clear to me how many distinct operations you'd need though, or how acceptable reflections would be. Would only two operations, equality and ordering, be sufficient? Just what are the non-symmetric use cases the current design caters to? > > > > --- David A. Wheeler > > ======================================== > > Aahz: > >From my perspective, the real use case for cmp() is when you want to do > >a three-way comparison of a "large" object (for example, a Decimal > >instance). You can store the result of cmp() and then do a separate > >three-way branch. > > and reply to the note "I'm having troubles coming up with things where > the *basic* operator is really a cmp-like function.", there were two replies.. > > > Guido van Rossum: > >Here's one.
When implementing the '<' operator on lists or tuples, you > > really want to call the 'cmp' operator on the individual items, > > because otherwise (if all you have is == and <) the algorithm becomes > > something like "compare for equality until you've found the first pair > > of items that are unequal; then compare those items again using < to > > decide the final outcome". If you don't believe this, try to implement > > this operation using only == or < without comparing any two items more > > than once. > > and > > Greg Ewing: > > Think of things like comparing a tuple. You need to work your > > way along and recursively compare the elements. The decision > > about when to stop always involves ==, whatever comparison > > you're trying to do. So if e.g. you're doing <, then you have > > to test each element first for <, and if that's false, test > > it for ==. If the element is itself a tuple, it's doing this > > on its elements too, etc., and things get very inefficient. > > > > If you have a single cmp operation that you can apply to the > > elements, you only need to do it once for each element and it > > gives you all the information you need. 
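
To make the point in the two quoted replies concrete, here is a minimal sketch (not code from the thread) that counts item comparisons when ordering two sequences, once using only == and < on the items and once using a single three-way comparison per item. The names Counted, cmp3, lt_with_eq_and_lt, lt_with_cmp and count are hypothetical helpers introduced for this illustration.

```python
class Counted:
    """Wraps a value and counts every comparison made on wrapped items."""

    calls = 0

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        Counted.calls += 1
        return self.value == other.value

    def __lt__(self, other):
        Counted.calls += 1
        return self.value < other.value


def cmp3(a, b):
    """Three-way comparison in one step: returns -1, 0 or 1."""
    Counted.calls += 1
    return (a.value > b.value) - (a.value < b.value)


def lt_with_eq_and_lt(xs, ys):
    """xs < ys using only == and < on the items."""
    for x, y in zip(xs, ys):
        if not (x == y):   # first comparison of this pair
            return x < y   # second comparison of the same pair
    return len(xs) < len(ys)


def lt_with_cmp(xs, ys):
    """xs < ys using one three-way comparison per pair of items."""
    for x, y in zip(xs, ys):
        c = cmp3(x, y)
        if c != 0:
            return c < 0
    return len(xs) < len(ys)


def count(fn, xs, ys):
    """Run fn and report (result, number of item comparisons)."""
    Counted.calls = 0
    result = fn(xs, ys)
    return result, Counted.calls


a = [Counted(v) for v in (1, 2, 3)]
b = [Counted(v) for v in (1, 2, 4)]
print(count(lt_with_eq_and_lt, a, b))  # (True, 4)
print(count(lt_with_cmp, a, b))        # (True, 3)
```

For flat sequences the == and < version costs only one extra comparison, on the first unequal pair. The extra work grows when the items are themselves sequences, because the repeated comparison of that pair is then itself a full element-by-element walk.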
-- Adam Olsen, aka Rhamphoryncus From janssen at parc.com Mon Oct 29 20:46:14 2007 From: janssen at parc.com (Bill Janssen) Date: Mon, 29 Oct 2007 12:46:14 PDT Subject: [Python-3000] socket GC worries In-Reply-To: References: <07Oct28.111004pst.57996@synergy1.parc.xerox.com> <472505A7.108@canterbury.ac.nz> <6186646035112263762@unknownmsgid> Message-ID: <07Oct29.114621pst."57996"@synergy1.parc.xerox.com> > Actually, I'm still up for tweaks to the I/O model if it solves a real > problem, as long as most of the high-level APIs stay the same (there > simply is too much code that expects those to behave a certain way). > > I don't quite understand what you mean by inverted though. I'm actually thinking more in terms of avoiding future problems. I thought we'd discussed this a few months ago, but here it is again: I'd break up the BaseIO class into a small set of base classes, so that we can be more explicit about what a particular kind of I/O channel is or is not: (Please excuse typos, I'm generating this off-the-cuff -- does @abstract actually exist?)
-------------------------------------------------------------

class IOStream:

    @abstract
    def close(self):

    @property
    def closed(self):

class InputIOStream (IOStream):

    @abstract
    def read(self, buffer=None, nbytes=None):

class OutputIOStream (IOStream):

    @abstract
    def write(self, bytes):

    @abstract
    def flush(self):

class SeekableIOStream (IOStream):

    @abstract
    def tell(self):

    @abstract
    def seek(self):

    @abstract
    def truncate(self):

class SystemIOStream (IOStream):

    @property
    def fileno(self):

    @property
    def isatty (self):

class TextInputStream (InputIOStream):

    @abstract
    def readline(self):

    @abstract
    def readlines(self):

class TextOutputStream (InputIOStream):

    @abstract
    def readline(self):

    @abstract
    def readlines(self):

class FileStream (SystemIOStream, SeekableIOStream):

    @property
    name

    @property
    mode

    # note that open() would return FileStream mixed with one or both of
    # {Text}InputStream and {Text}OutputStream, depending on the "mode".

class StringIO (SeekableIOStream):

    # again, mixed with IO modes, depending on "mode".

-------------------------------------------------------------

I think of this as inverted, because it puts primitives like "read" and "write" at the lowest layers, not above things like "fileno" or "truncate", which are very specialized and should only apply to a subset of I/O channels. I realize that there are some practical problems with this; such as making _fileio.FileIO inherit from (multiple) Python base classes. Bill From steven.bethard at gmail.com Mon Oct 29 21:04:56 2007 From: steven.bethard at gmail.com (Steven Bethard) Date: Mon, 29 Oct 2007 14:04:56 -0600 Subject: [Python-3000] Please re-add __cmp__ to python 3000 In-Reply-To: References: <47169E6E.7000804@canterbury.ac.nz> Message-ID: On 10/29/07, David A. Wheeler wrote: > I think several postings have explained better than I have on why __cmp__ is still very valuable. (See below.)
> > Guido van Rossum posted earlier that he was willing to entertain a PEP to restore __cmp__, so I've attempted to create a draft PEP, posted here: > http://www.dwheeler.com/misc/pep-cmp.txt > Please let me know if it makes sense. Thanks. I think the PEP's a little misleading in that it makes it sound like defining __lt__, __gt__, etc. is inefficient. I think you want to be explicit about where __lt__, __gt__ are efficient, and where __cmp__ is efficient. For example:: * __lt__ is more efficient for sorting (given the current implementation) * __cmp__ is more efficient for comparing sequences like tuples, where you always need to check for equality first, and you don't want to have to do an == check followed by a < check if you can do them both at the same time. (This is basically the same argument as for Decimal -- why do two comparisons when you can do one?) STeVe -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy From guido at python.org Mon Oct 29 21:26:16 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 29 Oct 2007 13:26:16 -0700 Subject: [Python-3000] Please re-add __cmp__ to python 3000 In-Reply-To: References: <47169E6E.7000804@canterbury.ac.nz> Message-ID: I'm a bit too busy to look into this right now; I hope one or two more rounds of feedback on the PEP will get it into a state where I can review it more easily. Having a patch to go with it would be immensely helpful as well (in fact I'd say that without a patch it's unlikely to happen). --Guido 2007/10/29, David A. Wheeler : > I think several postings have explained better than I have on why __cmp__ is still very valuable. (See below.) > > Guido van Rossum posted earlier that he was willing to entertain a PEP to restore __cmp__, so I've attempted to create a draft PEP, posted here: > http://www.dwheeler.com/misc/pep-cmp.txt > Please let me know if it makes sense. Thanks. 
> > Greg Ewing stated "Why not provide a __richcmp__ method that directly connects > with the corresponding type slot? All the comparisons eventually end up there anyway, so it seems like the right place to provide a one-stop comparison method in the 3.0 age." > It _seems_ to me that this is the same as "__cmp__", and if so, let's just keep using the same name (there's nothing wrong with the name!). But maybe I just don't understand the comment, so explanation welcome. > > > > --- David A. Wheeler > > ======================================== > > Aahz: > >From my perspective, the real use case for cmp() is when you want to do > >a three-way comparison of a "large" object (for example, a Decimal > >instance). You can store the result of cmp() and then do a separate > >three-way branch. > > and reply to the note "I'm having troubles coming up with things where > the *basic* operator is really a cmp-like function.", there were two replies.. > > > Guido van Rossum: > >Here's one. When implementing the '<' operator on lists or tuples, you > > really want to call the 'cmp' operator on the individual items, > > because otherwise (if all you have is == and <) the algorithm becomes > > something like "compare for equality until you've found the first pair > > of items that are unequal; then compare those items again using < to > > decide the final outcome". If you don't believe this, try to implement > > this operation using only == or < without comparing any two items more > > than once. > > and > > Greg Ewing: > > Think of things like comparing a tuple. You need to work your > > way along and recursively compare the elements. The decision > > about when to stop always involves ==, whatever comparison > > you're trying to do. So if e.g. you're doing <, then you have > > to test each element first for <, and if that's false, test > > it for ==. If the element is itself a tuple, it's doing this > > on its elements too, etc., and things get very inefficient. 
> > > > If you have a single cmp operation that you can apply to the > > elements, you only need to do it once for each element and it > > gives you all the information you need. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon Oct 29 21:32:14 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 29 Oct 2007 13:32:14 -0700 Subject: [Python-3000] socket GC worries In-Reply-To: <-5143302779702104898@unknownmsgid> References: <07Oct28.111004pst.57996@synergy1.parc.xerox.com> <472505A7.108@canterbury.ac.nz> <6186646035112263762@unknownmsgid> <-5143302779702104898@unknownmsgid> Message-ID: 2007/10/29, Bill Janssen : > > The SocketCloser class wasn't written with > > the SSL use case in mind. > > I don't think it's just SSL. The problem is that it explicitly counts > calls to "close()". So if you let the GC sweep up after you, that > close() just doesn't get called, the circular refs persist, and the > resource doesn't get collected till the backup GC runs (if it does). > Waiting for that to happen, you might run out of a scarce system > resource (file descriptors). A nasty timing-dependent bug, there. Ouch. Unfortunately adding a __del__() method that calls close() won't really help, as the cyclic GC refuses to do anything with objects having a __del__. This needs more thinking than I have time for right now, but I agree we need to fix it. > Hmmm, does real_close even get called in that case? In the C module, > perhaps? The C module will certainly close the fd when the object goes away. The question is, is that soon enough? > > If you wanted to reuse the _socket after closing > > the SSL instance, you'd have to wrap it in a fresh socket instance. > > > > Does that make sense?
(Please do note the difference throughout > > between _socket and socket, the former being defined in socketmodule.c > > and the latter in socket.py.) > > That's what I do with SSLSocket, pretty much. I worry that doing it > with socket.socket might break a lot of non-TCP code, though. And > perhaps it's overkill. > > Why not move the count of how many SocketIO instances are pointing to > it into the socket.socket class again, as it was in 2.x? I don't > think you're gaining anything with the circular data structure of > SocketCloser. Add a "_closed" property, and "__del__" method to > socket.socket (which calls "close()"). Remove SocketCloser. You're > finished, and there's one less class to maintain. I'll look into this later. > And, ref your other comments, why not call SocketIO "TCPStream"? It > would make things much clearer. Good idea; SocketIO made more sense when it was part of io.py. > Also, is it too late to rename socket.socket to "socket.Socket"? > There are only a handful of references to "socket.socket" outside of > the socket and ssl modules. Really? AFAIK everyone who opens a socket calls it. I'd be okay with calling the class Socket and having a factory function named socket though. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon Oct 29 21:38:19 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 29 Oct 2007 13:38:19 -0700 Subject: [Python-3000] socket GC worries In-Reply-To: <3470699275677683430@unknownmsgid> References: <07Oct28.111004pst.57996@synergy1.parc.xerox.com> <472505A7.108@canterbury.ac.nz> <6186646035112263762@unknownmsgid> <3470699275677683430@unknownmsgid> Message-ID: 2007/10/29, Bill Janssen : > > Actually, I'm still up for tweaks to the I/O model if it solves a real > > problem, as long as most of the high-level APIs stay the same (there > > simply is too much code that expects those to behave a certain way). > > > > I don't quite understand what you mean by inverted though. 
> > I'm actually thinking more in terms of avoiding future problems. Can you remind me of what future problems again? > I thought we'd discussed this a few months ago, but here it is again: > > I'd break up the BaseIO class into a small set of base classes, so that > we can be more explicit about what a particular kind of I/O channel is > or is not: I see, static type checks in favor of dynamic behavior checks -- e.g. isinstance(s, SeekableIOStream) rather than s.seekable(). If that's all, I guess I already expressed earlier I don't really like that -- in practice I think the dynamic checks are more flexible, and the class hierarchy you're proposing is hard to implement in C (where unfortunately I'm restricted to single inheritance). E.g. depending on how a program is invoked, sys.stdin may be seekable or it may not be. --Guido > (Please excuse typos, I'm generating this off-the-cuff -- does > @abstract actually exist?) > > ------------------------------------------------------------- > > class IOStream: > > @abstract > def close(self): > > @property > def closed(self): > > class InputIOStream (IOStream): > > @abstract > def read(self, buffer=None, nbytes=None): > > class OutputIOStream (IOStream): > > @abstract > def write(self, bytes): > > @abstract > def flush(self): > > class SeekableIOStream (IOStream): > > @abstract > def tell(self): > > @abstract > def seek(self): > > @abstract > def truncate(self): > > class SystemIOStream (IOStream): > > @property > def fileno(self): > > @property > def isatty (self): > > class TextInputStream (InputIOStream): > > @abstract > def readline(self): > > @abstract > def readlines(self): > > class TextOutputStream (InputIOStream): > > @abstract > def readline(self): > > @abstract > def readlines(self): > > class FileStream (SystemIOStream, SeekableIOStream): > > @property > name > > @property > mode > > # note that open() would return FileStream mixed with one or both of > # {Text}InputStream and {Text}OutputStream, depending on 
the "mode". > > class StringIO (SeekableIOStream): > > # again, mixed with IO modes, depending on "mode". > > ------------------------------------------------------------- > > I think of this as inverted, because it puts primitives like "read" > and "write" at the lowest layers, not above things like "fileno" or > "truncate", which are very specialized and should only apply to a > subset of I/O channels. > > I realize that there are some practical problems with this; such as > making _fileio.FileIO inherit from (multiple) Python base classes. > > Bill > > > > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From greg at krypto.org Mon Oct 29 23:26:45 2007 From: greg at krypto.org (Gregory P. Smith) Date: Mon, 29 Oct 2007 15:26:45 -0700 Subject: [Python-3000] 3K bytes I/O? In-Reply-To: References: <-912240280709553237@unknownmsgid> Message-ID: <52dc1c820710291526t3b176560o9745dae1b1198dc1@mail.gmail.com> And for non-unicode inputs the code should use the PEP 3118 buffer API rather than PyBytes_ or PyString_ or whatnot. On 10/26/07, Guido van Rossum wrote: > > 2007/10/26, Bill Janssen : > > I'm looking at the Py3K SSL code, and have a question: > > > > What's the upshot of the bytes/string decisions in the C world? Is > > PyString_* now all about immutable bytes, and PyUnicode_* about > > strings? There still seem to be a lot of encode/decode methods in > > stringobject.h, operations which I'd expect to be in unicodeobject.h. > > I think the PyString encode/decode APIs should all go; use the > corresponding PyUnicode ones. > > I recommend that you write your code to assume PyBytes for > encoded/binary data, and PyUnicode for text; at some point we'll > substitute PyString for most cases where PyBytes is currently used: > that will happen once PyString is called bytes in at the Python level, > and PyBytes will be called buffer. But that's still a while off. 
> > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) From janssen at parc.com Mon Oct 29 23:48:02 2007 From: janssen at parc.com (Bill Janssen) Date: Mon, 29 Oct 2007 15:48:02 PDT Subject: [Python-3000] socket GC worries In-Reply-To: References: <07Oct28.111004pst.57996@synergy1.parc.xerox.com> <472505A7.108@canterbury.ac.nz> <6186646035112263762@unknownmsgid> <-5143302779702104898@unknownmsgid> Message-ID: <07Oct29.144811pst."57996"@synergy1.parc.xerox.com> > Really? AFAIK everyone who opens a socket calls it. Sorry, I meant only a handful (10?) in the standard library. > I'd be okay with calling the class Socket and having a factory > function named socket though. Ah, good idea. Bill From greg.ewing at canterbury.ac.nz Tue Oct 30 00:30:36 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 30 Oct 2007 12:30:36 +1300 Subject: [Python-3000] Please re-add __cmp__ to python 3000 In-Reply-To: References: <47169E6E.7000804@canterbury.ac.nz> Message-ID: <47266D1C.30302@canterbury.ac.nz> David A. Wheeler wrote: > Greg Ewing stated "Why not provide a __richcmp__ method that directly connects > with the corresponding type slot? > It _seems_ to me that this is the same as "__cmp__", No, it's not -- a __richcmp__ method would take an extra argument specifying which of the six comparison operations to perform, and return a boolean instead of -1, 0, 1. Giving it the same name as the old __cmp__ would be confusing, I think.
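For illustration only, here is a minimal sketch of what such an interface might look like. Python has no __richcmp__ hook, so the RichCmpMixin class below is a hypothetical stand-in that forwards the six ordinary comparison methods to a single __richcmp__(other, op) method. The op codes LT..GE are assumed to mirror the order of Py_LT..Py_GE in the C API, and Version is an invented example class. Returning NotImplemented for foreign types follows the usual rich-comparison convention rather than anything stated in this message.

```python
import operator

# Hypothetical op codes, numbered in the same order as Py_LT .. Py_GE in C.
LT, LE, EQ, NE, GT, GE = range(6)

# Map each op code to the plain comparison it stands for.
_OPS = {
    LT: operator.lt, LE: operator.le, EQ: operator.eq,
    NE: operator.ne, GT: operator.gt, GE: operator.ge,
}


class RichCmpMixin:
    """Forward all six comparisons to one __richcmp__(other, op) method.

    Stand-in for the interpreter support the proposal would need;
    it is not an existing Python feature.
    """

    def __lt__(self, other): return self.__richcmp__(other, LT)
    def __le__(self, other): return self.__richcmp__(other, LE)
    def __eq__(self, other): return self.__richcmp__(other, EQ)
    def __ne__(self, other): return self.__richcmp__(other, NE)
    def __gt__(self, other): return self.__richcmp__(other, GT)
    def __ge__(self, other): return self.__richcmp__(other, GE)


class Version(RichCmpMixin):
    """Invented example type compared by its (major, minor) pair."""

    def __init__(self, major, minor):
        self.key = (major, minor)

    def __richcmp__(self, other, op):
        # One method, one extra argument, a boolean (or NotImplemented) back.
        if not isinstance(other, Version):
            return NotImplemented
        return _OPS[op](self.key, other.key)


older_is_less = Version(1, 2) < Version(1, 3)
same_is_equal = Version(1, 2) == Version(1, 2)
```

The single method sees the operation code, so it can answer each of the six questions directly instead of deriving them from a -1, 0, 1 result.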
-- Greg From greg.ewing at canterbury.ac.nz Tue Oct 30 00:32:35 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 30 Oct 2007 12:32:35 +1300 Subject: [Python-3000] __bool__ in 2.6? In-Reply-To: <8f01efd00710291140j5ba095f8v144aeac14141227d@mail.gmail.com> References: <8f01efd00710281305u61af6754p84a7cb01ec84469c@mail.gmail.com> <8f01efd00710291140j5ba095f8v144aeac14141227d@mail.gmail.com> Message-ID: <47266D93.7050407@canterbury.ac.nz> James Thiele wrote: > 3.x __bool__ first, then __nonzero__ Does 3.x need __nonzero__ at all? -- Greg From greg.ewing at canterbury.ac.nz Tue Oct 30 00:58:03 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 30 Oct 2007 12:58:03 +1300 Subject: [Python-3000] socket GC worries In-Reply-To: References: <07Oct28.111004pst.57996@synergy1.parc.xerox.com> <472505A7.108@canterbury.ac.nz> <6186646035112263762@unknownmsgid> Message-ID: <4726738B.2080106@canterbury.ac.nz> I wrote: > Seems to me that a socket should already *be* a file, > so it shouldn't need a makefile() method and you > shouldn't have to mess around with filenos. Guido van Rossum wrote: > That model fits TCP/IP streams just fine, but doesn't work so well for > UDP and other odd socket types. No, but I think that a socket should have read() and write() methods that work if it happens to be a socket of an appropriate kind. Unix lets you use read and write as synonyms for send and recv on stream sockets, and it's surprising that Python doesn't do the same. At the very least, it should be possible to wrap any of the higher-level I/O stack objects around a stream socket directly. > The real issue seems to be file descriptor GC. Maybe we haven't > written down the rules clearly enough for when the fd is supposed to > be GC'ed I don't see what's so difficult about this. Each file descriptor should be owned by exactly one object. If two objects need to share a fd, then you dup() it so that each one has its own fd. 
When the object is close()d or GCed, it closes its fd. However, I don't see that it should be necessary for objects to share fds in the first place. Buffering layers should wrap directly around the object being buffered, whether a file or socket or something else. Then whether the socket has a fd or not is an implementation detail of the socket object, so there's no problem on Windows. Bill Janssen wrote: > Back to your initial mail (which is > more relevant than Greg Ewing's snipe!): What snipe? I'm trying to make a constructive suggestion. > then in some > cases *closes* the socket (thereby reasonably rendering the socket > *dead*), *then* returns the "file" to the caller as part of the > response. I don't understand that. What good can returning a *closed* file object possibly do anyone? -- Greg From guido at python.org Tue Oct 30 01:05:41 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 29 Oct 2007 17:05:41 -0700 Subject: [Python-3000] Please re-add __cmp__ to python 3000 In-Reply-To: <47266D1C.30302@canterbury.ac.nz> References: <47169E6E.7000804@canterbury.ac.nz> <47266D1C.30302@canterbury.ac.nz> Message-ID: 2007/10/29, Greg Ewing : > David A. Wheeler wrote: > > Greg Ewing stated "Why not provide a __richcmp__ method that > > directly connects with the corresponding type slot? > > > It _seems_ to me that this is the same as "__cmp__", > > No, it's not -- a __richcmp__ method would take an extra > argument specifying which of the six comparison operations > to perform, and return a boolean instead of -1, 0, 1. Eh? Shouldn't it return True, False or NotImplemented if that's the interface? > Giving it the same name as the old __cmp__ would be > confusing, I think. For sure.
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From greg.ewing at canterbury.ac.nz Tue Oct 30 01:08:39 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 30 Oct 2007 13:08:39 +1300 Subject: [Python-3000] Please re-add __cmp__ to python 3000 In-Reply-To: References: <47169E6E.7000804@canterbury.ac.nz> Message-ID: <47267607.2000806@canterbury.ac.nz> Adam Olsen wrote: > It's not clear to me how many distinct operations you'd need though, > or how acceptable reflections would be. My intention was just to directly expose the tp_richcmp slot, so there would be six. To make things easier in the common case, there could perhaps be a utility function that would take a comparison operation code and a -1, 0, 1 value and return the appropriate boolean. Then a __richcmp__ method could be written very similarly to the way a __cmp__ method is now. It might even be possible for 2to3 to convert __cmp__ methods to __richcmp__ methods automatically. -- Greg From guido at python.org Tue Oct 30 01:15:27 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 29 Oct 2007 17:15:27 -0700 Subject: [Python-3000] socket GC worries In-Reply-To: <4726738B.2080106@canterbury.ac.nz> References: <07Oct28.111004pst.57996@synergy1.parc.xerox.com> <472505A7.108@canterbury.ac.nz> <6186646035112263762@unknownmsgid> <4726738B.2080106@canterbury.ac.nz> Message-ID: 2007/10/29, Greg Ewing : > I wrote: > > > Seems to me that a socket should already *be* a file, > > so it shouldn't need a makefile() method and you > > shouldn't have to mess around with filenos. > > Guido van Rossum wrote: > > > That model fits TCP/IP streams just fine, but doesn't work so well for > > UDP and other odd socket types. > > No, but I think that a socket should have read() and > write() methods that work if it happens to be a socket > of an appropriate kind. 
Unix lets you use read and write > as synonyms for send and recv on stream sockets, and > it's surprising that Python doesn't do the same. That's because I don't find the synonyms a good idea. > At the very least, it should be possible to wrap > any of the higher-level I/O stack objects around a > stream socket directly. Why? What problem does this solve? > > The real issue seems to be file descriptor GC. Maybe we haven't > > written down the rules clearly enough for when the fd is supposed to > > be GC'ed > > I don't see what's so difficult about this. Each file > descriptor should be owned by exactly one object. If > two objects need to share a fd, then you dup() it so > that each one has its own fd. When the object is > close()d or GCed, it closes its fd. On Windows you can't dup() a fd. > However, I don't see that it should be necessary for > objects to share fds in the first place. Buffering > layers should wrap directly around the the object > being buffered, whether a file or socket or something > else. Then whether the socket has a fd or not is > an implementation detail of the socket object, so > there's no problem on Windows. There's a tension though between using GC and explicit closing. A fairly nice model would be that the lowest-level object "owns" the fd and is the one to close it when it is GC'ed. However for various reasons we don't want to rely on GC to close fds, since that may delay closing in Jython and when there happens to be an innocent reference keeping the lowest-level socket object alive (e.g. someone still has it in their stack frame or traceback). So we end up having to implement a second reference counting scheme on top of close() calls. Which is what we did. But now just dropping the last reference to an object doesn't call close(), so explicit closes suddenly become mandatory instead of recommended good practice. 
Adding __del__ as an alias for close might help, except this makes circular references a primary sin (since the cycle GC doesn't like calling __del__). I guess there really is no way around this solution though, and we'll just have to make extra sure not to create cycles during normal usage patterns, or use weak references in those cases where we can't avoid them. I think this is the way to go, together with changing the Socket class from subclassing _socket to wrapping one. --Guido > Bill Janssen wrote: > > > Back to your initial mail (which is > > more relevant than Greg Ewing's snipe!): > > What snipe? I'm trying to make a constructive suggestion. > > > then in some > > cases *closes* the socket (thereby reasonably rendering the socket > > *dead*), *then* returns the "file" to the caller as part of the > > response. > > I don't understand that. What good can returning a *closed* file > object possibly do anyone? > > -- > Greg > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From greg.ewing at canterbury.ac.nz Tue Oct 30 01:48:53 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 30 Oct 2007 13:48:53 +1300 Subject: [Python-3000] socket GC worries In-Reply-To: References: <07Oct28.111004pst.57996@synergy1.parc.xerox.com> <472505A7.108@canterbury.ac.nz> <6186646035112263762@unknownmsgid> <4726738B.2080106@canterbury.ac.nz> Message-ID: <47267F75.7040404@canterbury.ac.nz> Guido van Rossum wrote: > That's because I don't find the synonyms a good idea. Even if it means that stream sockets then have the same interface as all other stream-like objects in the I/O system, so buffering layers can be used on them, etc.? That seems like a rather good reason to me. 
If you want to be pedantic about not having synonyms, then fix send() and recv() so that they only work on *non*-stream sockets, or have different classes for stream and non-stream sockets. In other words, to my mind, for stream sockets it's send and recv that are synonyms for read and write, not the other way around. > On Windows you can't dup() a fd. Oh, blarg. Forget that part, then. But I still think it shouldn't be necessary to share fds between different objects in the first place. This is the problem that would be solved by making sockets have an interface that is directly usable by higher layers of the I/O system. There would be no need to reach down below the socket object and grab its fd, so the socket would have complete ownership of it, and it would get closed when the socket object eventually went away. This would happen at the C level, so cycles and __del__ methods wouldn't be a serious problem. -- Greg From guido at python.org Tue Oct 30 01:58:56 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 29 Oct 2007 17:58:56 -0700 Subject: [Python-3000] socket GC worries In-Reply-To: <47267F75.7040404@canterbury.ac.nz> References: <07Oct28.111004pst.57996@synergy1.parc.xerox.com> <472505A7.108@canterbury.ac.nz> <6186646035112263762@unknownmsgid> <4726738B.2080106@canterbury.ac.nz> <47267F75.7040404@canterbury.ac.nz> Message-ID: 2007/10/29, Greg Ewing : > Guido van Rossum wrote: > > > That's because I don't find the synonyms a good idea. > > Even if it means that stream sockets then have the > same interface as all other stream-like objects in > the I/O system, so buffering layers can be used on > them, etc.? That seems like a rather good reason to > me. > > If you want to be pedantic about not having synonyms, > then fix send() and recv() so that they only work > on *non*-stream sockets, or have different classes > for stream and non-stream sockets. 
> > In other words, to my mind, for stream sockets it's > send and recv that are synonyms for read and write, > not the other way around. > > > On Windows you can't dup() a fd. > > Oh, blarg. Forget that part, then. > > But I still think it shouldn't be necessary to share > fds between different objects in the first place. > > This is the problem that would be solved by making > sockets have an interface that is directly usable by > higher layers of the I/O system. There would be no > need to reach down below the socket object and grab > its fd, so the socket would have complete ownership > of it, and it would get closed when the socket > object eventually went away. This would happen at > the C level, so cycles and __del__ methods wouldn't > be a serious problem. Having the SocketIO wrapper works just as well. I agree we need some refactoring to deal with the ownership issue better, but having read() and write() methods on the _socket object is not the solution. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From greg.ewing at canterbury.ac.nz Tue Oct 30 02:07:18 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 30 Oct 2007 14:07:18 +1300 Subject: [Python-3000] socket GC worries In-Reply-To: References: <07Oct28.111004pst.57996@synergy1.parc.xerox.com> <472505A7.108@canterbury.ac.nz> <6186646035112263762@unknownmsgid> <4726738B.2080106@canterbury.ac.nz> <47267F75.7040404@canterbury.ac.nz> Message-ID: <472683C6.9090405@canterbury.ac.nz> Guido van Rossum wrote: > having read() > and write() methods on the _socket object is not the solution. It's not a necessary part of the solution, I agree. I just don't see what purpose is served by requiring an extra layer of wrapper between a socket and the other I/O layers. That's not a necessary part of the solution either. 
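To make the layering under discussion concrete, here is a small sketch of how the stack looks in present-day CPython 3, which postdates this thread and may differ from the py3k branch being debated. It assumes only the standard socket and io modules. socketpair() is used so that nothing touches the network, and the variable names are illustrative.

```python
import io
import socket

# Two connected stream sockets, local to this process.
left, right = socket.socketpair()

# makefile() builds the stack: a raw wrapper object around the socket,
# with a buffering layer on top of that wrapper.
reader = right.makefile("rb")
writer = left.makefile("wb")

writer.write(b"hello\n")
writer.flush()                 # push the buffered bytes down to the socket
line = reader.readline()

# The object sitting between the buffering layer and the socket itself.
raw_layer = reader.raw
is_raw_io = isinstance(raw_layer, io.RawIOBase)

for obj in (reader, writer, left, right):
    obj.close()
```

The buffered reader never touches the socket directly. It talks to the raw wrapper, which is the extra layer being questioned here.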
-- Greg From rhamph at gmail.com Tue Oct 30 02:36:14 2007 From: rhamph at gmail.com (Adam Olsen) Date: Mon, 29 Oct 2007 19:36:14 -0600 Subject: [Python-3000] Please re-add __cmp__ to python 3000 In-Reply-To: <47267607.2000806@canterbury.ac.nz> References: <47169E6E.7000804@canterbury.ac.nz> <47267607.2000806@canterbury.ac.nz> Message-ID: On 10/29/07, Greg Ewing wrote: > Adam Olsen wrote: > > It's not clear to me how many distinct operations you'd need though, > > or how acceptable reflections would be. > > My intention was just to directly expose the tp_richcmp > slot, so there would be six. > > To make things easier in the common case, there could > perhaps be a utility function that would take a comparison > operation code and a -1, 0, 1 value and return the > appropriate boolean. Then a __richcmp__ method could be > written very similarly to the way a __cmp__ method is > now. It might even be possible for 2to3 to convert > __cmp__ methods to __richcmp__ methods automatically. It'd be simpler still if we only had __cmp__ and __eq__. I just don't understand the use cases where that's not sufficient. Hrm. I guess set's subset checking requires more relationships than __cmp__ provides. Abandoning that feature probably isn't an option, so nevermind me. (Although, if we really wanted we could use -2/+2 to mean subset/superset, while -1/+1 mean smaller/larger.) -- Adam Olsen, aka Rhamphoryncus From jyasskin at gmail.com Tue Oct 30 06:19:43 2007 From: jyasskin at gmail.com (Jeffrey Yasskin) Date: Mon, 29 Oct 2007 22:19:43 -0700 Subject: [Python-3000] Please re-add __cmp__ to python 3000 In-Reply-To: References: <47169E6E.7000804@canterbury.ac.nz> Message-ID: <5d44f72f0710292219h11a28c2dk4c7540bc5bd824e4@mail.gmail.com> On 10/29/07, Steven Bethard wrote: > On 10/29/07, David A. Wheeler wrote: > > I think several postings have explained better than I have on why __cmp__ is still very valuable. (See below.) 
> > > > Guido van Rossum posted earlier that he was willing to entertain a PEP to restore __cmp__, so I've attempted to create a draft PEP, posted here: > > http://www.dwheeler.com/misc/pep-cmp.txt > > Please let me know if it makes sense. Thanks. > > I think the PEP's a little misleading in that it makes it sound like > defining __lt__, __gt__, etc. is inefficient. I think you want to be > explicit about where __lt__, __gt__ are efficient, and where __cmp__ > is efficient. For example:: > > * __lt__ is more efficient for sorting (given the current implementation) > * __cmp__ is more efficient for comparing sequences like tuples, where > you always need to check for equality first, and you don't want to > have to do an == check followed by a < check if you can do them both > at the same time. (This is basically the same argument as for Decimal > -- why do two comparisons when you can do one?) When implementing a large, totally ordered object (>=2 fields), both __lt__ and __cmp__ should probably be implemented by calling __cmp__ on the fields. If you decide to implement __lt__ by letting it forward to __cmp__, the cutoff might be at 3 fields. Partial orders (what the PEP calls "asymmetric classes") cannot, of course, be implemented with __cmp__ and should have it return NotImplemented. Well, if we wanted to diverge from most other languages, we could extend __cmp__ to let it return a distinguished "Unordered" value, which returns false on all comparisons with 0. This is similar to Fortress's approach, which returns one of 4 values from a PartialOrder's CMP operator: EqualTo, LessThan, GreaterThan, and Unordered. Haskell has only a total ordering class in the core libraries, while Scala has a PartiallyOrdered trait that returns None from its compare method for unordered values. For Python, I think I favor reviving __cmp__ for totally ordered types, and asking that partially ordered ones return NotImplemented from it explicitly. 
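As a rough sketch of the totally ordered case described above: Python 3 has no built-in cmp(), so three_way() below is a hand-written substitute, and Release with its _cmp and _test helpers is an invented example rather than anything from the draft PEP. Each rich comparison forwards to one field-by-field three-way comparison.

```python
def three_way(a, b):
    """Return -1, 0 or 1, in the manner of the old built-in cmp()."""
    return (a > b) - (a < b)


class Release:
    """Invented totally ordered type with three fields."""

    def __init__(self, major, minor, patch):
        self.major, self.minor, self.patch = major, minor, patch

    def _cmp(self, other):
        # Compare field by field; the first difference decides the result,
        # so each field is examined at most once.
        if not isinstance(other, Release):
            return NotImplemented
        for field in ("major", "minor", "patch"):
            result = three_way(getattr(self, field), getattr(other, field))
            if result:
                return result
        return 0

    def _test(self, other, predicate):
        # Turn the three-way result into a boolean, passing through
        # NotImplemented for types we cannot compare against.
        result = self._cmp(other)
        if result is NotImplemented:
            return NotImplemented
        return predicate(result)

    def __eq__(self, other): return self._test(other, lambda c: c == 0)
    def __ne__(self, other): return self._test(other, lambda c: c != 0)
    def __lt__(self, other): return self._test(other, lambda c: c < 0)
    def __le__(self, other): return self._test(other, lambda c: c <= 0)
    def __gt__(self, other): return self._test(other, lambda c: c > 0)
    def __ge__(self, other): return self._test(other, lambda c: c >= 0)

    def __hash__(self):
        return hash((self.major, self.minor, self.patch))


releases = [Release(1, 3, 0), Release(1, 2, 3), Release(0, 9, 9)]
ordered = [(r.major, r.minor, r.patch) for r in sorted(releases)]
```

A partially ordered type would not fit this pattern, since a single -1, 0, 1 result has no way to say "unordered".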
Jeffrey From janssen at parc.com Tue Oct 30 17:20:11 2007 From: janssen at parc.com (Bill Janssen) Date: Tue, 30 Oct 2007 09:20:11 PDT Subject: [Python-3000] socket GC worries In-Reply-To: <4726738B.2080106@canterbury.ac.nz> References: <07Oct28.111004pst.57996@synergy1.parc.xerox.com> <472505A7.108@canterbury.ac.nz> <6186646035112263762@unknownmsgid> <4726738B.2080106@canterbury.ac.nz> Message-ID: <07Oct30.082017pst."57996"@synergy1.parc.xerox.com> > Bill Janssen wrote: > > > Back to your initial mail (which is > > more relevant than Greg Ewing's snipe!): Actually, Bill Janssen didn't write that, but did write this: > > then in some > > cases *closes* the socket (thereby reasonably rendering the socket > > *dead*), *then* returns the "file" to the caller as part of the > > response. > > I don't understand that. What good can returning a *closed* file > object possibly do anyone? Indeed. The httplib code is relying on the fact that close(), under certain circumstances, has no effect. It's just that the circumstances have changed, in Python 3K. I think that the close() in HTTPConnection should be removed. Bill From guido at python.org Tue Oct 30 17:31:07 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 30 Oct 2007 09:31:07 -0700 Subject: [Python-3000] socket GC worries In-Reply-To: <274486914162998601@unknownmsgid> References: <07Oct28.111004pst.57996@synergy1.parc.xerox.com> <472505A7.108@canterbury.ac.nz> <6186646035112263762@unknownmsgid> <4726738B.2080106@canterbury.ac.nz> <274486914162998601@unknownmsgid> Message-ID: 2007/10/30, Bill Janssen : > Indeed. The httplib code is relying on the fact that close(), under > certain circumstances, has no effect. It's just that the > circumstances have changed, in Python 3K. I think that the close() in > HTTPConnection should be removed. I'd like to have an opinion, but this is not my code and there don't seem to be enough unittests to make sure that removing that close() doesn't break anything. 
I'd love to work on this more in-depth but it'll have to wait until after PEP 3137 is finished. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From janssen at parc.com Tue Oct 30 18:37:29 2007 From: janssen at parc.com (Bill Janssen) Date: Tue, 30 Oct 2007 10:37:29 PDT Subject: [Python-3000] socket GC worries In-Reply-To: References: <07Oct28.111004pst.57996@synergy1.parc.xerox.com> <472505A7.108@canterbury.ac.nz> <6186646035112263762@unknownmsgid> <-5143302779702104898@unknownmsgid> Message-ID: <07Oct30.093737pst."57996"@synergy1.parc.xerox.com> > > I don't think it's just SSL. The problem is that it explicitly counts > > calls to "close()". So if you let the GC sweep up after you, that > > close() just doesn't get called, the circular refs persist, and the > > resource doesn't get collected till the backup GC runs (if it does). > > Waiting for that to happen, you might run out of a scarce system > > resource (file descriptors). A nasty timing-dependent bug, there. > > Ouch. Unfortunately adding a __del__() method that calls close() > won't really help, as the cyclic GC refuses to do anything with > objects having a __del__. This needs more thinking than I have time > for right now, but i agree we need to fix it. But if we remove SocketCloser, there's no need for the cyclic GC to be involved. If the count (of the number of outstanding SocketIO instances pointing to this socket.socket) is just moved into the socket.socket object itself, there's no cyclic reference, and normal refcounting should work just fine. I don't even think a __del__ method on socket.socket is necessary. > > Why not move the count of how many SocketIO instances are pointing to > > it into the socket.socket class again, as it was in 2.x? I don't > > think you're gaining anything with the circular data structure of > > SocketCloser. Add a "_closed" property, and "__del__" method to > > socket.socket (which calls "close()"). Remove SocketCloser. 
You're > > finished, and there's one less class to maintain. > > I'll look into this later. OK. Bill From brett at python.org Tue Oct 30 19:05:23 2007 From: brett at python.org (Brett Cannon) Date: Tue, 30 Oct 2007 11:05:23 -0700 Subject: [Python-3000] plat-mac seriously broken? In-Reply-To: References: <18211.33450.332197.304601@montanaro.dyndns.org> <1209807056282906541@unknownmsgid> Message-ID: On 10/29/07, Guido van Rossum wrote: > 2007/10/27, Bill Janssen : > > > ISTR much of the plat-mac stuff was generated by Tools/bgen. If so, that > > > would be the place to fix things. > > > > Sure looks like generated code. Be nice if that generator was run > > during the build process, on OS X. That way you'd be sure to get code > > that matches the platform and codebase. > > ISTR that the generator needs a lot of hand-holding. Fixing it would > be A Project. Just so that it is publicly known, when the Great Stdlib Reorg begins, I am seriously thinking of paring down the Mac stuff to the bare minimum. I think the only reason all the Mac stuff was even allowed in to begin with was because Jack was one of the first contributors to Python (but that is just a hunch). It seems rather unfair to have all of this Mac stuff in the stdlib while Windows doesn't go far beyond _winreg and everything else is kept in win32all. Considering it has gone this far into Py3K and no one has noticed that it was broken kind of says something anyway. And no, I don't know when I am going to start doing the cleanup as I am under time pressure for three proposals between now and late December. -Brett From guido at python.org Tue Oct 30 19:39:17 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 30 Oct 2007 11:39:17 -0700 Subject: [Python-3000] plat-mac seriously broken? In-Reply-To: References: <18211.33450.332197.304601@montanaro.dyndns.org> <1209807056282906541@unknownmsgid> Message-ID: Also, IMO the Mac-specific stuff was a lot more important before OSX. 
The really interesting Mac stuff is the ObjC bridge which is not maintained here anyway. --Guido 2007/10/30, Brett Cannon : > On 10/29/07, Guido van Rossum wrote: > > 2007/10/27, Bill Janssen : > > > > ISTR much of the plat-mac stuff was generated by Tools/bgen. If so, that > > > > would be the place to fix things. > > > > > > Sure looks like generated code. Be nice if that generator was run > > > during the build process, on OS X. That way you'd be sure to get code > > > that matches the platform and codebase. > > > > ISTR that the generator needs a lot of hand-holding. Fixing it would > > be A Project. > > Just so that it is publicly known, when the Great Stdlib Reorg begins, > I am seriously thinking of paring down the Mac stuff to the bare > minimum. I think the only reason all the Mac stuff was even allowed > in to begin with was because Jack was one of the first contributors to > Python (but that is just a hunch). It seems rather unfair to have all > of this Mac stuff in the stdlib while Windows doesn't go far beyond > _winreg and everything else is kept in win32all. Considering it has > gone this far into Py3K and no one has noticed that it was broken kind > of says something anyway. > > And no, I don't know when I am going to start doing the cleanup as I > am under time pressure for three proposals between now and late > December. 
> > -Brett > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From janssen at parc.com Tue Oct 30 20:49:21 2007 From: janssen at parc.com (Bill Janssen) Date: Tue, 30 Oct 2007 12:49:21 PDT Subject: [Python-3000] socket GC worries In-Reply-To: <07Oct30.093737pst."57996"@synergy1.parc.xerox.com> References: <07Oct28.111004pst.57996@synergy1.parc.xerox.com> <472505A7.108@canterbury.ac.nz> <6186646035112263762@unknownmsgid> <-5143302779702104898@unknownmsgid> <07Oct30.093737pst."57996"@synergy1.parc.xerox.com> Message-ID: <07Oct30.114923pst."57996"@synergy1.parc.xerox.com> > But if we remove SocketCloser, there's no need for the cyclic GC to be > involved. If the count (of the number of outstanding SocketIO > instances pointing to this socket.socket) is just moved into the > socket.socket object itself, there's no cyclic reference, and normal > refcounting should work just fine. I don't even think a __del__ method > on socket.socket is necessary. Here's a patch, for whenever you get back to this. You can ignore/remove the first hunk, which is about SSL. I've tried all the tests, and they work. I've looked for leaks in test_socket and test_ssl, no leaks. 
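The counting scheme can be sketched without real sockets. This is a toy model under stated assumptions, not the patch itself: FakeOSHandle, Sock and SockIO are invented names, and the decrement happens in SockIO.close() so that the example is deterministic, whereas the patch does it from __del__.

```python
class FakeOSHandle:
    """Stand-in for the OS-level socket; records the real close."""

    def __init__(self):
        self.really_closed = False


class Sock:
    """Toy socket that counts its outstanding I/O wrappers itself."""

    def __init__(self):
        self._handle = FakeOSHandle()
        self._io_refs = 0       # number of live SockIO objects
        self._closed = False    # has close() been requested?

    def makefile(self):
        self._io_refs += 1
        return SockIO(self)

    def _decref_socketios(self):
        if self._io_refs > 0:
            self._io_refs -= 1
        if self._closed:
            self.close()        # retry the deferred close

    def close(self):
        self._closed = True
        if self._io_refs < 1:
            self._handle.really_closed = True


class SockIO:
    """Toy wrapper: holds a plain reference to its socket, nothing more."""

    def __init__(self, sock):
        self._sock = sock
        self.closed = False

    def close(self):
        if self.closed:
            return
        self.closed = True
        self._sock._decref_socketios()


sock = Sock()
first, second = sock.makefile(), sock.makefile()
states = []

sock.close()
states.append(sock._handle.really_closed)   # two wrappers still open
first.close()
states.append(sock._handle.really_closed)   # one wrapper still open
second.close()
states.append(sock._handle.really_closed)   # last wrapper gone
```

The wrapper refers to the socket, but the socket holds only an integer count, so no reference cycle is created.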
Bill Index: Lib/socket.py =================================================================== --- Lib/socket.py (revision 58714) +++ Lib/socket.py (working copy) @@ -21,7 +21,6 @@ htons(), htonl() -- convert 16, 32 bit int from host to network byte order inet_aton() -- convert IP addr string (123.45.67.89) to 32-bit packed format inet_ntoa() -- convert 32-bit packed format IP to string (123.45.67.89) -ssl() -- secure socket layer support (only available if configured) socket.getdefaulttimeout() -- get the default timeout value socket.setdefaulttimeout() -- set the default timeout value create_connection() -- connects to an address, with an optional timeout @@ -46,36 +45,6 @@ import _socket from _socket import * -try: - import _ssl - import ssl as _realssl -except ImportError: - # no SSL support - pass -else: - def ssl(sock, keyfile=None, certfile=None): - # we do an internal import here because the ssl - # module imports the socket module - warnings.warn("socket.ssl() is deprecated. Use ssl.wrap_socket() instead.", - DeprecationWarning, stacklevel=2) - return _realssl.sslwrap_simple(sock, keyfile, certfile) - - # we need to import the same constants we used to... - from _ssl import SSLError as sslerror - from _ssl import \ - RAND_add, \ - RAND_egd, \ - RAND_status, \ - SSL_ERROR_ZERO_RETURN, \ - SSL_ERROR_WANT_READ, \ - SSL_ERROR_WANT_WRITE, \ - SSL_ERROR_WANT_X509_LOOKUP, \ - SSL_ERROR_SYSCALL, \ - SSL_ERROR_SSL, \ - SSL_ERROR_WANT_CONNECT, \ - SSL_ERROR_EOF, \ - SSL_ERROR_INVALID_ERROR_CODE - import os, sys, io try: @@ -119,49 +88,11 @@ nfd = os.dup(fd) return socket(family, type, proto, fileno=nfd) -class SocketCloser: - - """Helper to manage socket close() logic for makefile(). - - The OS socket should not be closed until the socket and all - of its makefile-children are closed. If the refcount is zero - when socket.close() is called, this is easy: Just close the - socket. 
If the refcount is non-zero when socket.close() is - called, then the real close should not occur until the last - makefile-child is closed. - """ - - def __init__(self, sock): - self._sock = sock - self._makefile_refs = 0 - # Test whether the socket is open. - try: - sock.fileno() - self._socket_open = True - except error: - self._socket_open = False - - def socket_close(self): - self._socket_open = False - self.close() - - def makefile_open(self): - self._makefile_refs += 1 - - def makefile_close(self): - self._makefile_refs -= 1 - self.close() - - def close(self): - if not (self._socket_open or self._makefile_refs): - self._sock._real_close() - - class socket(_socket.socket): """A subclass of _socket.socket adding the makefile() method.""" - __slots__ = ["__weakref__", "_closer"] + __slots__ = ["__weakref__", "_io_refs", "_closed"] if not _can_dup_socket: __slots__.append("_base") @@ -170,16 +101,17 @@ _socket.socket.__init__(self, family, type, proto) else: _socket.socket.__init__(self, family, type, proto, fileno) - # Defer creating a SocketCloser until makefile() is actually called. - self._closer = None + self._io_refs = 0 + self._closed = False def __repr__(self): """Wrap __repr__() to reveal the real class name.""" s = _socket.socket.__repr__(self) if s.startswith(" 0: + self._io_refs -= 1 + if self._closed: + self.close() + def makefile(self, mode="r", buffering=None, *, encoding=None, newline=None): """Return an I/O stream connected to the socket. 
@@ -216,9 +154,8 @@ rawmode += "r" if writing: rawmode += "w" - if self._closer is None: - self._closer = SocketCloser(self) - raw = SocketIO(self, rawmode, self._closer) + raw = SocketIO(self, rawmode) + self._io_refs += 1 if buffering is None: buffering = -1 if buffering < 0: @@ -246,10 +183,9 @@ return text def close(self): - if self._closer is None: + self._closed = True + if self._io_refs < 1: self._real_close() - else: - self._closer.socket_close() # _real_close calls close on the _socket.socket base class. @@ -275,16 +211,14 @@ # XXX More docs - def __init__(self, sock, mode, closer): + def __init__(self, sock, mode): if mode not in ("r", "w", "rw"): raise ValueError("invalid mode: %r" % mode) io.RawIOBase.__init__(self) self._sock = sock self._mode = mode - self._closer = closer self._reading = "r" in mode self._writing = "w" in mode - closer.makefile_open() def readinto(self, b): self._checkClosed() @@ -308,10 +242,12 @@ def close(self): if self.closed: return - self._closer.makefile_close() io.RawIOBase.close(self) + def __del__(self): + self._sock.decref_socketios() + def getfqdn(name=''): """Get fully qualified domain name from name. From janssen at parc.com Tue Oct 30 20:52:42 2007 From: janssen at parc.com (Bill Janssen) Date: Tue, 30 Oct 2007 12:52:42 PDT Subject: [Python-3000] plat-mac seriously broken? In-Reply-To: References: <18211.33450.332197.304601@montanaro.dyndns.org> <1209807056282906541@unknownmsgid> Message-ID: <07Oct30.115248pst."57996"@synergy1.parc.xerox.com> > Also, IMO the Mac-specific stuff was a lot more important before OSX. > > The really interesting Mac stuff is the ObjC bridge which is not > maintained here anyway. I'm not so sure about that. The IC module, for instance, plugs into the Internet Config on the Mac, so you can read things like proxy settings when making an HTTP or FTP connection. 
To make Python as useful on the Mac as it currently is, you'd have to refit a lot of that in PyObjC, and bundle PyObjC into Python, wouldn't you? And I haven't seen a lot of volunteers on the MacPython mailing list raring to contribute to this. Bill From greg.ewing at canterbury.ac.nz Tue Oct 30 21:42:35 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 31 Oct 2007 09:42:35 +1300 Subject: [Python-3000] Please re-add __cmp__ to python 3000 In-Reply-To: References: <47169E6E.7000804@canterbury.ac.nz> <47267607.2000806@canterbury.ac.nz> Message-ID: <4727973B.3060203@canterbury.ac.nz> Adam Olsen wrote: > It'd be simpler still if we only had __cmp__ and __eq__. I just don't > understand the use cases where that's not sufficient. > > Hrm. I guess set's subset checking requires more relationships than > __cmp__ provides. Also, you might want to give the comparison operators meanings that don't have anything to do with comparison in the usual sense. The reason tp_richcmp was added in the first place was so that arbitrary meanings could be given to the comparison operators individually. -- Greg From rhamph at gmail.com Tue Oct 30 22:36:57 2007 From: rhamph at gmail.com (Adam Olsen) Date: Tue, 30 Oct 2007 15:36:57 -0600 Subject: [Python-3000] Please re-add __cmp__ to python 3000 In-Reply-To: <4727973B.3060203@canterbury.ac.nz> References: <47169E6E.7000804@canterbury.ac.nz> <47267607.2000806@canterbury.ac.nz> <4727973B.3060203@canterbury.ac.nz> Message-ID: On 10/30/07, Greg Ewing wrote: > Adam Olsen wrote: > > It'd be simpler still if we only had __cmp__ and __eq__. I just don't > > understand the use cases where that's not sufficient. > > > > Hrm. I guess set's subset checking requires more relationships than > > __cmp__ provides. > > Also, you might want to give the comparison operators meanings > that don't have anything to do with comparison in the usual > sense. 
The reason tp_richcmp was added in the first place was > so that arbitrary meanings could be given to the comparison > operators individually. Yeah. It's clear to me that the opposition to removing __cmp__ comes down to "make the common things easy and the rare things possible". Removing __cmp__ means one of the common things (total ordering) becomes hard. __richcmp__ might solve that, but I'd like to see some larger examples first (involving unordered types, total ordered types, and partially ordered types.) -- Adam Olsen, aka Rhamphoryncus From steven.bethard at gmail.com Wed Oct 31 01:17:17 2007 From: steven.bethard at gmail.com (Steven Bethard) Date: Tue, 30 Oct 2007 18:17:17 -0600 Subject: [Python-3000] Please re-add __cmp__ to python 3000 In-Reply-To: References: <47169E6E.7000804@canterbury.ac.nz> <47267607.2000806@canterbury.ac.nz> <4727973B.3060203@canterbury.ac.nz> Message-ID: On 10/30/07, Adam Olsen wrote: > It's clear to me that the opposition to removing __cmp__ comes down to > "make the common things easy and the rare things possible". Removing > __cmp__ means one of the common things (total ordering) becomes hard. I don't really think that's it. I don't see much of a difference in difficulty between writing::

    class C(TotalOrderingMixin):
        def __lt__(self, other):
            return self.foo < other.foo
        def __eq__(self, other):
            return self.foo == other.foo

or writing [1] ::

    class C(object):
        def __cmp__(self, other):
            if self.foo < other.foo:
                return -1
            elif self.foo > other.foo:
                return 1
            else:
                return 0

The main motivation seems really to be efficiency for a particular task. For some tasks, e.g. sorting, you really only need __lt__, so going through __cmp__ will just be slower. For other tasks, e.g. comparing objects with several components, you know you have to do both the __lt__ and __eq__ comparisons, so it would be wasteful to make two calls when you know you could do it in one through __cmp__.
So it's not really about making things easier or harder, it's about making the most efficient tool for the task available. Steve [1] Yes, of course, you could just write cmp(self.foo, other.foo), but this is how it's been written in the rest of the thread, so I have to assume that it's more representative of real code. -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy From rhamph at gmail.com Wed Oct 31 02:57:17 2007 From: rhamph at gmail.com (Adam Olsen) Date: Tue, 30 Oct 2007 19:57:17 -0600 Subject: [Python-3000] Please re-add __cmp__ to python 3000 In-Reply-To: References: <47169E6E.7000804@canterbury.ac.nz> <47267607.2000806@canterbury.ac.nz> <4727973B.3060203@canterbury.ac.nz> Message-ID: On 10/30/07, Steven Bethard wrote: > On 10/30/07, Adam Olsen wrote: > > It's clear to me that the opposition to removing __cmp__ comes down to > > "make the common things easy and the rare things possible". Removing > > __cmp__ means one of the common things (total ordering) becomes hard. > > I don't really think that's it. I don't see much of a difference in > difficulty between writing::
>
>     class C(TotalOrderingMixin):
>         def __lt__(self, other):
>             return self.foo < other.foo
>         def __eq__(self, other):
>             return self.foo == other.foo
>
> or writing [1] ::
>
>     class C(object):
>         def __cmp__(self, other):
>             if self.foo < other.foo:
>                 return -1
>             elif self.foo > other.foo:
>                 return 1
>             else:
>                 return 0
>
> The main motivation seems really to be efficiency for a particular > task. For some tasks, e.g. sorting, you really only need __lt__, so > going through __cmp__ will just be slower. For other tasks, e.g. > comparing objects with several components, you know you have to do > both the __lt__ and __eq__ comparisons, so it would be wasteful to > make two calls when you know you could do it in one through __cmp__.
> > So it's not really about making things easier or harder, it's about > making the most efficient tool for the task available. > > Steve > > [1] Yes, of course, you could just write cmp(self.foo, other.foo), but > this is how it's been written in the rest of the thread, so I have to > assume that it's more representative of real code. cmp and __cmp__ are doomed, due to unorderable types now raising exceptions: >>> cmp(3, 'hello') Traceback (most recent call last): File "", line 1, in TypeError: unorderable types: int() < str() >>> 3 == 'hello' False A mixin for __cmp__ would be sufficient for scalars (where you can avoid this exception and your size is constant), but not for containers (which need to avoid inappropriate types and wish to avoid multiple passes.) I don't think __richcmp__ makes the process quite as simple as we want though: class C(RichCmpMixin): def __richcmp__(self, other, mode): if not isinstance(other, C): return NotImplemented for a, b in zip(self.data, other.data): result = richcmp(a, b, mode) # XXX how do I know when to stop if all I'm doing is a # <= comparison? cmp() is much easier! return richcmp(len(self.data), len(other.data), mode) If you standardize the meaning of the return values, rather than changing meaning based upon arguments, the whole thing works much better. A simple ordered flag indicates the extent of your comparison. Returning a false value always means equal, while returning a true value means unequal possibly with a specific ordering. class C: def __richcmp__(self, other, ordered): if not isinstance(other, C): return NotImplemented for a, b in zip(self.data, other.data): result = richcmp(a, b, ordered) if result: return result return richcmp(len(self.data), len(other.data), ordered) It also occurs to me that, if a type doesn't use symmetric comparisons, it should raise an exception rather than silently doing the wrong thing. 
To do that you need to know explicitly when ordering is being done (which richcmp/__richcmp__ does.) -- Adam Olsen, aka Rhamphoryncus From steven.bethard at gmail.com Wed Oct 31 03:11:47 2007 From: steven.bethard at gmail.com (Steven Bethard) Date: Tue, 30 Oct 2007 20:11:47 -0600 Subject: [Python-3000] Please re-add __cmp__ to python 3000 In-Reply-To: References: <47169E6E.7000804@canterbury.ac.nz> <47267607.2000806@canterbury.ac.nz> <4727973B.3060203@canterbury.ac.nz> Message-ID: On 10/30/07, Adam Olsen wrote: > cmp and __cmp__ are doomed, due to unorderable types now raising exceptions: > > >>> cmp(3, 'hello') > Traceback (most recent call last): > File "", line 1, in > TypeError: unorderable types: int() < str() > >>> 3 == 'hello' > False > > A mixin for __cmp__ would be sufficient for scalars (where you can > avoid this exception and your size is constant), but not for > containers (which need to avoid inappropriate types and wish to avoid > multiple passes.) I don't understand this conclusion. If you start comparing things that are unorderable, you'll get an exception. But cmp() still makes sense when you compare other things:: >>> cmp((1, 'a', 4.5), (1, 'a', 6.2)) -1 >>> cmp([6, 5, 4], [6, 4, 5]) 1 I definitely don't want any cmp/__cmp__ implementation that swallows exceptions when the types don't align, e.g.:: >>> cmp((1, 'a'), ('a', 1)) Traceback (most recent call last): File "", line 1, in TypeError: unorderable types: int() < str() STeVe -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. 
--- Bucky Katt, Get Fuzzy From rhamph at gmail.com Wed Oct 31 03:29:09 2007 From: rhamph at gmail.com (Adam Olsen) Date: Tue, 30 Oct 2007 20:29:09 -0600 Subject: [Python-3000] Please re-add __cmp__ to python 3000 In-Reply-To: References: <47267607.2000806@canterbury.ac.nz> <4727973B.3060203@canterbury.ac.nz> Message-ID: On 10/30/07, Steven Bethard wrote: > On 10/30/07, Adam Olsen wrote: > > cmp and __cmp__ are doomed, due to unorderable types now raising exceptions: > > > > >>> cmp(3, 'hello') > > Traceback (most recent call last): > > File "", line 1, in > > TypeError: unorderable types: int() < str() > > >>> 3 == 'hello' > > False > > > > A mixin for __cmp__ would be sufficient for scalars (where you can > > avoid this exception and your size is constant), but not for > > containers (which need to avoid inappropriate types and wish to avoid > > multiple passes.) > > I don't understand this conclusion. If you start comparing things > that are unorderable, you'll get an exception. But cmp() still makes > sense when you compare other things:: > > >>> cmp((1, 'a', 4.5), (1, 'a', 6.2)) > -1 > >>> cmp([6, 5, 4], [6, 4, 5]) > 1 > > I definitely don't want any cmp/__cmp__ implementation that swallows > exceptions when the types don't align, e.g.:: > > >>> cmp((1, 'a'), ('a', 1)) > Traceback (most recent call last): > File "", line 1, in > TypeError: unorderable types: int() < str() What I meant is that you can't use a mixin to map __eq__ to __cmp__, as you'll get TypeError even though == is defined for those types. 
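A small sketch of that failure mode, assuming a present-day Python 3 interpreter. The names three_way, EqFromCmp, Box and eq_outcome are invented for illustration and belong to no library:

```python
def three_way(a, b):
    """cmp()-style result built only from < and > (hypothetical helper)."""
    return (a > b) - (a < b)


class EqFromCmp:
    """Hypothetical mixin that derives == from a three-way __cmp__."""

    def __eq__(self, other):
        return self.__cmp__(other) == 0


class Box(EqFromCmp):
    def __init__(self, value):
        self.value = value

    def __cmp__(self, other):
        return three_way(self.value, other.value)


def eq_outcome(a, b):
    """Return the result of a == b, or the string 'TypeError' if it raises."""
    try:
        return a == b
    except TypeError:
        return "TypeError"
```

Here eq_outcome(3, 'hello') is simply False, but eq_outcome(Box(3), Box('hello')) reports a TypeError, because the derived == has to go through an ordering comparison that int and str do not support.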
-- Adam Olsen, aka Rhamphoryncus From steven.bethard at gmail.com Wed Oct 31 03:56:33 2007 From: steven.bethard at gmail.com (Steven Bethard) Date: Tue, 30 Oct 2007 20:56:33 -0600 Subject: [Python-3000] Please re-add __cmp__ to python 3000 In-Reply-To: References: <47267607.2000806@canterbury.ac.nz> <4727973B.3060203@canterbury.ac.nz> Message-ID: On 10/30/07, Adam Olsen wrote: > On 10/30/07, Steven Bethard wrote: > > On 10/30/07, Adam Olsen wrote: > > > cmp and __cmp__ are doomed, due to unorderable types now raising exceptions: > > > > > > >>> cmp(3, 'hello') > > > Traceback (most recent call last): > > > File "", line 1, in > > > TypeError: unorderable types: int() < str() > > > >>> 3 == 'hello' > > > False > > > > > > A mixin for __cmp__ would be sufficient for scalars (where you can > > > avoid this exception and your size is constant), but not for > > > containers (which need to avoid inappropriate types and wish to avoid > > > multiple passes.) > > > > I don't understand this conclusion. If you start comparing things > > that are unorderable, you'll get an exception. But cmp() still makes > > sense when you compare other things:: > > > > >>> cmp((1, 'a', 4.5), (1, 'a', 6.2)) > > -1 > > >>> cmp([6, 5, 4], [6, 4, 5]) > > 1 > > > > I definitely don't want any cmp/__cmp__ implementation that swallows > > exceptions when the types don't align, e.g.:: > > > > >>> cmp((1, 'a'), ('a', 1)) > > Traceback (most recent call last): > > File "", line 1, in > > TypeError: unorderable types: int() < str() > > What I meant is that you can't use a mixin to map __eq__ to __cmp__, > as you'll get TypeError even though == is defined for those types. I wasn't suggesting that, though I don't see why a mixin would fail here assuming you have both __eq__ and __lt__. Just do the __lt__ comparison first. I'm actually currently in favor of keeping __cmp__ as it is in Python 2.5. If a class defines only __cmp__, Python will do the appropriate dance to make <, >, ==, etc. work right.
If a class defines only __eq__, __lt__, etc. Python will do the appropriate dance to make cmp() work right. Steve -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy From rhamph at gmail.com Wed Oct 31 04:13:03 2007 From: rhamph at gmail.com (Adam Olsen) Date: Tue, 30 Oct 2007 21:13:03 -0600 Subject: [Python-3000] Please re-add __cmp__ to python 3000 In-Reply-To: References: <47267607.2000806@canterbury.ac.nz> <4727973B.3060203@canterbury.ac.nz> Message-ID: On 10/30/07, Steven Bethard wrote: > On 10/30/07, Adam Olsen wrote: > > On 10/30/07, Steven Bethard wrote: > > > On 10/30/07, Adam Olsen wrote: > > > > cmp and __cmp__ are doomed, due to unorderable types now raising exceptions: > > > > > > > > >>> cmp(3, 'hello') > > > > Traceback (most recent call last): > > > > File "", line 1, in > > > > TypeError: unorderable types: int() < str() > > > > >>> 3 == 'hello' > > > > False > > > > > > > > A mixin for __cmp__ would be sufficient for scalars (where you can > > > > avoid this exception and your size is constant), but not for > > > > containers (which need to avoid inappropriate types and wish to avoid > > > > multiple passes.) > > > > > > I don't understand this conclusion. If you start comparing things > > > that are unorderable, you'll get an exception. But cmp() still makes > > > sense when you compare other things:: > > > > > > >>> cmp((1, 'a', 4.5), (1, 'a', 6.2)) > > > -1 > > > >>> cmp([6, 5, 4], [6, 4, 5]) > > > 1 > > > > > > I definitely don't want any cmp/__cmp__ implementation that swallows > > > exceptions when the types don't align, e.g.:: > > > > > > >>> cmp((1, 'a'), ('a', 1)) > > > Traceback (most recent call last): > > > File "", line 1, in > > > TypeError: unorderable types: int() < str() > > > > What I meant is that you can't use a mixin to map __eq__ to __cmp__, > > as you'll get TypeError even though == is defined for those types. 
> > I wasn't suggesting that, though I don't see why a mixin would fail > here assuming you have both __eq__ and __lt__. Just to the __lt__ > comparison first. > > I'm actually currently in favor of keeping __cmp__ as it is in Python > 2.5. If a class defines only __cmp__, Python will do the appropriate > dance to make <, >, ==, etc. work right. If a class defines only > __eq__, __lt__, etc. Python will do the appropriate dance to make > cmp() work right. For some definition of "right". A container defines only __cmp__, using cmp() internally, will be broken in 3.0. -- Adam Olsen, aka Rhamphoryncus From steven.bethard at gmail.com Wed Oct 31 04:22:08 2007 From: steven.bethard at gmail.com (Steven Bethard) Date: Tue, 30 Oct 2007 21:22:08 -0600 Subject: [Python-3000] Please re-add __cmp__ to python 3000 In-Reply-To: References: <4727973B.3060203@canterbury.ac.nz> Message-ID: On 10/30/07, Adam Olsen wrote: > > I'm actually currently in favor of keeping __cmp__ as it is in Python > > 2.5. If a class defines only __cmp__, Python will do the appropriate > > dance to make <, >, ==, etc. work right. If a class defines only > > __eq__, __lt__, etc. Python will do the appropriate dance to make > > cmp() work right. > > For some definition of "right". A container defines only __cmp__, > using cmp() internally, will be broken in 3.0. Sure, but that's their choice. If you don't want to raise exceptions on equality comparisons, then you should define __eq__, in addition to __cmp__. Or you should only compare against comparable things. Steve -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. 
--- Bucky Katt, Get Fuzzy From greg.ewing at canterbury.ac.nz Wed Oct 31 05:46:16 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 31 Oct 2007 17:46:16 +1300 Subject: [Python-3000] Please re-add __cmp__ to python 3000 In-Reply-To: References: <47169E6E.7000804@canterbury.ac.nz> <47267607.2000806@canterbury.ac.nz> <4727973B.3060203@canterbury.ac.nz> Message-ID: <47280898.7010603@canterbury.ac.nz> Steven Bethard wrote:
> class C(object):
>     def __cmp__(self, other):
>         if self.foo < other.foo:
>             return -1
>         elif self.foo > other.foo:
>             return 1
>         else:
>             return 0

With __cmp__, in cases like that you can punt the whole thing off to the subsidiary object, e.g.

    def __cmp__(self, other):
        return cmp(self.foo, other.foo)

-- Greg From greg.ewing at canterbury.ac.nz Wed Oct 31 06:13:36 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 31 Oct 2007 18:13:36 +1300 Subject: [Python-3000] Please re-add __cmp__ to python 3000 In-Reply-To: References: <47169E6E.7000804@canterbury.ac.nz> <47267607.2000806@canterbury.ac.nz> <4727973B.3060203@canterbury.ac.nz> Message-ID: <47280F00.1050002@canterbury.ac.nz> Adam Olsen wrote:
> for a, b in zip(self.data, other.data):
>     result = richcmp(a, b, ordered)
>     if result:
>         return result

That can't be right, because there are *three* possible results you need to be able to distinguish from comparing a pair of elements: "stop and return True", "stop and return False", and "keep going". There's no way you can get that out of a boolean return value. Maybe what we need to do is enhance __cmp__ so that it has *four* possible return values: -1, 0, 1 and UnequalButNotOrdered. The scheme for handling a comparison 'op' between two values 'a' and 'b' would then be:

1) Try a.__richcmp__(op, b) and vice versa. If either of these produces a result, return it.
2) Try a.__cmp__(b) and vice versa. If either of these produces a result, then
   a) If the result is -1, 0 or 1, return an appropriate value based on the operation.
   b) If the result is UnequalButNotOrdered, and the operation is == or !=, return an appropriate value.
   c) Otherwise, raise an exception.

The pattern for comparing sequences would become:

    def __cmp__(self, other):
        for a, b in zip(self.items, other.items):
            result = cmp(a, b)
            if result != 0:
                return result
        return 0

Which is actually the same as it is now, with an added bit of It Just Works behaviour: if any of the element comparisons gives UnequalButNotOrdered, then the whole sequence gets reported as such. -- Greg From greg.ewing at canterbury.ac.nz Wed Oct 31 06:22:06 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 31 Oct 2007 18:22:06 +1300 Subject: [Python-3000] Please re-add __cmp__ to python 3000 In-Reply-To: References: <47267607.2000806@canterbury.ac.nz> <4727973B.3060203@canterbury.ac.nz> Message-ID: <472810FE.7060805@canterbury.ac.nz> Steven Bethard wrote: > If a class defines only __cmp__, Python will do the appropriate > dance to make <, >, ==, etc. work right. If a class defines only > __eq__, __lt__, etc. Python will do the appropriate dance to make > cmp() work right. With a four-way __cmp__, I wouldn't actually mind if the dance only worked one way, i.e. richcmp --> cmp. In that world, the only reason to define separate comparison operators would be if you were using them for something radically different from normal comparison. So defining __cmp__ could be defined as the standard way to implement comparison operators unless there's some reason you really can't do it that way, in which case you just have to live with cmp() not working on your type.
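For concreteness, the four-way idea might be sketched roughly as follows. Nothing here is an existing API: cmp4, seq_cmp and apply_op are invented names, the sentinel is a plain object(), and the trailing length comparison is an addition that the sequence pattern above does not have.

```python
import operator

UnequalButNotOrdered = object()  # hypothetical fourth return value


def cmp4(a, b):
    """Four-way comparison: -1, 0, 1 or UnequalButNotOrdered."""
    if a == b:
        return 0
    try:
        if a < b:
            return -1
        if a > b:
            return 1
    except TypeError:
        pass  # ordering is not defined for this pair
    return UnequalButNotOrdered


def seq_cmp(xs, ys):
    """The sequence pattern, with cmp4 standing in for cmp."""
    for a, b in zip(xs, ys):
        result = cmp4(a, b)
        if result != 0:
            return result  # also propagates UnequalButNotOrdered
    return cmp4(len(xs), len(ys))  # assumption: shorter sequence sorts first


def apply_op(op, result):
    """Turn a four-way result into the answer for one comparison operator."""
    if result is UnequalButNotOrdered:
        if op is operator.eq:
            return False
        if op is operator.ne:
            return True
        raise TypeError("unorderable operands")
    return op(result, 0)
```

With this, seq_cmp([1, 'a'], ['a', 1]) gives UnequalButNotOrdered, which apply_op turns into False for == and into a TypeError for <.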
-- Greg From rhamph at gmail.com Wed Oct 31 06:53:02 2007 From: rhamph at gmail.com (Adam Olsen) Date: Tue, 30 Oct 2007 23:53:02 -0600 Subject: [Python-3000] Please re-add __cmp__ to python 3000 In-Reply-To: <47280F00.1050002@canterbury.ac.nz> References: <47267607.2000806@canterbury.ac.nz> <4727973B.3060203@canterbury.ac.nz> <47280F00.1050002@canterbury.ac.nz> Message-ID: On 10/30/07, Greg Ewing wrote: > Adam Olsen wrote: > > for a, b in zip(self.data, other.data): > > result = richcmp(a, b, ordered) > > if result: > > return result > > That can't be right, because there are *three* possible > results you need to be able to distinguish from comparing > a pair of elements: "stop and return True", "stop and > return False", and "keep going". There's no way you can > get that out of a boolean return value. It's not strictly a boolean value. If ordered is false then you interpret it as either a false value or a true value (but it may return -1 or +1 for the true values.) If ordered is true then it may be -1, 0/false, +1, or raise a TypeError if ordering is unsupported. > def __cmp__(self, other): > for a, b in zip(self.items, other.items): > result = cmp(a, b) > if result != 0: > return result > return 0 > > Which is actually the same as it is now, with an added > bit of It Just Works behaviour: if any of the element > comparisons gives UnequalButNotOrdered, then the whole > sequence gets reported as such. So the difference between our two approaches is that mine uses a flag to indicate if a TypeError should be raised, while yours adds an extra return value. Mine does have a small benefit: list currently exits early if it's only testing for equality and the lengths differ, which couldn't be done with your API. 
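A rough sketch of the flag-based variant, including the early exit on differing lengths. richcmp and Seq are illustrative names only; this richcmp handles plain values and does not dispatch to a __richcmp__ method on the operands.

```python
def richcmp(a, b, ordered):
    """Falsy means equal, truthy means unequal (hypothetical helper).

    With ordered=True the result is -1, 0 or +1, and a TypeError
    propagates if the operands cannot be ordered.
    """
    if not ordered:
        return a != b
    return (a > b) - (a < b)


class Seq:
    """Minimal container using the flag-based protocol."""

    def __init__(self, data):
        self.data = list(data)

    def __richcmp__(self, other, ordered):
        if not isinstance(other, Seq):
            return NotImplemented
        if not ordered and len(self.data) != len(other.data):
            return True  # equality-only test: differing lengths settle it
        for a, b in zip(self.data, other.data):
            result = richcmp(a, b, ordered)
            if result:
                return result
        return richcmp(len(self.data), len(other.data), ordered)
```

An equality-only call such as Seq([1, 'a']).__richcmp__(Seq(['a', 1]), False) returns True without ever ordering an int against a str; the same call with ordered=True raises TypeError.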
-- Adam Olsen, aka Rhamphoryncus From nnorwitz at gmail.com Wed Oct 31 08:13:46 2007 From: nnorwitz at gmail.com (Neal Norwitz) Date: Wed, 31 Oct 2007 00:13:46 -0700 Subject: [Python-3000] status of buildbots Message-ID: We've made a lot of progress with the tests. Several buildbots are green. http://python.org/dev/buildbot/3.0/ There are some tests that are unstable, at least: test_asynchat test_urllib2net test_xmlrpc http://python.org/dev/buildbot/3.0/g4%20osx.4%203.0/builds/170/step-test/0 http://python.org/dev/buildbot/3.0/MIPS%20Debian%203.0/builds/81/step-test/0 http://python.org/dev/buildbot/3.0/x86%20FreeBSD%203.0/builds/126/step-test/0 I would really like to get these flaky tests fixed so they don't create false positives. It will help us greatly to move forward. I think these failures can occur on all platforms, so nothing special is required and it should be just fixing the test (all python code). Other platform specific problems: Windows has more problems, with these tests failing: test_csv test_dumbdbm test_gettext test_mailbox test_netrc test_pep277 test_subprocess http://python.org/dev/buildbot/3.0/x86%20XP%203.0/builds/190/step-test/0 Win64 has a few more: test_csv test_dumbdbm test_fileinput test_format test_getargs2 test_gettext test_mailbox test_netrc test_pep277 test_subprocess test_winsound http://python.org/dev/buildbot/3.0/amd64%20XP%203.0/builds/183/step-test/0 (This link is old. There were other problems with the bot.) There might be patches for one or more of these problems, but I'm not sure if they work. 
n From theller at ctypes.org Wed Oct 31 13:57:08 2007 From: theller at ctypes.org (Thomas Heller) Date: Wed, 31 Oct 2007 13:57:08 +0100 Subject: [Python-3000] status of buildbots In-Reply-To: References: Message-ID: Neal Norwitz wrote: > Other platform specific problems: > > Win64 has a few more: > test_csv test_dumbdbm test_fileinput test_format test_getargs2 > test_gettext test_mailbox test_netrc test_pep277 test_subprocess > test_winsound > http://python.org/dev/buildbot/3.0/amd64%20XP%203.0/builds/183/step-test/0 > (This link is old. There were other problems with the bot.) Please ignore the test_winsound results on Win64. They are caused by the machine, and I do not know how to disable the test (Martin's advice to remove the sound-driver did not help, unfortunately). AFAICT, test_winsound succeeds on Win64 if I'm logged in with a remote desktop connection to this machine, but I cannot stand the sudden beeping when the tests run ;-). Thomas From adam at hupp.org Wed Oct 31 14:30:55 2007 From: adam at hupp.org (Adam Hupp) Date: Wed, 31 Oct 2007 09:30:55 -0400 Subject: [Python-3000] status of buildbots In-Reply-To: References: Message-ID: <766a29bd0710310630g2fbc7131nfc23275f9dbf7bfa@mail.gmail.com> On 10/31/07, Neal Norwitz wrote: > We've made a lot of progress with the tests. Several buildbots are > green. http://python.org/dev/buildbot/3.0/ > > There are some tests that are unstable, at least: > test_asynchat test_urllib2net test_xmlrpc > http://python.org/dev/buildbot/3.0/g4%20osx.4%203.0/builds/170/step-test/0 > http://python.org/dev/buildbot/3.0/x86%20FreeBSD%203.0/builds/126/step-test/0 test_xmlrpc has code to ignore these errors, but the error message has changed slightly, so that code no longer takes effect. The reason for the errors is that the test is setting a timeout on the socket object, which puts it into non-blocking mode. That's incompatible with SocketServer, which uses socket.makefile for IO.
I don't think the timeout is necessary as long as one other fix is made. I've asked the author of the test for confirmation. On a related note, I think socket.makefile should throw an error if called on a non-blocking socket. The docs are pretty unambiguous that this is wrong: "file objects returned by the makefile() method must only be used when the socket is in blocking mode; in timeout or non-blocking mode file operations that cannot be completed immediately will fail." Throwing an error would prevent things like this CherryPy issue: http://www.cherrypy.org/ticket/598 This doesn't help if the socket is put into non-blocking mode after makefile is called, but it's better than nothing. Alternatively, if a timeout is set but non-blocking is *not* explicitly enabled, the socket implementation could handle the retry loop itself. -- Adam Hupp | http://hupp.org/adam/ From lists at cheimes.de Wed Oct 31 16:19:06 2007 From: lists at cheimes.de (Christian Heimes) Date: Wed, 31 Oct 2007 16:19:06 +0100 Subject: [Python-3000] status of buildbots In-Reply-To: References: Message-ID: <47289CEA.5060505@cheimes.de> Neal Norwitz wrote: > Windows has more problems, with these tests failing: > test_csv test_dumbdbm test_gettext test_mailbox test_netrc > test_pep277 test_subprocess > http://python.org/dev/buildbot/3.0/x86%20XP%203.0/builds/190/step-test/0 test_csv Changing TemporaryFile("w+") to TemporaryFile("w+", newline='') in test_csv.py readerAssertEqual() line 377 fixes the test. I'm not sure if it's the proper way to fix the issue. test_netrc Adding newline='' to fp = open(temp_filename, mode) in test_netrc.py fixes the test. Same as test_csv.
test_gettext Index: gettext.py =================================================================== --- gettext.py (revision 58729) +++ gettext.py (working copy) @@ -291,7 +291,7 @@ if mlen == 0: # Catalog description lastk = k = None - for b_item in tmsg.split(os.linesep.encode("ascii")): + for b_item in tmsg.split('\n'.encode("ascii")): item = str(b_item).strip() if not item: continue Index: test/test_gettext.py =================================================================== --- test/test_gettext.py (revision 58729) +++ test/test_gettext.py (working copy) @@ -332,6 +332,7 @@ def test_weird_metadata(self): info = self.t.info() + self.assertEqual(len(info), 9) self.assertEqual(info['last-translator'], 'John Doe \nJane Foobar ') test_pep277 The test fails because the code in _fileio:fileio_init doesn't set name from widename. On windows the variable widename contains the name as PyUNICODE and name stays empty but PyErr_SetFromErrnoWithFilename(PyExc_IOError, name) uses the name. test_subprocess It passes on my machine test_mailbox It suffers from the same problem with newlines as test_csv and test_netrc Christian From fumanchu at aminus.org Wed Oct 31 17:00:55 2007 From: fumanchu at aminus.org (Robert Brewer) Date: Wed, 31 Oct 2007 09:00:55 -0700 Subject: [Python-3000] status of buildbots In-Reply-To: <766a29bd0710310630g2fbc7131nfc23275f9dbf7bfa@mail.gmail.com> References: <766a29bd0710310630g2fbc7131nfc23275f9dbf7bfa@mail.gmail.com> Message-ID: Adam Hupp wrote: > On 10/31/07, Neal Norwitz wrote: > > We've made a lot of progress with the tests. Several buildbots are > > green. 
http://python.org/dev/buildbot/3.0/ > > > > There are some tests that are unstable, at least: > > test_asynchat test_urllib2net test_xmlrpc > > http://python.org/dev/buildbot/3.0/g4%20osx.4%203.0/builds/170/step- > test/0 > > > http://python.org/dev/buildbot/3.0/x86%20FreeBSD%203.0/builds/126/step- > test/0 > > test_xmlrpc has code to ignore these but the error message has changed > slightly so it's no longer in effect. > > The reason for the errors is that the test is setting a timeout on the > socket object which puts it in to non-blocking mode. That's > incompatible with SocketServer which uses socket.makefile for IO. I > don't think the timeout is necessary as long as one other fix is made. > I've asked the author of the test for confirmation. > > On a related note, I think socket.makefile should throw an error if > called on a non-blocking socket. The docs are pretty unambiguous that > this is wrong: > > "file objects returned by the makefile() method must only be used when > the socket is in blocking mode; in timeout or non-blocking mode file > operations that cannot be completed immediately will fail." > > Throwing an error would prevent things like this CherryPy issue: > > http://www.cherrypy.org/ticket/598 > > This doesn't help if the socket is put into non-blocking mode after > makefile is called but it's better than nothing. > > Alternatively, if the a timeout is set but non-blocking is *not* > explicitly enabled the socket implementation could handle the retry > loop itself. That's the route I would prefer, and is probably what we're going to end up doing for CherryPy (write our own makefile which retries). We prefer the timeout for various reasons, yet WSGI requires file-like objects. 
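One way such a retrying makefile replacement could look is sketched below. This is not CherryPy's or the stdlib's code: RetryingSocketIO, max_retries and timeouts_seen are hypothetical names, and only the read side is shown.

```python
import io
import socket


class RetryingSocketIO(io.RawIOBase):
    """Hypothetical raw stream that retries reads after a socket timeout."""

    def __init__(self, sock, max_retries=5):
        super().__init__()
        self._sock = sock
        self._max_retries = max_retries
        self.timeouts_seen = 0  # bookkeeping only, handy for testing

    def readable(self):
        return True

    def readinto(self, b):
        # Try the read up to max_retries times, swallowing timeouts.
        for _ in range(self._max_retries):
            try:
                return self._sock.recv_into(b)
            except socket.timeout:
                self.timeouts_seen += 1
        raise socket.timeout("no data after %d attempts" % self._max_retries)
```

Wrapping it as io.BufferedReader(RetryingSocketIO(sock)) gives a file-like object with readline() while the socket keeps its timeout; a short timeout with several retries still bounds the total wait at roughly timeout * max_retries.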
Robert Brewer
fumanchu at aminus.org

From lists at cheimes.de Wed Oct 31 17:29:23 2007
From: lists at cheimes.de (Christian Heimes)
Date: Wed, 31 Oct 2007 17:29:23 +0100
Subject: [Python-3000] status of buildbots
In-Reply-To: <47289CEA.5060505@cheimes.de>
References: <47289CEA.5060505@cheimes.de>
Message-ID: <4728AD63.6030608@cheimes.de>

Christian Heimes wrote:
> Neal Norwitz wrote:
>> Windows has more problems, with these tests failing:
>>     test_csv test_dumbdbm test_gettext test_mailbox test_netrc
>> test_pep277 test_subprocess
>> http://python.org/dev/buildbot/3.0/x86%20XP%203.0/builds/190/step-test/0

I forgot to mention that test_dumbdbm fails because the test replaces
\r\n line endings with \r\r\n line endings on Windows.

Christian

From r.m.oudkerk at gmail.com Wed Oct 31 19:19:05 2007
From: r.m.oudkerk at gmail.com (roudkerk)
Date: Wed, 31 Oct 2007 18:19:05 +0000 (UTC)
Subject: [Python-3000] socket GC worries
References: <07Oct28.111004pst.57996@synergy1.parc.xerox.com> <472505A7.108@canterbury.ac.nz> <6186646035112263762@unknownmsgid> <4726738B.2080106@canterbury.ac.nz>
Message-ID:

Guido van Rossum python.org> writes:

>
> 2007/10/29, Greg Ewing canterbury.ac.nz>:
> > I don't see what's so difficult about this. Each file
> > descriptor should be owned by exactly one object. If
> > two objects need to share a fd, then you dup() it so
> > that each one has its own fd. When the object is
> > close()d or GCed, it closes its fd.
>
> On Windows you can't dup() a fd.
>

You can use os.dup() on an fd. But with sockets you must use
DuplicateHandle() instead because socket.fileno() returns a handle not
an fd.

socket.py has this comment:

    #
    # These classes are used by the socket() defined on Windows and BeOS
    # platforms to provide a best-effort implementation of the cleanup
    # semantics needed when sockets can't be dup()ed.
    #
    # These are not actually used on other platforms.
    #

I don't know whether BeOS still matters to anyone...

I would just implement _socket.socket.dup() on Windows using
DuplicateHandle().

Example of DuplicateHandle():

    import ctypes, socket
    from _subprocess import *

    # send a message to a socket object 'conn'
    listener = socket.socket()
    listener.bind(('localhost', 0))
    listener.listen(1)
    client = socket.socket()
    client.connect(listener.getsockname())
    conn, addr = listener.accept()
    client.sendall('hello world')

    # duplicate handle
    handle = conn.fileno()
    duphandle = DuplicateHandle(
        GetCurrentProcess(), handle, GetCurrentProcess(),
        0, False, DUPLICATE_SAME_ACCESS
        ).Detach()

    # use duplicate handle to read the message
    buffer = ctypes.c_buffer(20)
    ctypes.windll.ws2_32.recv(duphandle, buffer, 20, 0)
    print handle, duphandle, buffer.value

BTW. On Windows can we please have a socket.fromfd() function (or maybe
that should be socket.fromhandle()).

From lists at cheimes.de Wed Oct 31 21:03:54 2007
From: lists at cheimes.de (Christian Heimes)
Date: Wed, 31 Oct 2007 21:03:54 +0100
Subject: [Python-3000] status of buildbots
In-Reply-To:
References:
Message-ID: <4728DFAA.2040604@cheimes.de>

Neal Norwitz wrote:
> Windows has more problems, with these tests failing:
>     test_csv test_dumbdbm test_gettext test_mailbox test_netrc
> test_pep277 test_subprocess
> http://python.org/dev/buildbot/3.0/x86%20XP%203.0/builds/190/step-test/0

I've used my new developer privileges to check in some fixes. I'm down
to three failing unit tests on my WinXP (SP2 i386 German, inside a
VMWare sandbox).

3 tests failed:
    test_csv test_mailbox test_netrc

All remaining failures are caused by newline madness.

Christian
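
For readers wondering how the \r\r\n symptom reported for test_dumbdbm can arise, here is a small sketch that reproduces the effect on any platform. It assumes the Python 3 io module, where newline="\r\n" mimics the text-mode translation Windows applies by default; the helper name encode_text_mode is made up for illustration and is not taken from any of the failing tests.

```python
import io


def encode_text_mode(text, newline="\r\n"):
    """Return the bytes a text-mode file would hold after writing text."""
    raw = io.BytesIO()
    wrapper = io.TextIOWrapper(raw, encoding="ascii", newline=newline)
    wrapper.write(text)
    wrapper.flush()
    data = raw.getvalue()
    wrapper.detach()  # leave the BytesIO alone instead of closing it
    return data


# Data that already ends in "\r\n" gets a second "\r" on translation.
doubled = encode_text_mode("key\r\n")
# Data that ends in plain "\n" comes out with a single "\r\n".
normal = encode_text_mode("key\n")
# With newline="\n" nothing is translated at all.
untouched = encode_text_mode("key\n", newline="\n")
```

The general point is that code which writes os.linesep, or data that already contains \r\n, through a translating text layer ends up with a doubled carriage return, which fits the pattern of the Windows-only failures discussed above.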