From victor.stinner at haypocalc.com Mon Oct 1 00:59:40 2007
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Mon, 1 Oct 2007 00:59:40 +0200
Subject: [Python-3000] Python, int/long and GMP
In-Reply-To: <200709281858.29705.victor.stinner@haypocalc.com>
References: <200709280429.39396.victor.stinner@haypocalc.com>
 <200709281858.29705.victor.stinner@haypocalc.com>
Message-ID: <200710010059.41161.victor.stinner@haypocalc.com>

Hi,

I wrote another patch with two improvements: it uses the small integer
cache and Python's memory allocation functions. The GMP overhead
(pystones result) is now only -2%, down from -20% with the previous
patch.

Since the patch is huge, I prefer to leave a copy on my server:
http://www.haypocalc.com/tmp/py3k-long_gmp-v2.patch

Victor
--
Victor Stinner
http://hachoir.org/

From guido at python.org Mon Oct 1 01:14:07 2007
From: guido at python.org (Guido van Rossum)
Date: Sun, 30 Sep 2007 16:14:07 -0700
Subject: [Python-3000] bytes and dicts (was: PEP 3137: Immutable Bytesand Mutable Buffer)
In-Reply-To:
References: <20070929142126.D61D23A4045@sparrow.telecommunity.com>
 <20070929151127.AE5203A4045@sparrow.telecommunity.com>
 <20070929155823.C552B3A4045@sparrow.telecommunity.com>
Message-ID:

I see no other solution to this thread than to revert the decision
that comparing bytes and str raises TypeError. It may catch a trivial
mistake or two, but the far from trivial, subtle issues it causes for
more sophisticated code just aren't worth it.

I'll add this to PEP 3137.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org Mon Oct 1 01:25:20 2007
From: guido at python.org (Guido van Rossum)
Date: Sun, 30 Sep 2007 16:25:20 -0700
Subject: [Python-3000] Last call for PEP 3137: Immutable Bytes and Mutable Buffer
Message-ID:

Thanks all for the focused and helpful discussion on this PEP. Here's
a new posting of the full text of the PEP as it now stands.
Most of the changes since the first posting are fleshing out of some
details; the decision to make the individual elements of bytes and
buffer be ints; and the decision to change bytes/str and buffer/str
comparisons again to just return False instead of raising TypeError.

(I'm not favorable towards the proposal of c'x' style literals or
changes to the I/O APIs to use different names for calls involving
bytes instead of text. If you still disagree, please start a new
thread with a new subject line.)

I plan to accept the PEP within a day or two barring major objections,
and expect to start implementing soon after.

--Guido

PEP: 3137
Title: Immutable Bytes and Mutable Buffer
Version: $Revision: 58290 $
Last-Modified: $Date: 2007-09-30 16:19:14 -0700 (Sun, 30 Sep 2007) $
Author: Guido van Rossum
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 26-Sep-2007
Python-Version: 3.0
Post-History: 26-Sep-2007, 30-Sep-2007

Introduction
============

After releasing Python 3.0a1 with a mutable bytes type, pressure
mounted to add a way to represent immutable bytes. Gregory P. Smith
proposed a patch that would allow making a bytes object temporarily
immutable by requesting that the data be locked using the new buffer
API from PEP 3118. This did not seem the right approach to me.

Jeffrey Yasskin, with the help of Adam Hupp, then prepared a patch to
make the bytes type immutable (by crudely removing all mutating APIs)
and fix the fall-out in the test suite. This showed that there aren't
all that many places that depend on the mutability of bytes, with the
exception of code that builds up a return value from small pieces.

Thinking through the consequences, and noticing that using the array
module as an ersatz mutable bytes type is far from ideal, and
recalling a proposal put forward earlier by Talin, I floated the
suggestion to have both a mutable and an immutable bytes type.
(This had been brought up before, but until seeing the evidence of
Jeffrey's patch I wasn't open to the suggestion.)

Moreover, a possible implementation strategy became clear: use the old
PyString implementation, stripped down to remove locale support and
implicit conversions to/from Unicode, for the immutable bytes type,
and keep the new PyBytes implementation as the mutable bytes type.

The ensuing discussion made it clear that the idea is welcome but
needs to be specified more precisely. Hence this PEP.

Advantages
==========

One advantage of having an immutable bytes type is that code objects
can use these. It also makes it possible to efficiently create hash
tables using bytes for keys; this may be useful when parsing protocols
like HTTP or SMTP which are based on bytes representing text.

Porting code that manipulates binary data (or encoded text) in Python
2.x will be easier using the new design than using the original 3.0
design with mutable bytes; simply replace ``str`` with ``bytes`` and
change '...' literals into b'...' literals.

Naming
======

I propose the following type names at the Python level:

- ``bytes`` is an immutable array of bytes (PyString)

- ``buffer`` is a mutable array of bytes (PyBytes)

- ``memoryview`` is a bytes view on another object (PyMemory)

The old type named ``buffer`` is so similar to the new type
``memoryview``, introduced by PEP 3118, that it is redundant. The rest
of this PEP doesn't discuss the functionality of ``memoryview``; it is
just mentioned here to justify getting rid of the old ``buffer`` type
so we can reuse its name for the mutable bytes type.

While eventually it makes sense to change the C API names, this PEP
maintains the old C API names, which should be familiar to all.

Literal Notations
=================

The b'...' notation introduced in Python 3.0a1 returns an immutable
bytes object, whatever variation is used. To create a mutable bytes
buffer object, use buffer(b'...') or buffer([...]).
The latter may use a list of integers in range(256).

Functionality
=============

PEP 3118 Buffer API
-------------------

Both bytes and buffer implement the PEP 3118 buffer API. The bytes
type only implements read-only requests; the buffer type allows
writable and data-locked requests as well. The element data type is
always 'B' (i.e. unsigned byte).

Constructors
------------

There are four forms of constructors, applicable to both bytes and
buffer:

- ``bytes(<bytes>)``, ``bytes(<buffer>)``, ``buffer(<bytes>)``,
  ``buffer(<buffer>)``: simple copying constructors, with the note
  that ``bytes(<bytes>)`` might return its (immutable) argument.

- ``bytes(<str>, <encoding>[, <errors>])``, ``buffer(<str>,
  <encoding>[, <errors>])``: encode a text string. Note that the
  ``str.encode()`` method returns an *immutable* bytes object.
  The <encoding> argument is mandatory; <errors> is optional.

- ``bytes(<memory view>)``, ``buffer(<memory view>)``: construct a
  bytes or buffer object from anything implementing the PEP 3118
  buffer API.

- ``bytes(<iterable of ints>)``, ``buffer(<iterable of ints>)``:
  construct an immutable bytes or mutable buffer object from a
  stream of integers in range(256).

- ``buffer(<int>)``: construct a zero-initialized buffer of a given
  length.

Comparisons
-----------

The bytes and buffer types are comparable with each other and
orderable, so that e.g. b'abc' == buffer(b'abc') < b'abd'.

Comparing either type to a str object for equality returns False
regardless of the contents of either operand. Ordering comparisons
with str raise TypeError. This is all conformant to the standard
rules for comparison and ordering between objects of incompatible
types.

(**Note:** in Python 3.0a1, comparing a bytes instance with a str
instance would raise TypeError, on the premise that this would catch
the occasional mistake quicker, especially in code ported from Python
2.x. However, a long discussion on the python-3000 list pointed out
so many problems with this that it is clearly a bad idea, to be rolled
back in 3.0a2 regardless of the fate of the rest of this PEP.)

Slicing
-------

Slicing a bytes object returns a bytes object.
Slicing a buffer object returns a buffer object.

Slice assignment to a mutable buffer object accepts anything that
implements the PEP 3118 buffer API, or an iterable of integers in
range(256).

Indexing
--------

Indexing bytes and buffer returns small ints (like the bytes type in
3.0a1, and like lists or array.array('B')).

Assignment to an item of a mutable buffer object accepts an int in
range(256). (To assign from a bytes sequence, use a slice
assignment.)

Str() and Repr()
----------------

The str() and repr() functions return the same thing for these
objects. The repr() of a bytes object returns a b'...' style literal.
The repr() of a buffer returns a string of the form "buffer(b'...')".

Operators
---------

The following operators are implemented by the bytes and buffer types,
except where mentioned:

- ``b1 + b2``: concatenation. With mixed bytes/buffer operands,
  the return type is that of the first argument (this seems arbitrary
  until you consider how ``+=`` works).

- ``b1 += b2``: mutates b1 if it is a buffer object.

- ``b * n``, ``n * b``: repetition; n must be an integer.

- ``b *= n``: mutates b if it is a buffer object.

- ``b1 in b2``, ``b1 not in b2``: substring test; b1 can be any
  object implementing the PEP 3118 buffer API.

- ``i in b``, ``i not in b``: single-byte membership test; i must
  be an integer (if it is a length-1 bytes array, it is considered
  to be a substring test, with the same outcome).

- ``len(b)``: the number of bytes.

- ``hash(b)``: the hash value; only implemented by the bytes type.

Note that the % operator is *not* implemented. It does not appear
worth the complexity.

Methods
-------

The following methods are implemented by bytes as well as buffer, with
similar semantics.
They accept anything that implements the PEP 3118 buffer API for bytes
arguments, and return the same type as the object whose method is
called ("self")::

  .capitalize(), .center(), .count(), .decode(), .endswith(),
  .expandtabs(), .find(), .index(), .isalnum(), .isalpha(), .isdigit(),
  .islower(), .isspace(), .istitle(), .isupper(), .join(), .ljust(),
  .lower(), .lstrip(), .partition(), .replace(), .rfind(), .rindex(),
  .rjust(), .rpartition(), .rsplit(), .rstrip(), .split(),
  .splitlines(), .startswith(), .strip(), .swapcase(), .title(),
  .translate(), .upper(), .zfill()

This is exactly the set of methods present on the str type in Python
2.x, with the exclusion of .encode(). The signatures and semantics
are the same too. However, whenever character classes like letter,
whitespace, lower case are used, the ASCII definitions of these
classes are used. (The Python 2.x str type uses the definitions from
the current locale, settable through the locale module.) The
.encode() method is left out because of the more strict definitions of
encoding and decoding in Python 3000: encoding always takes a Unicode
string and returns a bytes sequence, and decoding always takes a bytes
sequence and returns a Unicode string.

In addition, both types implement the class method ``.fromhex()``,
which constructs an object from a string containing hexadecimal values
(with or without spaces between the bytes).

The buffer type implements these additional methods from the
MutableSequence ABC (see PEP 3119):

  .extend(), .insert(), .append(), .reverse(), .pop(), .remove().

Bytes and the Str Type
----------------------

Like the bytes type in Python 3.0a1, and unlike the relationship
between str and unicode in Python 2.x, any attempt to mix bytes (or
buffer) objects and str objects without specifying an encoding will
raise a TypeError exception.
This is the case even for simply comparing a bytes or buffer object to
a str object (even violating the general rule that comparing objects
of different types for equality should just return False).

Conversions between bytes or buffer objects and str objects must
always be explicit, using an encoding. There are two equivalent APIs:
``str(b, <encoding>[, <errors>])`` is equivalent to
``b.decode(<encoding>[, <errors>])``, and
``bytes(s, <encoding>[, <errors>])`` is equivalent to
``s.encode(<encoding>[, <errors>])``.

There is one exception: we can convert from bytes (or buffer) to str
without specifying an encoding by writing ``str(b)``. This produces
the same result as ``repr(b)``. This exception is necessary because
of the general promise that *any* object can be printed, and printing
is just a special case of conversion to str. There is however no
promise that printing a bytes object interprets the individual bytes
as characters (unlike in Python 2.x).

The str type currently implements the PEP 3118 buffer API. While this
is perhaps occasionally convenient, it is also potentially confusing,
because the bytes accessed via the buffer API represent a
platform-dependent encoding: depending on the platform byte order and
a compile-time configuration option, the encoding could be UTF-16-BE,
UTF-16-LE, UTF-32-BE, or UTF-32-LE. Worse, a different implementation
of the str type might completely change the bytes representation,
e.g. to UTF-8, or even make it impossible to access the data as a
contiguous array of bytes at all. Therefore, the PEP 3118 buffer API
will be removed from the str type.

Pickling
--------

Left as an exercise for the reader.

Copyright
=========

This document has been placed in the public domain.

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

From brett at python.org Mon Oct 1 02:11:38 2007
From: brett at python.org (Brett Cannon)
Date: Sun, 30 Sep 2007 17:11:38 -0700
Subject: [Python-3000] Last call for PEP 3137: Immutable Bytes and Mutable Buffer
In-Reply-To:
References:
Message-ID:

+1 from me.

-Brett

From aahz at pythoncraft.com Mon Oct 1 04:10:02 2007
From: aahz at pythoncraft.com (Aahz)
Date: Sun, 30 Sep 2007 19:10:02 -0700
Subject: [Python-3000] Extension: mpf for GNU MP floating point
In-Reply-To:
References:
Message-ID: <20071001021001.GA12746@panix.com>

On Thu, Sep 27, 2007, Rob Crowther wrote:
>
> I've uploaded the latest code to http://umass.glexia.net/mpf.tar.bz2
>
> Here's a quick rundown of supported functions and operations.

Could you explain what your goal is here? MPF isn't currently part of
the standard library, so it probably should exist as a standalone
extension first. This mailing list is probably not the right place for
discussion, either.
--
Aahz (aahz at pythoncraft.com)  <*>  http://www.pythoncraft.com/

The best way to get information on Usenet is not to ask a question, but
to post the wrong information.

From carsten at uniqsys.com Mon Oct 1 04:10:32 2007
From: carsten at uniqsys.com (Carsten Haese)
Date: Sun, 30 Sep 2007 22:10:32 -0400
Subject: [Python-3000] Last call for PEP 3137: Immutable Bytes and Mutable Buffer
In-Reply-To:
References:
Message-ID: <1191204632.3258.6.camel@localhost.localdomain>

On Sun, 2007-09-30 at 16:25 -0700, Guido van Rossum wrote:
> [...]
> (**Note:** in Python 3.0a1, comparing a bytes instance with a str
> instance would raise TypeError, on the premise that this would catch
> the occasional mistake quicker, especially in code ported from Python
> 2.x. However, a long discussion on the python-3000 list pointed out
> so many problems with this that it is clearly a bad idea, to be rolled
> back in 3.0a2 regardless of the fate of the rest of this PEP.)
> [...]
> Like the bytes type in Python 3.0a1, and unlike the relationship
> between str and unicode in Python 2.x, any attempt to mix bytes (or
> buffer) objects and str objects without specifying an encoding will
> raise a TypeError exception. This is the case even for simply
> comparing a bytes or buffer object to a str object (even violating the
> general rule that comparing objects of different types for equality
> should just return False).

It appears that you didn't revise the latter paragraph after adding the
former paragraph.

--
Carsten Haese
http://informixdb.sourceforge.net

From alexandre at peadrop.com Mon Oct 1 04:44:31 2007
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Sun, 30 Sep 2007 22:44:31 -0400
Subject: [Python-3000] Last call for PEP 3137: Immutable Bytes and Mutable Buffer
In-Reply-To:
References:
Message-ID:

On 9/30/07, Guido van Rossum wrote:
> Pickling
> --------
>
> Left as an exercise for the reader.
>

A simple way to add specific pickling support for bytes/buffer objects
would be to define two new constants:

    BYTES = b'\x8c'   # push a bytes object
    BUFFER = b'\x8d'  # push a buffer object

And add the following pickling and unpickling procedures:

    def save_bytes(self, obj, pack=struct.pack):
        n = len(obj)
        self.write(BYTES + pack("
References: <1191204632.3258.6.camel@localhost.localdomain>
Message-ID:

On 9/30/07, Carsten Haese wrote:
> On Sun, 2007-09-30 at 16:25 -0700, Guido van Rossum wrote:
> > [...]
> > (**Note:** in Python 3.0a1, comparing a bytes instance with a str
> > instance would raise TypeError, on the premise that this would catch
> > the occasional mistake quicker, especially in code ported from Python
> > 2.x. However, a long discussion on the python-3000 list pointed out
> > so many problems with this that it is clearly a bad idea, to be rolled
> > back in 3.0a2 regardless of the fate of the rest of this PEP.)
> > [...]
> > Like the bytes type in Python 3.0a1, and unlike the relationship
> > between str and unicode in Python 2.x, any attempt to mix bytes (or
> > buffer) objects and str objects without specifying an encoding will
> > raise a TypeError exception. This is the case even for simply
> > comparing a bytes or buffer object to a str object (even violating the
> > general rule that comparing objects of different types for equality
> > should just return False).
>
> It appears that you didn't revise the latter paragraph after adding the
> former paragraph.

Good catch! Fixed in svn. The latter paragraph now reads:

"""
Like the bytes type in Python 3.0a1, and unlike the relationship
between str and unicode in Python 2.x, attempts to mix bytes (or
buffer) objects and str objects without specifying an encoding will
raise a TypeError exception. (However, comparing bytes/buffer and str
objects for equality will simply return False; see the section on
Comparisons above.)
"""

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

From oliphant.travis at ieee.org Mon Oct 1 06:47:34 2007
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Sun, 30 Sep 2007 23:47:34 -0500
Subject: [Python-3000] Last call for PEP 3137: Immutable Bytes and Mutable Buffer
In-Reply-To:
References:
Message-ID:

+1 from me.

I like that the str will not support the buffer API because it gets rid
of one of the flags in the PEP 3118 API that was only there to support
the abuse of the buffer API by unicode objects.

- Travis Oliphant

From greg at krypto.org Mon Oct 1 07:16:29 2007
From: greg at krypto.org (Gregory P. Smith)
Date: Sun, 30 Sep 2007 22:16:29 -0700
Subject: [Python-3000] Last call for PEP 3137: Immutable Bytes and Mutable Buffer
In-Reply-To:
References:
Message-ID: <52dc1c820709302216h34a82c45m2385f8dcf34de800@mail.gmail.com>

+10 from me

From ncoghlan at gmail.com Mon Oct 1 15:55:12 2007
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 01 Oct 2007 23:55:12 +1000
Subject: [Python-3000] Last call for PEP 3137: Immutable Bytes and Mutable Buffer
In-Reply-To:
References:
Message-ID: <4700FC40.1060206@gmail.com>

Brett Cannon wrote:
> +1 from me.

Looks good to me too: +1

I wouldn't mind seeing some iteration-in-C bit-bashing operations in
there eventually, but they aren't needed on the first pass, and even
being able to do things like the following will be a decent improvement
over the status quo for low-level bitstream manipulation:

    data = bytes([x & 0x1F for x in orig_data])

Cheers,
Nick.
--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
---------------------------------------------------------------
http://www.boredomandlaziness.org

From dalcinl at gmail.com Mon Oct 1 17:00:11 2007
From: dalcinl at gmail.com (Lisandro Dalcin)
Date: Mon, 1 Oct 2007 12:00:11 -0300
Subject: [Python-3000] [Python-Dev] building with -Wwrite-strings
In-Reply-To: <20071001141007.GA20122@code0.codespeak.net>
References: <46FD6DA2.1060107@v.loewis.de> <20071001141007.GA20122@code0.codespeak.net>
Message-ID:

Yes, you are completely right. I ended up realizing that a change like
this would break almost all third-party extensions.

But... What about doing this for Py3K? Third-party extensions have to be
fixed anyway.

On 10/1/07, Armin Rigo wrote:
> Hi Martin,
>
> On Fri, Sep 28, 2007 at 11:09:54PM +0200, "Martin v. Löwis" wrote:
> > What's wrong with
> >
> > static const char *kwlist[] = {"x", "base", 0};
>
> The following goes wrong if we try again to walk this path:
> http://mail.python.org/pipermail/python-dev/2006-February/060689.html
>
>
> Armin
>

--
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594

From skip at pobox.com Mon Oct 1 19:14:40 2007
From: skip at pobox.com (skip at pobox.com)
Date: Mon, 1 Oct 2007 12:14:40 -0500
Subject: [Python-3000] bytes vs. array.array vs. numpy.array
In-Reply-To: <4700FC40.1060206@gmail.com>
References: <4700FC40.1060206@gmail.com>
Message-ID: <18177.11008.244338.509409@montanaro.dyndns.org>

Nick> I wouldn't mind seeing some iteration-in-C bit-bashing operations
Nick> in there eventually...

Nick> data = bytes([x & 0x1F for x in orig_data])

This begins to make it look like what you want is array.array or numpy.array.
Python's arrays don't support bitwise operations either, but numpy's do.
How much overlap is there between the three types? Does it make sense to
consider that canonical underlying array type now (or in the near future,
sometime before the release of 3.0 final)?

Skip

From ncoghlan at gmail.com Mon Oct 1 23:18:19 2007
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 02 Oct 2007 07:18:19 +1000
Subject: [Python-3000] bytes vs. array.array vs. numpy.array
In-Reply-To: <18177.11008.244338.509409@montanaro.dyndns.org>
References: <4700FC40.1060206@gmail.com> <18177.11008.244338.509409@montanaro.dyndns.org>
Message-ID: <4701641B.4040501@gmail.com>

skip at pobox.com wrote:
> Nick> I wouldn't mind seeing some iteration-in-C bit-bashing operations
> Nick> in there eventually...
>
> Nick> data = bytes([x & 0x1F for x in orig_data])
>
> This begins to make it look like what you want is array.array or numpy.array.
> Python's arrays don't support bitwise operations either, but numpy's do.
> How much overlap is there between the three types? Does it make sense to
> consider that canonical underlying array type now (or in the near future,
> sometime before the release of 3.0 final)?

Not hugely urgent for me - it's a direction I'd like to see the data
type go in (as the less custom code needed on the C/C++ side of the
fence to do reasonably efficient low level I/O the better as far as I am
concerned), but work is still on 2.4 (with no compelling motivation to
upgrade) so I'm personally resigned to the use of assorted ord(), chr()
and ''.join() calls for the immediate future.
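
A minimal sketch of the contrast Nick describes, assuming a released Python 3 interpreter in which indexing or iterating over bytes yields ints (the decision recorded in the PEP). The function names `mask_low5_py2_style` and `mask_low5` are hypothetical and exist only for illustration; the first emulates the 2.x idiom on a text string rather than running real 2.x code.

```python
def mask_low5_py2_style(text):
    """Emulate the 2.x idiom: one ord()/chr() round trip per character."""
    return ''.join(chr(ord(ch) & 0x1F) for ch in text)


def mask_low5(data):
    """With int elements, the mask applies directly to each byte."""
    return bytes([x & 0x1F for x in data])


print(mask_low5(b"abc"))  # b'\x01\x02\x03'
```

The second form is the one quoted in the thread; it still loops in Python, so it is no faster in principle, only shorter and free of the str round trip.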
The advantage of having the bit manipulation features in the builtin
bytes type for this kind of thing over numpy.array is that I expect the
builtin bytes type to be usable directly with Py3k versions of libraries
like pyserial, and numpy would be a big dependency to bring in just to
get more efficient bit-oriented operations on a byte sequence -
array.array doesn't have them (not to mention the fact that these
operations would make far less sense for any array containing something
other than bytes).

However, because the addition of any bit-oriented operations to the
bytes/buffer types would be a new backwardly-compatible feature, it can
be proposed whenever is convenient rather than having to be done right now.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
---------------------------------------------------------------
http://www.boredomandlaziness.org

From greg.ewing at canterbury.ac.nz Tue Oct 2 03:19:32 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 02 Oct 2007 14:19:32 +1300
Subject: [Python-3000] bytes vs. array.array vs. numpy.array
In-Reply-To: <4701641B.4040501@gmail.com>
References: <4700FC40.1060206@gmail.com> <18177.11008.244338.509409@montanaro.dyndns.org> <4701641B.4040501@gmail.com>
Message-ID: <47019CA4.4010403@canterbury.ac.nz>

Nick Coghlan wrote:
> numpy would be a big dependency to bring in just to
> get more efficient bit-oriented operations on a byte sequence

Random thought - if long integers were to use byte sequences
internally to hold their data, it might be possible to get
this more or less for free in terms of code size.

--
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | Carpe post meridiem!                 |
Christchurch, New Zealand          | (I'm not a morning person.)          |
greg.ewing at canterbury.ac.nz     +--------------------------------------+

From tjreedy at udel.edu Tue Oct 2 05:59:12 2007
From: tjreedy at udel.edu (Terry Reedy)
Date: Mon, 1 Oct 2007 23:59:12 -0400
Subject: [Python-3000] Last call for PEP 3137: Immutable Bytes and Mutable Buffer
References: <4700FC40.1060206@gmail.com>
Message-ID:

"Nick Coghlan" wrote in message news:4700FC40.1060206 at gmail.com...
| Brett Cannon wrote:
| > +1 from me.
|
| Looks good to me too: +1
|
| I wouldn't mind seeing some iteration-in-C bit-bashing operations in
| there eventually, but they aren't needed on the first pass, and even
| being able to do things like the following will be a decent improvement
| over the status quo for low-level bitstream manipulation:
|
| data = bytes([x & 0x1F for x in orig_data])

If orig_data were mutable (the new buffer, as proposed in the PEP), would
not

    for i in range(len(orig_data)):
        orig_data[i] &= 0x1F

do it in place? (I don't have .0a1 to try on the current bytes.)

tjr

From lists at cheimes.de Tue Oct 2 09:59:26 2007
From: lists at cheimes.de (Christian Heimes)
Date: Tue, 02 Oct 2007 09:59:26 +0200
Subject: [Python-3000] Last call for PEP 3137: Immutable Bytes and Mutable Buffer
In-Reply-To:
References: <4700FC40.1060206@gmail.com>
Message-ID:

Terry Reedy wrote:
> If orig_data were mutable (the new buffer, as proposed in the PEP), would
> not
>
> for i in range(len(orig_data)):
> orig_data[i] &= 0x1F
>
> do it in place? (I don't have .0a1 to try on the current bytes.)

Good catch!

Python 3.0a1 (py3k:58282, Sep 29 2007, 15:07:57)
[GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)] on linux2
>>> orig_data = b"abc"
>>> orig_data
b'abc'
>>> for i in range(len(orig_data)):
...     orig_data[i] &= 0x1F
...
>>> orig_data
b'\x01\x02\x03'

It'd be useful and more efficient if the new buffer type would support
the bit wise operations directly:

>>> orig_data &= 0x1F
TypeError: unsupported operand type(s) for &=: 'bytes' and 'int'
>>> orig_data &= b"\x1F"
TypeError: unsupported operand type(s) for &=: 'bytes' and 'bytes'

Christian

From guido at python.org Tue Oct 2 16:10:04 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 2 Oct 2007 07:10:04 -0700
Subject: [Python-3000] Last call for PEP 3137: Immutable Bytes and Mutable Buffer
In-Reply-To:
References: <4700FC40.1060206@gmail.com>
Message-ID:

I am hereby accepting my own PEP 3137. The responses fell into three
categories: enthusiastic +1s, textual corrections, and ideas for
future enhancements. That's about as positive as it gets for any
proposal. :-)

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

From adam at hupp.org Tue Oct 2 16:37:22 2007
From: adam at hupp.org (Adam Hupp)
Date: Tue, 2 Oct 2007 10:37:22 -0400
Subject: [Python-3000] Emacs22 python.el support for py3k
Message-ID: <766a29bd0710020737q4d4d762coa91b4c6bd15b278f@mail.gmail.com>

I've submitted patches to emacs for python 3000 support. It does not
handle any new syntax but the emacs<->python interaction works again.
This applies to the python.el that ships with emacs22, not
python-mode.el.

The changes are available in emacs cvs. If you don't want to build a
new copy it should be sufficient to pull the files python.el,
emacs.py, emacs2.py and emacs3.py.

--
Adam Hupp | http://hupp.org/adam/

From guido at python.org Tue Oct 2 17:04:34 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 2 Oct 2007 08:04:34 -0700
Subject: [Python-3000] Emacs22 python.el support for py3k
In-Reply-To: <766a29bd0710020737q4d4d762coa91b4c6bd15b278f@mail.gmail.com>
References: <766a29bd0710020737q4d4d762coa91b4c6bd15b278f@mail.gmail.com>
Message-ID:

On 10/2/07, Adam Hupp wrote:
> I've submitted patches to emacs for python 3000 support.
It does not
> handle any new syntax but the emacs<->python interaction works again.
> This applies to the python.el that ships with emacs22, not
> python-mode.el.

Just curious -- how do python.el and python-mode.el differ?

> The changes are available in emacs cvs. If you don't want to build a
> new copy it should be sufficient to pull the files python.el,
> emacs.py, emacs2.py and emacs3.py.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

From adam at hupp.org Tue Oct 2 17:28:19 2007
From: adam at hupp.org (Adam Hupp)
Date: Tue, 2 Oct 2007 11:28:19 -0400
Subject: [Python-3000] Emacs22 python.el support for py3k
In-Reply-To:
References: <766a29bd0710020737q4d4d762coa91b4c6bd15b278f@mail.gmail.com>
Message-ID: <766a29bd0710020828y150cfb5dgf33b440b02a71458@mail.gmail.com>

On 10/2/07, Guido van Rossum wrote:
>
> Just curious -- how do python.el and python-mode.el differ?

Off the top of my head:

* python-mode.el did not play well with transient-mark-mode
  (mark-block didn't work). transient-mark-mode highlights the marked
  region and is required for other functions (e.g. comment-dwim).

* python-mode.el had problems with syntax highlighting in the
  presence of triple quoted strings and in comments. python.el does
  not.

* python.el is supposed to be more consistent with other major modes.
  e.g. M-; for comment.

* python.el ships with emacs. There are claims that python-mode.el
  was not as well maintained for FSF emacs as XEmacs.

--
Adam Hupp | http://hupp.org/adam/

From barry at python.org Tue Oct 2 17:33:44 2007
From: barry at python.org (Barry Warsaw)
Date: Tue, 2 Oct 2007 11:33:44 -0400
Subject: [Python-3000] Emacs22 python.el support for py3k
In-Reply-To: <766a29bd0710020828y150cfb5dgf33b440b02a71458@mail.gmail.com>
References: <766a29bd0710020737q4d4d762coa91b4c6bd15b278f@mail.gmail.com> <766a29bd0710020828y150cfb5dgf33b440b02a71458@mail.gmail.com>
Message-ID: <8B5D00B9-F765-43F6-B3DE-AA6BB50CA611@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Oct 2, 2007, at 11:28 AM, Adam Hupp wrote:

> On 10/2/07, Guido van Rossum wrote:
>>
>> Just curious -- how do python.el and python-mode.el differ?
>
> Off the top of my head:
>
> * python-mode.el did not play well with transient-mark-mode
> (mark-block didn't work). transient-mark-mode highlights the marked
> region and is required for other functions (e.g. comment-dwim).
>
> * python-mode.el had problems with syntax highlighting in the
> presence of triple quoted strings and in comments. python.el does
> not.
>
> * python.el is supposed to be more consistent with other major modes.
> e.g. M-; for comment.
>
> * python.el ships with emacs. There are claims that python-mode.el
> was not as well maintained for FSF emacs as XEmacs.

It would be nice if there were only one mode that worked with both
FSF Emacs and XEmacs and merged the best qualities of both modes. I
don't have much time to work on that, and I suspect Skip is pretty
busy too.

Adam, if you're interested, willing, and able to help develop such a
merge, python-mode at python.org would be the place to do so. I'd
certainly be willing to test and I'd try to do a limited amount of
XEmacs compatibility hacking.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)

iQCVAwUBRwJk2XEjvBPtnXfVAQJ9ZgP/bbG+OSHEnWGCBIXibnTzxEUL2ifIO8YU
E/odKLMogXKFc40/weansKpjX9+Mv+/ye7a49HPH+AZ2vxKJsFvZVHill6F3pbh2
bd+94O1AkYIsuJwO7u3Pc3clje85jXDSUtmPRM3yWGweLDNNDaS4kxE02tNqdSTd
rKiHn4gUzYk=
=zMKd
-----END PGP SIGNATURE-----

From tjreedy at udel.edu Tue Oct 2 17:59:07 2007
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 2 Oct 2007 11:59:07 -0400
Subject: [Python-3000] Last call for PEP 3137: Immutable Bytes and Mutable Buffer
References: <4700FC40.1060206@gmail.com>
Message-ID:

"Christian Heimes" wrote in message news:fdstov$av5$1 at sea.gmane.org...
| Terry Reedy wrote:
| > If orig_data were mutable (the new buffer, as proposed in the PEP), would
| > not
| >
| > for i in range(len(orig_data)):
| > orig_data[i] &= 0x1F
| >
| > do it in place? (I don't have .0a1 to try on the current bytes.)
|
| Good catch!
|
| Python 3.0a1 (py3k:58282, Sep 29 2007, 15:07:57)
| [GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)] on linux2
| >>> orig_data = b"abc"
| >>> orig_data
| b'abc'
| >>> for i in range(len(orig_data)):
| ... orig_data[i] &= 0x1F
| ...
| >>> orig_data
| b'\x01\x02\x03'

Thanks for testing this! Glad it worked. This sort of thing makes having
bytes/buffer[i] an int a plus. (Just noticed, PEP accepted.)

| It'd be useful and more efficient if the new buffer type would support
| the bit wise operations directly:
|
| >>> orig_data &= 0x1F
| TypeError: unsupported operand type(s) for &=: 'bytes' and 'int'

This sort of broadcast behavior seems like numpy territory to me. Or
better for a buffer subclass. Write it first in Python, using loops like
above (partly for documentation and other implementations), then in C when
interest and usage warrants.

| >>> orig_data &= b"\x1F"
| TypeError: unsupported operand type(s) for &=: 'bytes' and 'bytes'

Ugh is my response. Stick with the first ;-).
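
A rough sketch of the "write it first in Python, as a buffer subclass" suggestion above. It assumes a released Python 3, where the PEP's mutable buffer type is spelled `bytearray`; the class name `MaskableBuffer` is hypothetical and not part of any library. Only the int form of `&=` is supported, matching the preference expressed in the message.

```python
class MaskableBuffer(bytearray):
    """Hypothetical bytearray subclass adding in-place AND with one int mask."""

    def __iand__(self, mask):
        if not isinstance(mask, int):
            # Let Python raise the usual TypeError for unsupported operands.
            return NotImplemented
        if not 0 <= mask <= 0xFF:
            raise ValueError("mask must fit in one byte")
        # Plain Python loop, as in the session quoted above.
        for i in range(len(self)):
            self[i] &= mask
        return self


buf = MaskableBuffer(b"abc")
buf &= 0x1F
print(bytes(buf))  # b'\x01\x02\x03'
```

Because the loop runs in Python, this documents the intended semantics rather than delivering the speed-up a C implementation would.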

Terry Jan Reedy

From guido at python.org Tue Oct 2 18:24:01 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 2 Oct 2007 09:24:01 -0700
Subject: [Python-3000] Emacs22 python.el support for py3k
In-Reply-To: <766a29bd0710020828y150cfb5dgf33b440b02a71458@mail.gmail.com>
References: <766a29bd0710020737q4d4d762coa91b4c6bd15b278f@mail.gmail.com> <766a29bd0710020828y150cfb5dgf33b440b02a71458@mail.gmail.com>
Message-ID:

So is python.el a descendant of python-mode.el, or an independent development?

On 10/2/07, Adam Hupp wrote:
> On 10/2/07, Guido van Rossum wrote:
> >
> > Just curious -- how do python.el and python-mode.el differ?
>
> Off the top of my head:
>
> * python-mode.el did not play well with transient-mark-mode
> (mark-block didn't work). transient-mark-mode highlights the marked
> region and is required for other functions (e.g. comment-dwim).
>
> * python-mode.el had problems with syntax highlighting in the
> presence of triple quoted strings and in comments. python.el does
> not.
>
> * python.el is supposed to be more consistent with other major modes.
> e.g. M-; for comment.
>
> * python.el ships with emacs. There are claims that python-mode.el
> was not as well maintained for FSF emacs as XEmacs.
>
> --
> Adam Hupp | http://hupp.org/adam/
>

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

From adam at hupp.org Tue Oct 2 18:44:54 2007
From: adam at hupp.org (Adam Hupp)
Date: Tue, 2 Oct 2007 12:44:54 -0400
Subject: [Python-3000] Emacs22 python.el support for py3k
In-Reply-To:
References: <766a29bd0710020737q4d4d762coa91b4c6bd15b278f@mail.gmail.com> <766a29bd0710020828y150cfb5dgf33b440b02a71458@mail.gmail.com>
Message-ID: <766a29bd0710020944x36e69500k9d8af8e4a619f537@mail.gmail.com>

On 10/2/07, Guido van Rossum wrote:
> So is python.el a descendant of python-mode.el, or an independent development?

I've never seen a definitive statement but I believe it was developed
independently.

--
Adam Hupp | http://hupp.org/adam/

From skip at pobox.com Tue Oct 2 19:05:17 2007
From: skip at pobox.com (skip at pobox.com)
Date: Tue, 2 Oct 2007 12:05:17 -0500
Subject: [Python-3000] Emacs22 python.el support for py3k
In-Reply-To: <766a29bd0710020944x36e69500k9d8af8e4a619f537@mail.gmail.com>
References: <766a29bd0710020737q4d4d762coa91b4c6bd15b278f@mail.gmail.com> <766a29bd0710020828y150cfb5dgf33b440b02a71458@mail.gmail.com> <766a29bd0710020944x36e69500k9d8af8e4a619f537@mail.gmail.com>
Message-ID: <18178.31309.146267.585340@montanaro.dyndns.org>

Guido> So is python.el a descendant of python-mode.el, or an independent
Guido> development?

Adam> I've never seen a definitive statement but I believe it was
Adam> developed independently.

Correct.

Skip

From qrczak at knm.org.pl Tue Oct 2 20:49:07 2007
From: qrczak at knm.org.pl (Marcin 'Qrczak' Kowalczyk)
Date: Tue, 02 Oct 2007 20:49:07 +0200
Subject: [Python-3000] Python, int/long and GMP
In-Reply-To: <200709281858.29705.victor.stinner@haypocalc.com>
References: <200709280429.39396.victor.stinner@haypocalc.com> <400ED549-B7C7-4A3D-9343-826B54E7B2BB@fuhm.net> <200709281858.29705.victor.stinner@haypocalc.com>
Message-ID: <1191350947.8483.6.camel@qrnik>

On Fri, 28-09-2007 at 18:58 +0200, Victor Stinner wrote:

> I don't know GMP internals. I thought that GMP uses a hack for small
> integers.

It does not. (And I'm glad that it does not, because it allows for
super-specialized representation of small integers where even the
space for mpz_t itself is not allocated. A GMP-internal optimization
for the same cases would be underutilized and thus wasteful.)

> I may also use Python garbage collector for GMP memory allocations
> since GMP allows to use my own memory allocating functions.

This would make linking with another library which uses GMP impossible
(unless the allocator is compatible with malloc, reentrant etc.).
Glasgow Haskell was unfortunate enough to go that way.

> GMP also has its own reference counter mechanism :-/

It does not.

--
   __("<         Marcin Kowalczyk
   \__/       qrczak at knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/

From mark at qtrac.eu Wed Oct 3 04:24:50 2007
From: mark at qtrac.eu (Mark Summerfield)
Date: Wed, 3 Oct 2007 03:24:50 +0100
Subject: [Python-3000] Are strs sequences of characters or disguised byte strings?
Message-ID: <200710030324.50588.mark@qtrac.eu>

In Python 3.0a1, exec() appears to normalize strings, but in other cases
they don't appear to be normalized, and this leads to results that
appear to be counter-intuitive in some cases, at least to me.

>>> c1 = "\u00C7"
>>> c2 = "C\u0327"
>>> c3 = "\u0043\u0327"
>>> c1, c2, c3
('\xc7', 'C\u0327', 'C\u0327')
>>> print(c1, c2)
Ç Ç

Clearly c1 and c2 are different at the byte level. But if we use them to
create variables using exec(), Python appears to normalize them:

>>> dir()
['__builtins__', '__doc__', '__name__', 'c1', 'c2', 'c3']
>>> exec("C\u0327 = 5")
>>> dir()
['__builtins__', '__doc__', '__name__', 'c1', 'c2', 'c3', '\xc7']
>>> Ç
5
>>> exec("\u00C7 = -7")
>>> dir()
['__builtins__', '__doc__', '__name__', 'c1', 'c2', 'c3', '\xc7']
>>> Ç
-7

This seems to be the right behaviour to me, since from the point of view
of a programmer, Ç is the name of the variable, no matter what the
underlying byte encoding used to represent the variable's name.

>>> print(c1, c2)
Ç Ç
>>> c1.encode("utf8") == c2.encode("utf8")
False

This is what I'd expect, since here I'm comparing the actual bytes.

But when I compare them as strings I really expect them to be compared
as sequences of characters (in a human sense), so this:

>>> c1 == c2
False

seems counter-intuitive to me. It is easy to fix:

>>> from unicodedata import normalize
>>> normalize("NFKD", c1) == normalize("NFKD", c2)
True

but isn't it asking a lot of Python users to use normalize() whenever
they want to perform such a basic operation as string comparison?
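
The comparison above can be wrapped in a small helper. This is a minimal sketch, not an existing API: `canonical_eq` is a hypothetical name, and the choice of NFC as the default form is an assumption (the session above uses NFKD, which gives the same answer for these two strings).

```python
from unicodedata import normalize

c1 = "\u00C7"    # precomposed LATIN CAPITAL LETTER C WITH CEDILLA
c2 = "C\u0327"   # "C" followed by COMBINING CEDILLA


def canonical_eq(a, b, form="NFC"):
    """Compare two strings after bringing both to the same normal form."""
    return normalize(form, a) == normalize(form, b)


print(c1 == c2)              # False: the code units differ
print(canonical_eq(c1, c2))  # True: canonically equivalent
```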

Another issue that arises is that you can end up with duplicate
dictionary keys and set elements. (The duplication is in human terms, in
byte terms the keys/set elements differ of course):

>>> d = {c1: 1, c2: 2}
>>> d
{'C\u0327': 2, '\xc7': 1}
>>> for k, v in d.items():
...     print(k, v)
...
Ç 2
Ç 1

I think this is surprising.

>>> s = {c1, c2}
>>> s
{'C\u0327', '\xc7'}
>>> for x in s:
...     print(x)
...
Ç
Ç

And the same result applies to sets of course.

I don't know what the performance costs would be for always normalizing
strings, but it seems to me that if strings are not normalized, then
they are really being treated as byte strings thinly disguised as
strings rather than as true sequences of characters whose byte
representation is a detail that programmers can ignore (unless they
choose to explicitly decode).

--
Mark Summerfield, Qtrac Ltd., www.qtrac.eu

From guido at python.org Wed Oct 3 05:28:56 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 2 Oct 2007 20:28:56 -0700
Subject: [Python-3000] Are strs sequences of characters or disguised byte strings?
In-Reply-To: <200710030324.50588.mark@qtrac.eu>
References: <200710030324.50588.mark@qtrac.eu>
Message-ID:

String objects are arrays of code units. They can represent normalized
and unnormalized Unicode text just as easily, and even invalid data,
like half a surrogate and other illegal code units. It is up to the
application (or perhaps at some point the library) to implement
various checks and normalizations.

AFAIK this is the same stance that Java and C# take -- the String
types there don't concern themselves with the higher levels of Unicode
standard compliance. (Though those languages probably have more
library support than Python does -- perhaps someone can contribute
something, like wrappers for ICU?)

However, for identifiers occurring in source code, we *do* normalize
before comparing them. PEP 3131 should explain this.
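
A short sketch of the split described above, assuming a released Python 3: identifiers are normalized when source is compiled, while str values used as dictionary keys are left exactly as given, so the application normalizes them itself. The variable names are illustrative only, and NFC is an assumed choice of normal form for the keys.

```python
from unicodedata import normalize

# Identifier spelled as "C" + COMBINING CEDILLA in the source text.
namespace = {}
exec("C\u0327 = 5", namespace)
print("\u00C7" in namespace)    # True: the name is stored precomposed
print("C\u0327" in namespace)   # False

# Ordinary strings are not normalized, so both spellings coexist as keys.
raw = {"\u00C7": 1, "C\u0327": 2}
print(len(raw))                 # 2

# Application-level normalization collapses them (the later value wins).
merged = {normalize("NFC", key): value for key, value in raw.items()}
print(merged == {"\u00C7": 2})  # True
```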

--Guido

On 10/2/07, Mark Summerfield wrote:
> In Python 3.0a1, exec() appears to normalize strings, but in other cases
> they don't appear to be normalized, and this leads to results that
> appear to be counter-intuitive in some cases, at least to me.
>
> >>> c1 = "\u00C7"
> >>> c2 = "C\u0327"
> >>> c3 = "\u0043\u0327"
> >>> c1, c2, c3
> ('\xc7', 'C\u0327', 'C\u0327')
> >>> print(c1, c2)
> Ç Ç
>
> Clearly c1 and c2 are different at the byte level. But if we use them to
> create variables using exec(), Python appears to normalize them:
>
> >>> dir()
> ['__builtins__', '__doc__', '__name__', 'c1', 'c2', 'c3']
> >>> exec("C\u0327 = 5")
> >>> dir()
> ['__builtins__', '__doc__', '__name__', 'c1', 'c2', 'c3', '\xc7']
> >>> Ç
> 5
> >>> exec("\u00C7 = -7")
> >>> dir()
> ['__builtins__', '__doc__', '__name__', 'c1', 'c2', 'c3', '\xc7']
> >>> Ç
> -7
>
> This seems to be the right behaviour to me, since from the point of view
> of a programmer, Ç is the name of the variable, no matter what the
> underlying byte encoding used to represent the variable's name.
>
> >>> print(c1, c2)
> Ç Ç
> >>> c1.encode("utf8") == c2.encode("utf8")
> False
>
> This is what I'd expect, since here I'm comparing the actual bytes.
>
> But when I compare them as strings I really expect them to be compared
> as sequences of characters (in a human sense), so this:
>
> >>> c1 == c2
> False
>
> seems counter-intuitive to me. It is easy to fix:
>
> >>> from unicodedata import normalize
> >>> normalize("NFKD", c1) == normalize("NFKD", c2)
> True
>
> but isn't it asking a lot of Python users to use normalize() whenever
> they want to perform such a basic operation as string comparison?
>
> Another issue that arises is that you can end up with duplicate
> dictionary keys and set elements. (The duplication is in human terms, in
> byte terms the keys/set elements differ of course):
>
> >>> d = {c1: 1, c2: 2}
> >>> d
> {'C\u0327': 2, '\xc7': 1}
> >>> for k, v in d.items():
> ... print(k, v)
> ...
> Ç 2
> Ç 1
>
> I think this is surprising.
>
> >>> s = {c1, c2}
> >>> s
> {'C\u0327', '\xc7'}
> >>> for x in s:
> ... print(x)
> ...
> Ç
> Ç
>
> And the same result applies to sets of course.
>
> I don't know what the performance costs would be for always normalizing
> strings, but it seems to me that if strings are not normalized, then
> they are really being treated as byte strings thinly disguised as
> strings rather than as true sequences of characters whose byte
> representation is a detail that programmers can ignore (unless they
> choose to explicitly decode).
>
> --
> Mark Summerfield, Qtrac Ltd., www.qtrac.eu
>
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org
>

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

From lists at cheimes.de Wed Oct 3 19:30:46 2007
From: lists at cheimes.de (Christian Heimes)
Date: Wed, 03 Oct 2007 19:30:46 +0200
Subject: [Python-3000] Last call for PEP 3137: Immutable Bytes and Mutable Buffer
In-Reply-To:
References: <4700FC40.1060206@gmail.com>
Message-ID:

Terry Reedy wrote:
> | It'd be useful and more efficient if the new buffer type would support
> | the bit wise operations directly:
> |
> | >>> orig_data &= 0x1F
> | TypeError: unsupported operand type(s) for &=: 'bytes' and 'int'
>
> This sort of broadcast behavior seems like numpy territory to me. Or
> better for a buffer subclass. Write it first in Python, using loops like
> above (partly for documentation and other implementations), then in C when
> interest and usage warrants.

The C implementation of the bit wise operations for buffer() gains a
large speed improvement over the Python implementation.
I'm not sure if Guido would like it and I don't have a use case yet but
it sounds like a useful addition to the new buffer() type:

    buffer &= smallint
    buffer |= smallint
    buffer ^= smallint
    newbuffer = buffer & smallint
    newbuffer = buffer | smallint
    newbuffer = buffer ^ smallint

I'm willing to give it a try and implement it if people are interested
in it.

I have a use case for another feature but that's surely out of the
scope for the Python core. For some algorithms, especially cryptographic
algorithms, I could use a bytes type which contains larger elements than a
char (unsigned int8) and which does overflow (255 + 1 == 0).

    for b in bytes(b"....", wordsize=32, signed=True): ...

Again, it's just a pipe dream and I tend to say that it doesn't belong
in the core.

>
> | >>> orig_data &= b"\x1F"
> | TypeError: unsupported operand type(s) for &=: 'bytes' and 'bytes'
>
> Ugh is my response. Stick with the first ;-).

Ugh, too :)

Christian

From guido at python.org Wed Oct 3 19:36:32 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 3 Oct 2007 10:36:32 -0700
Subject: [Python-3000] Last call for PEP 3137: Immutable Bytes and Mutable Buffer
In-Reply-To:
References: <4700FC40.1060206@gmail.com>
Message-ID:

On 10/3/07, Christian Heimes wrote:
> I don't have a use case yet but it sounds like a
> useful addition to the new buffer() type:

That's a contradiction. Without a use case it's not useful.

Let's be conservative on these "kitchen sink" ideas. They belong in
python-ideas anyway.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

From nas at arctrix.com Wed Oct 3 20:01:06 2007
From: nas at arctrix.com (Neil Schemenauer)
Date: Wed, 3 Oct 2007 18:01:06 +0000 (UTC)
Subject: [Python-3000] Simplifying pickle for Py3k
Message-ID:

I guess the library overhaul hasn't really started yet but it would
be nice if the pickle module could get some work. Today I'm trying
to efficiently store a class using pickle and the documentation is
making my head hurt.
I don't think the documentation itself is the
problem, just the fact that the rules are so complicated.

I guess there are several different solutions:

* Remove backwards compatible stuff from the code and the
  documentation. The downside is that old pickles could not be
  loaded. Perhaps that's not a huge issue since the removal of
  old-style classes might already break old pickles.

* Remove the backwards compatible stuff from the documentation
  only. This would help people using the language but would
  still be a long term maintenance issue.

* Leave the old code in but generate warnings when old pickle
  mechanisms are used. Eventually the old stuff could be
  removed from the code.

* Provide an "oldpickle" module that supports pre-3k pickles.

I think I like the warnings idea best.

Neil

From guido at python.org Wed Oct 3 20:29:18 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 3 Oct 2007 11:29:18 -0700
Subject: [Python-3000] Simplifying pickle for Py3k
In-Reply-To:
References:
Message-ID:

I think it's essential to be able to *read* pickles generated by older
Python versions. But for writing I'm okay with only writing protocol 2
(which Python 2.x also understands) and only supporting the modern
APIs for customizing pickle writing.

I don't think classic class instances are necessarily unpicklable in
3.0 -- they will just show up as instances of the corresponding
new-style classes.

--Guido

On 10/3/07, Neil Schemenauer wrote:
> I guess the library overhaul hasn't really started yet but it would
> be nice if the pickle module could get some work. Today I'm trying
> to efficiently store a class using pickle and the documentation is
> making my head hurt. I don't think the documentation itself is the
> problem, just the fact that the rules are so complicated.
>
> I guess there are several different solutions:
>
> * Remove backwards compatible stuff from the code and the
> documentation. The downside is that old pickles could not be
> loaded. Perhaps that's not a huge issue since the removal of
> old-style classes might already break old pickles.
>
> * Remove the backwards compatible stuff from the documentation
> only. This would help people using the language but would
> still be a long term maintenance issue.
>
> * Leave the old code in but generate warnings when old pickle
> mechanisms are used. Eventually the old stuff could be
> removed from the code.
>
> * Provide an "oldpickle" module that supports pre-3k pickles.
>
> I think I like the warnings idea best.
>
> Neil
>
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org
>

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

From barry at python.org Wed Oct 3 20:28:48 2007
From: barry at python.org (Barry Warsaw)
Date: Wed, 3 Oct 2007 14:28:48 -0400
Subject: [Python-3000] Simplifying pickle for Py3k
In-Reply-To:
References:
Message-ID: <09EFA1D6-BF99-47A5-8C04-9C481E6DA75D@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Oct 3, 2007, at 2:01 PM, Neil Schemenauer wrote:

> I guess the library overhaul hasn't really started yet but it would
> be nice if the pickle module could get some work. Today I'm trying
> to efficiently store a class using pickle and the documentation is
> making my head hurt. I don't think the documentation itself is the
> problem, just the fact that the rules are so complicated.

+1. Try reverse engineering those rules if you really want to have
some fun. ;)

> I guess there are several different solutions:
>
> * Remove backwards compatible stuff from the code and the
> documentation. The downside is that old pickles could not be
> loaded. Perhaps that's not a huge issue since the removal of
> old-style classes might already break old pickles.
>
> * Remove the backwards compatible stuff from the documentation
> only.
This would help people using the language but would
> still be a long term maintenance issue.
>
> * Leave the old code in but generate warnings when old pickle
> mechanisms are used. Eventually the old stuff could be
> removed from the code.
>
> * Provide an "oldpickle" module that supports pre-3k pickles.
>
> I think I like the warnings idea best.

I'm not sure about eventually removing the code, since we may need
long term support for migration from 2.x pickles to 3.0 pickles.
OTOH, if 2to3 or Python 2.6+ could include pickle migration code,
that might be fine.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)

iQCVAwUBRwPfYXEjvBPtnXfVAQJfSwQAnoAmgSQy99rJz4C+hks0jvKZz5X3yNOa
qV9pV9942KEVZN5lwXLtzoWAnBr9MpXTjZ9AEmDgJVScSXV4Vk/MegsS/Q8R2diG
88x1vpuXQF333CHgWnGiQYw6lysZfP5rbKEHaOYwQB4mjLTS7VSKuZdVtZvvMGH8
7HDj3GqqC0I=
=1Plz
-----END PGP SIGNATURE-----

From g.brandl at gmx.net Wed Oct 3 20:36:19 2007
From: g.brandl at gmx.net (Georg Brandl)
Date: Wed, 03 Oct 2007 20:36:19 +0200
Subject: [Python-3000] Simplifying pickle for Py3k
In-Reply-To:
References:
Message-ID:

Neil Schemenauer wrote:
> I guess the library overhaul hasn't really started yet but it would
> be nice if the pickle module could get some work. Today I'm trying
> to efficiently store a class using pickle and the documentation is
> making my head hurt. I don't think the documentation itself is the
> problem, just the fact that the rules are so complicated.
>
> I guess there are several different solutions:
>
> * Remove backwards compatible stuff from the code and the
> documentation. The downside is that old pickles could not be
> loaded. Perhaps that's not a huge issue since the removal of
> old-style classes might already break old pickles.
>
> * Remove the backwards compatible stuff from the documentation
> only. This would help people using the language but would
> still be a long term maintenance issue.
>
> * Leave the old code in but generate warnings when old pickle
> mechanisms are used. Eventually the old stuff could be
> removed from the code.
>
> * Provide an "oldpickle" module that supports pre-3k pickles.
>
> I think I like the warnings idea best.

I'm in favor of #1, perhaps combined with #4. With the fundamental
change in basic types (unicode -> str, str -> bytes) I wouldn't expect
2.x pickles to be loadable by 3.0 anyway.

Cruft removal from the pickle protocol is really needed; I don't envy
everyone reading the pickle docs trying to understand which method
exactly he has to implement, which is going to be called with what
arguments, etc.

Georg

--
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.

From skip at pobox.com Wed Oct 3 22:27:34 2007
From: skip at pobox.com (skip at pobox.com)
Date: Wed, 3 Oct 2007 15:27:34 -0500
Subject: [Python-3000] Simplifying pickle for Py3k
In-Reply-To:
References:
Message-ID: <18179.64310.424753.609880@montanaro.dyndns.org>

Georg> I don't envy everyone reading the pickle docs trying to
Georg> understand which method exactly he has to implement, which is
Georg> going to be called with what arguments, etc.

Agreed. I've been going through that (painful) exercise the past couple of
days as I try and figure out what methods my to-be-pickled objects need to
implement. __reduce__, __reduce_ex__, __getstate__, __setstate__, copy_reg,
__safe_for_unpickling__, __getnewargs__. Your head starts to swim after
a while.
Skip From lists at cheimes.de Wed Oct 3 23:52:10 2007 From: lists at cheimes.de (Christian Heimes) Date: Wed, 03 Oct 2007 23:52:10 +0200 Subject: [Python-3000] Simplifying pickle for Py3k In-Reply-To: References: Message-ID: Neil Schemenauer wrote: > I guess there are several different solutions: > > * Remove backwards compatible stuff from the code and the > documentation. The downside is that old pickles could not be > loaded. Perhaps that's not a huge issue since the removal of > old-style classes might already break old pickles. > > * Remove the backwards compatible stuff from the documentation > only. The would help people using the language but would > still be a long term maintenance issue. > > * Leave the old code in but generate warnings when old pickle > mechanisms are used. Eventually the old stuff could be > removed from the code. > > * Provide an "oldpickle" module the supports pre-3k pickles. > > I think I like the warnings idea best. Please keep in mind that we want people to move to Python 3.x. Pickles are very important for a bunch of well known and large Python applications like Zope2, Zope3, Mailman and probably many more. Zope's ZODB makes heavy use of pickles. If you remove the support for old style pickles from Python 2.x you also remove the migration path for a large user base to Python 3.x. I like to propose option (4b): Provide an oldpickle module which can load old pickles and migrate an old pickle to a Python 3.x pickle. As long as Python 3.0 can load and migrate old to new pickles I'm also for option (1). The pickle module could use an emaciation. 
Christian From greg.ewing at canterbury.ac.nz Thu Oct 4 00:24:14 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 04 Oct 2007 10:24:14 +1200 Subject: [Python-3000] Simplifying pickle for Py3k In-Reply-To: <18179.64310.424753.609880@montanaro.dyndns.org> References: <18179.64310.424753.609880@montanaro.dyndns.org> Message-ID: <4704168E.3090005@canterbury.ac.nz> skip at pobox.com wrote: > I've been going through that (painful) exercise the past couple of > days as I try and figure out what methods my to-be-pickled objects need to > implement. __reduce__, __reduce_ex__, __getstate__, __setstate__, copy_reg, > __safe_for_unpickling__, __getnewargs__. Your head starts to swim after > awhile. Not all of these are old cruft -- some of them are alternatives that are useful in one situation or another. Some of them could no doubt be removed, though. -- Greg From alexandre at peadrop.com Thu Oct 4 08:49:16 2007 From: alexandre at peadrop.com (Alexandre Vassalotti) Date: Thu, 4 Oct 2007 02:49:16 -0400 Subject: [Python-3000] Simplifying pickle for Py3k In-Reply-To: References: Message-ID: On 10/3/07, Neil Schemenauer wrote: > I guess the library overhaul hasn't really started it but it would > be nice if the pickle module could get some work. Today I'm trying > to efficiently store a class using pickle Could you elaborate on what you are trying to do? > and the documentation is making my head hurt. I don't think the > documentation itself is the problem, just the fact that the rules > are so complicated. > > I guess there are several different solutions: > > * Remove backwards compatible stuff from the code and the > documentation. The downside is that old pickles could not be > loaded. Perhaps that's not a huge issue since the removal of > old-style classes might already break old pickles. > This would not simplify the pickle module by much. So, I don't think this would justify breaking backward-compatibility. 
As far as I know, the removal of the old-style classes does not break old pickle streams, since the code of classes is not pickled but referenced. > * Remove the backwards compatible stuff from the documentation > only. The would help people using the language but would > still be a long term maintenance issue. The documentation for the pickle module is completely outdated and confusing. In fact, some sections are outright wrong about how the current module works. If I get some free time (which is unlikely, right now), I will update the documentation. > * Leave the old code in but generate warnings when old pickle > mechanisms are used. Eventually the old stuff could be > removed from the code. Could point out specific examples of the "old code" that you are referring to? > * Provide an "oldpickle" module the supports pre-3k pickles. As I said, old pickle streams should work fine with Py3k. So, adding yet another pickle module is unnecessary. -- Alexandre From nas at arctrix.com Fri Oct 5 06:59:30 2007 From: nas at arctrix.com (Neil Schemenauer) Date: Thu, 4 Oct 2007 22:59:30 -0600 Subject: [Python-3000] Simplifying pickle for Py3k In-Reply-To: References: Message-ID: <20071005045930.GA20564@arctrix.com> On Thu, Oct 04, 2007 at 02:49:16AM -0400, Alexandre Vassalotti wrote: > Could you elaborate on what you are trying to do? I'm trying to efficiently pickle a 'unicode' subclass. I'm disappointed that it's not possible to be as efficient as the built-in unicode class, even when using an extension code. > The documentation for the pickle module is completely outdated and > confusing. In fact, some sections are outright wrong about how the > current module works. If I get some free time (which is unlikely, > right now), I will update the documentation. Yes, I've changed my mind and agree. PEP 307 provides a lot of details that library docs do not but it's not written as a reference doc. Improved library docs would help a lot. 
> > * Leave the old code in but generate warnings when old pickle > > mechanisms are used. Eventually the old stuff could be > > removed from the code. > > Could point out specific examples of the "old code" that you are referring to? I don't have time right now to point at specific code. How about the code that implements all the different versions of __reduce__ and code for __getinitargs__, __getstate__, __setstate__? In any case, it looks like there will be volunteers to maintain the backwards compatability of the pickle module. That's great. Neil From mark at qtrac.eu Fri Oct 5 09:20:39 2007 From: mark at qtrac.eu (Mark Summerfield) Date: Fri, 5 Oct 2007 08:20:39 +0100 Subject: [Python-3000] Small renaming suggestion: re.sub() -> re.replace() or re.substitute() Message-ID: <200710050820.39238.mark@qtrac.eu> Hi, It seems to me that one of the few really "bad" method names in the Python library that I regularly encounter is re.sub(). I don't like the name because: (1) It is an abbreviation, but not an "obvious" one like max and min (2) It is an ambiguous name: could be substitute or could be subtract (3) Elsewhere where special method __foo__ that implements a named (as opposed to symbol-based) method, that method is called foo. For example, __cmp__() -> cmp(), __int__() -> int(), __len__() -> len(). But __add__ -> +, __sub__() -> -. (4) It is the only function with this name in the library; whereas there are several replace methods: bytes.replace() str.replace() datetime.date.replace() # and a few others, plus some replace_* functions. Although re.substitute() would work (and be better than sub), I think re.replace() is better and more consistent regarding the rest of the library. And as for subn, well, replacen or substituten are possible, but why not have just one method and have an optional keyword argument if a tuple is wanted? 
-- 
Mark Summerfield, Qtrac Ltd., www.qtrac.eu

From facundobatista at gmail.com Fri Oct 5 12:45:54 2007
From: facundobatista at gmail.com (Facundo Batista)
Date: Fri, 5 Oct 2007 07:45:54 -0300
Subject: [Python-3000] Small renaming suggestion: re.sub() -> re.replace() or re.substitute()
In-Reply-To: <200710050820.39238.mark@qtrac.eu>
References: <200710050820.39238.mark@qtrac.eu>
Message-ID:

2007/10/5, Mark Summerfield:

> Although re.substitute() would work (and be better than sub), I think
> re.replace() is better and more consistent regarding the rest of the
> library.

+1. It has happened to me twice, at different jobs, that a colleague
came to me asking why there was no "replace" in "re".

Yes, sub() is even difficult to find (unless you *read* all the
descriptions of the methods).

Regards,

-- 
. Facundo

Blog: http://www.taniquetil.com.ar/plog/
PyAr: http://www.python.org/ar/

From alexandre at peadrop.com Sat Oct 6 06:35:39 2007
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Sat, 6 Oct 2007 00:35:39 -0400
Subject: [Python-3000] Simplifying pickle for Py3k
In-Reply-To: <20071005045930.GA20564@arctrix.com>
References: <20071005045930.GA20564@arctrix.com>
Message-ID:

On 10/5/07, Neil Schemenauer wrote:
> On Thu, Oct 04, 2007 at 02:49:16AM -0400, Alexandre Vassalotti wrote:
> > Could you elaborate on what you are trying to do?
>
> I'm trying to efficiently pickle a 'unicode' subclass. I'm
> disappointed that it's not possible to be as efficient as the
> built-in unicode class, even when using an extension code.

There are a few things you could do to produce smaller pickle
streams. If you are certain that the objects you will pickle are not
self-referential, then you can set Pickler.fast to True. This will
disable the "memoizer", which adds a 2-byte overhead to each object
pickled (depending on the input, this might or might not shorten the
resulting stream).
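
To make the fast-mode trade-off concrete, here is a minimal sketch. It assumes a current Python 3, where the Pickler.fast attribute still exists but is documented as deprecated. The sample data and the helper name dump() are made up for illustration, and the size difference depends on the input, as noted above.

```python
import io
import pickle


def dump(obj, fast=False):
    """Pickle obj to bytes, optionally with memo bookkeeping disabled."""
    buf = io.BytesIO()
    pickler = pickle.Pickler(buf)
    pickler.fast = fast  # deprecated attribute; a true value skips the memo
    pickler.dump(obj)
    return buf.getvalue()


# Made-up sample: many distinct strings, nothing shared or self-referential.
data = [str(i) for i in range(1000)]

normal = dump(data)
fast = dump(data, fast=True)
print(len(normal), len(fast))
```

With data that repeats the same object many times, the result can go the other way, because fast mode serializes every occurrence in full instead of emitting a short back-reference.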
If this isn't enough, then you could subclass Pickler and Unpickler
and define a custom rule for your unicode subclass.

An obvious optimization for pickle, in Py3k, would be to add support
for short unicode strings. Currently, there is a 4-byte overhead per
string. Since Py3k is unicode throughout, this overhead can become
quite large.

> > Could point out specific examples of the "old code" that you are referring to?
>
> I don't have time right now to point at specific code. How about
> the code that implements all the different versions of __reduce__
> and code for __getinitargs__, __getstate__, __setstate__?

At first glance, __reduce__ seems to be useful only for instances of
subclasses of built-in types. However, __getnewargs__ could easily
replace it for that. So, removing __reduce__ (and __reduce_ex__) is
probably a good idea.

As far as I know, the current pickle module doesn't use
__getinitargs__ (this is one of the things the documentation is
totally wrong about).

As for __getstate__ and __setstate__, I think they are essential.
Without them, you couldn't pickle objects with __slots__ or save the
I/O state of certain objects.

It would certainly be possible to simplify the algorithm used for
pickling class instances a little. In "pseudo-code", it would look
something like this:

    from pickle import UnpicklingError

    def save_obj(obj):
        # let obj be the instance of a user-defined class
        cls = obj.__class__
        # arguments to pass to cls.__new__ when the object is rebuilt
        if hasattr(obj, "__getnewargs__"):
            args = obj.__getnewargs__()
        else:
            args = ()
        # instance state; falls back to the instance dictionary
        if hasattr(obj, "__getstate__"):
            state = obj.__getstate__()
        else:
            state = obj.__dict__
        return (cls, args, state)

    def load_obj(cls, args, state):
        obj = cls.__new__(cls, *args)
        # restore state through __setstate__ when the class defines it
        if hasattr(obj, "__setstate__"):
            try:
                obj.__setstate__(state)
            except AttributeError:
                raise UnpicklingError
        else:
            obj.__dict__.update(state)
        return obj

The main difference between this and the current method used to
pickle instances is the use of __getnewargs__ instead of __reduce__.
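
For comparison, here is how the same three hooks play out with the pickle module of a released Python 3. This is only a sketch: the class Tagged and its tag attribute are hypothetical, chosen to resemble the string subclass discussed in this thread, and protocol 2 is picked because that is where __getnewargs__ is honoured.

```python
import pickle


class Tagged(str):
    """Hypothetical str subclass carrying one extra attribute."""

    def __new__(cls, text, tag=None):
        obj = super().__new__(cls, text)
        obj.tag = tag
        return obj

    def __getnewargs__(self):
        # Arguments handed to __new__ when the object is recreated.
        return (str(self),)

    def __getstate__(self):
        # State that __new__ alone does not restore.
        return {"tag": self.tag}

    def __setstate__(self, state):
        self.tag = state["tag"]


original = Tagged("hello", tag="greeting")
restored = pickle.loads(pickle.dumps(original, protocol=2))
print(repr(restored), restored.tag)
```

On loading, the class is first rebuilt through __new__ with the saved arguments, and only then does __setstate__ receive the saved state, which matches the order used in load_obj() above.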
-- Alexandre From guido at python.org Mon Oct 8 06:32:59 2007 From: guido at python.org (Guido van Rossum) Date: Sun, 7 Oct 2007 21:32:59 -0700 Subject: [Python-3000] PEP 3137 plan of attack Message-ID: I'd like to make complete implementation of PEP 3137 the goal for the 3.0a2 release. It should be doable to do this release by the end of October. I don't think anything else *needs* to be done to have a successful a2 release. The work for PEP 3137 can be split into a number of relatively independent steps. In some cases these can even be carried out in either order. I'd love to see volunteers for each of these steps. Note: I'll refer to the three string types by their C names, as I plan to keep those unchanged in 3.0a2. We can rename them later, but renaming them will make merging from the trunk and converting 3rd party extensions harder. The C names are PyString (immutable bytes), PyBytes (mutable bytes), PyUnicode (immutable unicode code units, either 16 bits or 32 bits). The tasks I can think of are: - remove locale support from PyString - remove compatibility with PyUnicode from PyString - remove compatibility with PyString from PyUnicode - add missing methods to PyBytes (for list, see the PEP and compare to what's already there) - remove buffer API from PyUnicode - make == and != between PyBytes and PyUnicode return False instead of raising TypeError - make == and != between PyString and Pyunicode return False instead of converting - make comparisons between PyString and PyBytes work (these are properly ordered) - change lots of places (e.g. encoders) to return PyString instead of PyBytes - change indexing and iteration over PyString to return ints, not 1-char PyStrings - change PyString's repr() to return "b'...'" - change PyBytes's repr() to return "buffer(b'...')" - change parser so that b"..." 
returns PyString, not PyBytes - rename bytes -> buffer, str8 -> bytes If a task is done independently from the others, it should include changes to keep the unit tests working. If you volunteer, please send out an email to this list before you start doing any work, to avoid duplicate work (unless sending the email would take more time than it would take to write the code, compile it, run all unit tests, and upload the patch). I'd appreciate it if you gave an estimate for when you expect to be done (or give up) too. For code submissions, please use bugs.python.org and send an email pointing to the relevant issue to this list. PS. Is there anyone who understands test_urllib2net and can fix it? It's been failing for weeks (maybe months) now. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From tom at vector-seven.com Mon Oct 8 07:03:37 2007 From: tom at vector-seven.com (Thomas Lee) Date: Mon, 08 Oct 2007 15:03:37 +1000 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: Message-ID: <4709BA29.3060503@vector-seven.com> Guido van Rossum wrote: > - make == and != between PyBytes and PyUnicode return False instead of > raising TypeError > - make == and != between PyString and Pyunicode return False instead > of converting > - make comparisons between PyString and PyBytes work (these are > properly ordered) > If nobody else is doing this, it sounds like sounds like something I - as a relative newbie - could handle. Possibly the repr() stuff too if nobody else wants that. Should be able to get a patch up before Friday. Cheers, Tom From lists at cheimes.de Mon Oct 8 13:17:55 2007 From: lists at cheimes.de (Christian Heimes) Date: Mon, 08 Oct 2007 13:17:55 +0200 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: Message-ID: Guido van Rossum wrote: > - change PyString's repr() to return "b'...'" > - change PyBytes's repr() to return "buffer(b'...')" > - change parser so that b"..." 
returns PyString, not PyBytes I'll take the three steps. They sound like low hanging fruits even for a noob like me. I expect to have a working patch in the new couple of days. Christian From greg at krypto.org Mon Oct 8 18:32:31 2007 From: greg at krypto.org (Gregory P. Smith) Date: Mon, 8 Oct 2007 09:32:31 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: Message-ID: <52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com> > - add missing methods to PyBytes (for list, see the PEP and compare to > what's already there) > - remove buffer API from PyUnicode I'll take these two with a goal of having them done by the end of the week. -gps -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-3000/attachments/20071008/3e1e9a51/attachment.htm From janssen at parc.com Mon Oct 8 19:51:23 2007 From: janssen at parc.com (Bill Janssen) Date: Mon, 8 Oct 2007 10:51:23 PDT Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: Message-ID: <07Oct8.105132pdt."57996"@synergy1.parc.xerox.com> I think I can spend some time on the 3K SSL support, but I've been waiting till the "bytes" work settles down. Sounds like I should keep waiting a bit more? Or have the C APIs already settled? Bill From guido at python.org Mon Oct 8 20:42:09 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 8 Oct 2007 11:42:09 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: <7125419022533265919@unknownmsgid> References: <7125419022533265919@unknownmsgid> Message-ID: On 10/8/07, Bill Janssen wrote: > I think I can spend some time on the 3K SSL support, but I've been > waiting till the "bytes" work settles down. Sounds like I should > keep waiting a bit more? Or have the C APIs already settled? The C APIs haven't quite settled down yet, but I'd like to convince you that you needn't wait. For all bytes input, you should use the (new) buffer API,i. e. 
PyObject_GetBuffer() and PyObject_ReleaseBuffer() (grep for usage examples if they aren't sufficiently documented in the docs or in PEP 3118). For stuff that returns bytes, you can either use PyBytes_FromStringAndSize() -- which is the 3.0a1 recommended best practice (returning a mutable bytes object) or PyString_FromStringAndSize() -- which will be the 3.0a2 way of returning an immutable bytes object). Since they have the same signature there's very little to worry about having to change this around later. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From brett at python.org Mon Oct 8 21:50:02 2007 From: brett at python.org (Brett Cannon) Date: Mon, 8 Oct 2007 12:50:02 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: Message-ID: On 10/7/07, Guido van Rossum wrote: [SNIP] > PS. Is there anyone who understands test_urllib2net and can fix it? > It's been failing for weeks (maybe months) now. I don't understand it but I fixed it in r58378. =) When ftplib.FTP was converted over to Py3K it was given a default encoding of ASCII on all read data, but that doesn't work as the stuff on the other end could be latin1 (and it was). So I just changed the default encoding. -Brett From guido at python.org Mon Oct 8 21:51:59 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 8 Oct 2007 12:51:59 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: Message-ID: On 10/8/07, Brett Cannon wrote: > On 10/7/07, Guido van Rossum wrote: > [SNIP] > > PS. Is there anyone who understands test_urllib2net and can fix it? > > It's been failing for weeks (maybe months) now. > > I don't understand it but I fixed it in r58378. =) > > When ftplib.FTP was converted over to Py3K it was given a default > encoding of ASCII on all read data, but that doesn't work as the stuff > on the other end could be latin1 (and it was). So I just changed the > default encoding. Cool. Though how do you know it was really latin1? 
Is there anything standardized about the encoding used by FTP? -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon Oct 8 22:03:35 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 8 Oct 2007 13:03:35 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: Message-ID: On 10/7/07, Guido van Rossum wrote: > - remove locale support from PyString > - remove compatibility with PyUnicode from PyString > - remove compatibility with PyString from PyUnicode I'll tackle these myself by Friday, unless someone else beats me to it. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From brett at python.org Mon Oct 8 22:05:31 2007 From: brett at python.org (Brett Cannon) Date: Mon, 8 Oct 2007 13:05:31 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: Message-ID: On 10/8/07, Guido van Rossum wrote: > On 10/8/07, Brett Cannon wrote: > > On 10/7/07, Guido van Rossum wrote: > > [SNIP] > > > PS. Is there anyone who understands test_urllib2net and can fix it? > > > It's been failing for weeks (maybe months) now. > > > > I don't understand it but I fixed it in r58378. =) > > > > When ftplib.FTP was converted over to Py3K it was given a default > > encoding of ASCII on all read data, but that doesn't work as the stuff > > on the other end could be latin1 (and it was). So I just changed the > > default encoding. > > Cool. Though how do you know it was really latin1? Is there anything > standardized about the encoding used by FTP? See, now I had to go and look stuff up. So much work for a holiday. =) According to the spec, data transfers can be anything based on data transfer format specified. ASCII is one of them, but so is Local which can be anything. Turns out that ftplib.FTP.connect() reads from the socket using socket.makefile('r', encoding), so it starts off in text mode. 
So that makes restricting the encoding to bytes < 128 a bad thing as not all possible data transfers would be legal. Basically it sounds like the ftplib module might need a thorough rewrite to use bytes/buffers so that the proper decoding happens at the last second. But I am not the person to do that rewrite. =) -Brett From guido at python.org Mon Oct 8 22:08:01 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 8 Oct 2007 13:08:01 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: Message-ID: On 10/8/07, Brett Cannon wrote: > On 10/8/07, Guido van Rossum wrote: > > On 10/8/07, Brett Cannon wrote: > > > On 10/7/07, Guido van Rossum wrote: > > > [SNIP] > > > > PS. Is there anyone who understands test_urllib2net and can fix it? > > > > It's been failing for weeks (maybe months) now. > > > > > > I don't understand it but I fixed it in r58378. =) > > > > > > When ftplib.FTP was converted over to Py3K it was given a default > > > encoding of ASCII on all read data, but that doesn't work as the stuff > > > on the other end could be latin1 (and it was). So I just changed the > > > default encoding. > > > > Cool. Though how do you know it was really latin1? Is there anything > > standardized about the encoding used by FTP? > > See, now I had to go and look stuff up. So much work for a holiday. =) > > According to the spec, data transfers can be anything based on data > transfer format specified. ASCII is one of them, but so is Local > which can be anything. > > Turns out that ftplib.FTP.connect() reads from the socket using > socket.makefile('r', encoding), so it starts off in text mode. So > that makes restricting the encoding to bytes < 128 a bad thing as not > all possible data transfers would be legal. > > Basically it sounds like the ftplib module might need a thorough > rewrite to use bytes/buffers so that the proper decoding happens at > the last second. But I am not the person to do that rewrite. =) Thanks. 
Mind filing a bug for someone to find? It sounds like the rewrite might be easier once we have immutable bytes. (So this conversation is not entirely off-topic for this thread. ;-) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From brett at python.org Mon Oct 8 22:12:22 2007 From: brett at python.org (Brett Cannon) Date: Mon, 8 Oct 2007 13:12:22 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: Message-ID: On 10/8/07, Guido van Rossum wrote: > On 10/8/07, Brett Cannon wrote: > > On 10/8/07, Guido van Rossum wrote: > > > On 10/8/07, Brett Cannon wrote: > > > > On 10/7/07, Guido van Rossum wrote: > > > > [SNIP] > > > > > PS. Is there anyone who understands test_urllib2net and can fix it? > > > > > It's been failing for weeks (maybe months) now. > > > > > > > > I don't understand it but I fixed it in r58378. =) > > > > > > > > When ftplib.FTP was converted over to Py3K it was given a default > > > > encoding of ASCII on all read data, but that doesn't work as the stuff > > > > on the other end could be latin1 (and it was). So I just changed the > > > > default encoding. > > > > > > Cool. Though how do you know it was really latin1? Is there anything > > > standardized about the encoding used by FTP? > > > > See, now I had to go and look stuff up. So much work for a holiday. =) > > > > According to the spec, data transfers can be anything based on data > > transfer format specified. ASCII is one of them, but so is Local > > which can be anything. > > > > Turns out that ftplib.FTP.connect() reads from the socket using > > socket.makefile('r', encoding), so it starts off in text mode. So > > that makes restricting the encoding to bytes < 128 a bad thing as not > > all possible data transfers would be legal. > > > > Basically it sounds like the ftplib module might need a thorough > > rewrite to use bytes/buffers so that the proper decoding happens at > > the last second. But I am not the person to do that rewrite. 
=) > > Thanks. Mind filing a bug for someone to find? It sounds like the > rewrite might be easier once we have immutable bytes. (So this > conversation is not entirely off-topic for this thread. ;-) Created issue1248. -Brett From nnorwitz at gmail.com Mon Oct 8 22:13:29 2007 From: nnorwitz at gmail.com (Neal Norwitz) Date: Mon, 8 Oct 2007 13:13:29 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: Message-ID: On 10/8/07, Guido van Rossum wrote: > On 10/7/07, Guido van Rossum wrote: > > - remove locale support from PyString > > - remove compatibility with PyUnicode from PyString > > - remove compatibility with PyString from PyUnicode > > I'll tackle these myself by Friday, unless someone else beats me to it. I experimented a bit with removing some of the delegation to PyUnicode in stringobject.c. I ran into many problems starting the interpreter or printing things out (fatal errors or exceptions). It seems we still are using str8 in a bunch of places that need to converted to Unicode. I think that will make it easier to rip out the dependencies. If I have time, I'll probably focus on converting more uses of PyString to PyUnicode. These need to be done anyways and will probably make other changes easier. n From phd at phd.pp.ru Mon Oct 8 22:00:15 2007 From: phd at phd.pp.ru (Oleg Broytmann) Date: Tue, 9 Oct 2007 00:00:15 +0400 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: Message-ID: <20071008200015.GA3316@phd.pp.ru> On Mon, Oct 08, 2007 at 12:51:59PM -0700, Guido van Rossum wrote: > Cool. Though how do you know it was really latin1? Is there anything > standardized about the encoding used by FTP? There is no. Russian users, e.g., use all encodings - koi8-r, cp1251, utf-8; cp1251 is the most popular here, of course. Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. 
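
The encoding problem described in this sub-thread can be reproduced without a server. The sketch below is an illustration only: the file name and the choice of KOI8-R are made up, and try_decode() is a hypothetical helper, not part of ftplib.

```python
def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes do not fit the encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# Made-up server reply: a Russian file name ("privet.txt" in Cyrillic) as KOI8-R.
name = "\u043f\u0440\u0438\u0432\u0435\u0442.txt"
raw = name.encode("koi8-r")

as_ascii = try_decode(raw, "ascii")     # fails: the reply contains bytes >= 128
as_latin1 = try_decode(raw, "latin-1")  # never fails, but the text is wrong
as_koi8 = try_decode(raw, "koi8-r")     # right, only because we know the encoding

# latin-1 maps every byte to one code point, so the bytes stay recoverable.
recovered = as_latin1.encode("latin-1")
```

This is why latin1 makes the test pass without being "the" FTP encoding: it accepts any byte, so nothing is lost, but the text is only correct once it is decoded again with the encoding the server really used.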
From alexandre at peadrop.com Tue Oct 9 00:05:36 2007 From: alexandre at peadrop.com (Alexandre Vassalotti) Date: Mon, 8 Oct 2007 18:05:36 -0400 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: Message-ID: On 10/8/07, Guido van Rossum wrote: > - remove buffer API from PyUnicode > - change PyString's repr() to return "b'...'" > - change PyBytes's repr() to return "buffer(b'...')" I got patches for these. I plan to submit them for review after doing more testing to make sure they work right. -- Alexandre From guido at python.org Tue Oct 9 00:36:05 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 8 Oct 2007 15:36:05 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: Message-ID: Cool. Just notice that you haven't been following protocol -- Christian Heimes volunteered to do these too. :-) On 10/8/07, Alexandre Vassalotti wrote: > On 10/8/07, Guido van Rossum wrote: > > - remove buffer API from PyUnicode > > - change PyString's repr() to return "b'...'" > > - change PyBytes's repr() to return "buffer(b'...')" > > I got patches for these. I plan to submit them for review after doing > more testing to make sure they work right. > > > -- Alexandre > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From alexandre at peadrop.com Tue Oct 9 00:45:17 2007 From: alexandre at peadrop.com (Alexandre Vassalotti) Date: Mon, 8 Oct 2007 18:45:17 -0400 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: Message-ID: On 10/8/07, Guido van Rossum wrote: > Cool. Just notice that you haven't been following protocol -- > Christian Heimes volunteered to do these too. :-) Oops, sorry Christian for taking yours. 
-- Alexandre From brett at python.org Tue Oct 9 01:19:34 2007 From: brett at python.org (Brett Cannon) Date: Mon, 8 Oct 2007 16:19:34 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: Message-ID: On 10/8/07, Alexandre Vassalotti wrote: > On 10/8/07, Guido van Rossum wrote: > > Cool. Just notice that you haven't been following protocol -- > > Christian Heimes volunteered to do these too. :-) > > Oops, sorry Christian for taking yours. See http://bugs.python.org/issue1247 for Christian's patch. Maybe you can do a code review of Christian's work, Alexandre? And if you want to be really brave you could maybe even do the commit yourself. =) -Brett From alexandre at peadrop.com Tue Oct 9 00:56:41 2007 From: alexandre at peadrop.com (Alexandre Vassalotti) Date: Mon, 8 Oct 2007 18:56:41 -0400 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: Message-ID: On 10/8/07, Guido van Rossum wrote: > - change indexing and iteration over PyString to return ints, not > 1-char PyStrings I will try do this one. -- Alexandre From lists at cheimes.de Tue Oct 9 00:57:49 2007 From: lists at cheimes.de (Christian Heimes) Date: Tue, 09 Oct 2007 00:57:49 +0200 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: Message-ID: Alexandre Vassalotti wrote: > On 10/8/07, Guido van Rossum wrote: >> Cool. Just notice that you haven't been following protocol -- >> Christian Heimes volunteered to do these too. :-) > > Oops, sorry Christian for taking yours. I've submitted my patch a few hours ago. I wasn't able to test it to full extend because the svn server was down and I couldn't get the latest update. I noticed that PyBytes doesn't have an iteration view like PyString. Do we need a view for it? 
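
For readers on a released Python 3, the behaviour planned in this thread can be checked directly. One assumption to flag: the mutable type that the PEP calls "buffer" (PyBytes in C at the time of this thread) eventually shipped under the name bytearray, which is what this sketch uses.

```python
immutable = b"abc"
mutable = bytearray(b"abc")

# Indexing and iteration yield ints, not 1-character strings.
first = immutable[0]
ints_from_bytes = list(immutable)
ints_from_bytearray = list(mutable)

# The reprs follow the "b'...'" and "<type name>(b'...')" pattern.
bytes_repr = repr(immutable)
bytearray_repr = repr(mutable)

# Comparing with text returns False instead of raising TypeError.
equal_to_text = immutable == "abc"

# The two byte types compare equal to each other when the contents match.
equal_to_each_other = immutable == mutable
```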
Christian

From lists at cheimes.de Tue Oct 9 01:29:31 2007
From: lists at cheimes.de (Christian Heimes)
Date: Tue, 09 Oct 2007 01:29:31 +0200
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To:
References:
Message-ID: <470ABD5B.3060601@cheimes.de>

Brett Cannon wrote:
> See http://bugs.python.org/issue1247 for Christian's patch. Maybe you
> can do a code review of Christian's work, Alexandre? And if you want
> to be really brave you could maybe even do the commit yourself. =)

I'm not happy with:

    static const char *quote_prefix = "buffer(b'";
    p = PyUnicode_AS_UNICODE(v);
    for (i=0; i<strlen(quote_prefix); i++) {
        *p++ = quote_prefix[i];
    }

but I didn't know how to code it more elegant. It follows the previous
version of the code and it's the fastest way I can think of without

From: guido at python.org (Guido van Rossum)
Subject: [Python-3000] PEP 3137 plan of attack
References:
Message-ID:

On 10/8/07, Christian Heimes wrote:
> Alexandre Vassalotti wrote:
> > On 10/8/07, Guido van Rossum wrote:
> >> Cool. Just notice that you haven't been following protocol --
> >> Christian Heimes volunteered to do these too. :-)
> >
> > Oops, sorry Christian for taking yours.
>
> I've submitted my patch a few hours ago. I wasn't able to test it to
> full extend because the svn server was down and I couldn't get the
> latest update.

Now we'll have competing patches. Can you two please review each
other's so I won't have to review two?

Anyway, anonymous svn should be working again.

> I noticed that PyBytes doesn't have an iteration view like PyString. Do
> we need a view for it?

Yes, that would be a good idea! This currently causes a bit of a
problem for the Sequence ABC.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From alexandre at peadrop.com Tue Oct 9 01:55:13 2007
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Mon, 8 Oct 2007 19:55:13 -0400
Subject: [Python-3000] PEP 3137 plan of attack
In-Reply-To: <470ABD5B.3060601@cheimes.de>
References: <470ABD5B.3060601@cheimes.de>
Message-ID:

Ah!
In my review, I was going to suggest this: while (*quote_prefix) *p++ = *quote_prefix++; -- Alexandre On 10/8/07, Christian Heimes wrote: > I'm not happy with: > > static const char *quote_prefix = "buffer(b'"; > p = PyUnicode_AS_UNICODE(v); > for (i=0; i < strlen(quote_prefix); i++) { > *p++ = quote_prefix[i]; > } > > but I didn't know how to code it more elegantly. It follows the previous > version of the code and it's the fastest way I can think of without From qrczak at knm.org.pl Tue Oct 9 02:02:26 2007 From: qrczak at knm.org.pl (Marcin ‘Qrczak’ Kowalczyk) Date: Tue, 09 Oct 2007 02:02:26 +0200 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: <470ABD5B.3060601@cheimes.de> References: <470ABD5B.3060601@cheimes.de> Message-ID: <1191888146.15402.5.camel@qrnik> On Tue, 09-10-2007, at 01:29 +0200, Christian Heimes wrote: > I'm not happy with: > > static const char *quote_prefix = "buffer(b'"; > p = PyUnicode_AS_UNICODE(v); > for (i=0; i < strlen(quote_prefix); i++) { > *p++ = quote_prefix[i]; > } strlen in a loop is bad for performance. I would do: static const Py_UNICODE quote_prefix[] = { 'b', 'u', 'f', 'f', 'e', 'r', '(', 'b', '\'' }; and memcpy. -- __("< Marcin Kowalczyk \__/ qrczak at knm.org.pl ^^ http://qrnik.knm.org.pl/~qrczak/ From tom at vector-seven.com Tue Oct 9 15:31:16 2007 From: tom at vector-seven.com (Thomas Lee) Date: Tue, 09 Oct 2007 23:31:16 +1000 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: <4709BA29.3060503@vector-seven.com> References: <4709BA29.3060503@vector-seven.com> Message-ID: <470B82A4.8030703@vector-seven.com> Thomas Lee wrote: > Guido van Rossum wrote: > >> - make == and != between PyBytes and PyUnicode return False instead of >> raising TypeError >> A patch for this is ready. I'll submit it to the bug tracker later tonight. >> - make == and != between PyString and PyUnicode return False instead >> of converting >> This will be trivial, but I need to ask a stupid question: is this also true for PyUnicode_Compare? (i.e.
should PyUnicode_Compare(str8(), str()) != 0 ?) And, if so, what should PyUnicode_Compare actually return if one of the parameters is a PyString? Maybe -1 for PyUnicode on the left, 1 for PyUnicode on the right? >> - make comparisons between PyString and PyBytes work (these are >> properly ordered) >> >> Is it just me, or do string/bytes comparisons already work? >>> s = str8('test') >>> b = b'test' >>> s == b True >>> b == s True >>> s != b False >>> b != s False Cheers, Tom From tom at vector-seven.com Tue Oct 9 15:39:59 2007 From: tom at vector-seven.com (Thomas Lee) Date: Tue, 09 Oct 2007 23:39:59 +1000 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: <470B82A4.8030703@vector-seven.com> References: <4709BA29.3060503@vector-seven.com> <470B82A4.8030703@vector-seven.com> Message-ID: <470B84AF.4060704@vector-seven.com> Thomas Lee wrote: > Thomas Lee wrote: > >> Guido van Rossum wrote: >> >> >>> - make == and != between PyBytes and PyUnicode return False instead of >>> raising TypeError >>> >>> > A patch for this is ready. I'll submit it to the bug tracker later tonight. > This patch is now up: http://bugs.python.org/issue1249 Cheers, Tom From guido at python.org Tue Oct 9 17:01:02 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 9 Oct 2007 08:01:02 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: <470B82A4.8030703@vector-seven.com> References: <4709BA29.3060503@vector-seven.com> <470B82A4.8030703@vector-seven.com> Message-ID: On 10/9/07, Thomas Lee wrote: > Thomas Lee wrote: > > Guido van Rossum wrote: > > > >> - make == and != between PyBytes and PyUnicode return False instead of > >> raising TypeError > >> > A patch for this is ready. I'll submit it to the bug tracker later tonight. > >> - make == and != between PyString and Pyunicode return False instead > >> of converting > >> > This will be trivial, but I need to ask a stupid question: is this also > true for PyUnicode_Compare? (i.e. 
should PyUnicode_Compare(str8(), > str()) != 0 ?) > > And, if so, what should PyUnicode_Compare actually return if one of the > parameters is a PyString? Maybe -1 for PyUnicode on the left, 1 for > PyUnicode on the right? Assuming that PyUnicode_Compare is a three-way comparison (less, equal, more), it should raise a TypeError when one of the arguments is a PyString or PyBytes. > >> - make comparisons between PyString and PyBytes work (these are > >> properly ordered) > >> > >> > Is it just me, or do string/bytes comparisons already work? > > >>> s = str8('test') > >>> b = b'test' > >>> s == b > True > >>> b == s > True > >>> s != b > False > >>> b != s > False Seems it's already so. Do they order properly too? (< <= > >=) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Oct 9 17:56:50 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 9 Oct 2007 08:56:50 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: <470BA418.5060301@vector-seven.com> References: <4709BA29.3060503@vector-seven.com> <470B82A4.8030703@vector-seven.com> <470BA418.5060301@vector-seven.com> Message-ID: On 10/9/07, Thomas Lee wrote: > Guido van Rossum wrote: > >>> > >>>> - make == and != between PyBytes and PyUnicode return False instead of > >>>> raising TypeError > >>>> > >>>> > Just thinking about it I'm pretty sure my initial patch is wrong - > forgive my ignorance. To remove the ambiguity, is it fair to state the > following? > > bytes() == str() -> False instead of raising TypeError > bytes() != str() -> True instead of raising TypeError Correct. > I initially read that as "return False whenever any comparison between > bytes and unicode objects is attempted" ... The point is that a bytes and a str instance are never considered equal... > > Assuming that PyUnicode_Compare is a three-way comparison (less, > > equal, more), it should raise a TypeError when one of the arguments is > > a PyString or PyBytes. > > > > > Cool. 
Should have that sorted out soon. As above: > > str8() == str() -> False > str8() != str() -> True > > Correct? Well, in this case you actually have to compare the individual bytes. But yes. ;-) > >> Is it just me, or do string/bytes comparisons already work? > >> > >> >>> s = str8('test') > >> >>> b = b'test' > >> >>> s == b > >> True > >> >>> b == s > >> True > >> >>> s != b > >> False > >> >>> b != s > >> False > >> > > > > Seems it's already so. Do they order properly too? (< <= > >=) > > > Looks like it: > > >>> str8('a') > b'b' > False > >>> str8('a') < b'b' > True > >>> str8('a') <= b'b' > True > >>> str8('a') >= b'b' > False Well that part was easy then. ;-) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Oct 9 19:02:03 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 9 Oct 2007 10:02:03 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: <470BAA01.9090202@vector-seven.com> References: <4709BA29.3060503@vector-seven.com> <470B82A4.8030703@vector-seven.com> <470BA418.5060301@vector-seven.com> <470BAA01.9090202@vector-seven.com> Message-ID: On 10/9/07, Thomas Lee wrote: > Guido van Rossum wrote: > > > > The point is that a bytes and a str instance are never considered equal... > > > > > Sorry. I understand now. My brain must have been on a holiday earlier. > :) Just pushed an updated patch to the bug tracker. > >> str8() == str() -> False > >> str8() != str() -> True > >> > >> Correct? > >> > > > > Well, in this case you actually have to compare the individual bytes. > > But yes. ;-) > > > I'm confused: if I'm making == and != between PyString return False > instead of converting, at what point would I need to be comparing bytes? > > The fix I have ready for this merely wipes out the conversion from > PyString to PyUnicode in PyUnicode_Compare and the existing code takes > care of the rest. Is this all that's required, or have I misinterpreted > this one too? :) Sorry, my bad. 
I misread and thought you were talking about PyString vs. PyBytes. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Oct 9 19:24:33 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 9 Oct 2007 10:24:33 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: <470BA418.5060301@vector-seven.com> References: <4709BA29.3060503@vector-seven.com> <470B82A4.8030703@vector-seven.com> <470BA418.5060301@vector-seven.com> Message-ID: On 10/9/07, Thomas Lee wrote: > Looks like it: > > >>> str8('a') > b'b' > False > >>> str8('a') < b'b' > True > >>> str8('a') <= b'b' > True > >>> str8('a') >= b'b' > False Which reminds me of a task I forgot to add to the list: - change the constructor for PyString to match the one for PyBytes. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Oct 10 00:33:20 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 9 Oct 2007 15:33:20 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: <4709BA29.3060503@vector-seven.com> <470B82A4.8030703@vector-seven.com> <470BA418.5060301@vector-seven.com> Message-ID: On 10/9/07, Guido van Rossum wrote: > Which reminds me of a task I forgot to add to the list: > > - change the constructor for PyString to match the one for PyBytes. And another pair of forgotten tasks: - change PyBytes so that its str() is the same as its repr(). - change PyString so that its str() is the same as its repr(). The former seems easy. The latter might cause trouble (though then again, it may not). I should also note that I already submitted the changes to remove locale support from PyString, and am working on removing its encode() method. This is not going so smoothly.
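
(Illustration for readers of this archive, not part of the original message.) The comparison, indexing and str()/repr() tasks discussed in this thread can be sketched as follows. This is only a sketch, assuming a released Python 3 interpreter and today's type names, where `bytes` is the immutable type (PyString in this thread) and `bytearray` is the mutable one (PyBytes, the "buffer" type here). The variable names are made up for the example.

```python
# Assumption: a released Python 3 interpreter, run without the -b/-bb flags
# (those flags turn bytes/str comparisons into warnings or errors).
data = b"test"   # immutable bytes (PyString in this thread)
text = "test"    # unicode text (PyUnicode)

# Equality between bytes and text never converts: it is simply False.
equal = (data == text)
not_equal = (data != text)

# Ordering between bytes and text is refused with TypeError.
try:
    data < text
    ordering_raises = False
except TypeError:
    ordering_raises = True

# Indexing and iteration yield ints, not 1-char strings.
first_item = data[0]
items = list(data)

# The immutable and mutable byte types compare and order with each other.
mutable_equal = (bytearray(b"test") == data)
mutable_less = (bytearray(b"a") < b"b")

# str() of a bytes object is the same as its repr().
as_text = str(data)
same_as_repr = (str(data) == repr(data))
```

None of the names above come from the patches mentioned in the thread; the sketch only shows the behaviour the task list was aiming for.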
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From alexandre at peadrop.com Wed Oct 10 05:27:43 2007 From: alexandre at peadrop.com (Alexandre Vassalotti) Date: Tue, 9 Oct 2007 23:27:43 -0400 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: Message-ID: On 10/8/07, Alexandre Vassalotti wrote: > On 10/8/07, Guido van Rossum wrote: > > - change indexing and iteration over PyString to return ints, not > > 1-char PyStrings > > I will try do this one. This took a bit longer than I expected. Changing the PyString iterator to return ints was easy, but I ran into some issues with the codec registry. I won't have the time this week to work on my patch any further. Meanwhile if someone would like to improve it, feel free to do so (the patch is attached to this email). Otherwise, I will continue to work on it next weekend. Cheers, -- Alexandre -------------- next part -------------- A non-text attachment was scrubbed... Name: string_iter_ret_ints.patch Type: text/x-diff Size: 4742 bytes Desc: not available Url : http://mail.python.org/pipermail/python-3000/attachments/20071009/7053a418/attachment-0001.patch From greg at krypto.org Wed Oct 10 07:49:00 2007 From: greg at krypto.org (Gregory P. Smith) Date: Tue, 9 Oct 2007 22:49:00 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: <52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com> References: <52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com> Message-ID: <52dc1c820710092249h3e866cf4y8fb88b1cc6f31361@mail.gmail.com> > > > - remove buffer API from PyUnicode > > > I'll take these two with a goal of having them done by the end of the > week. > > -gps > I should've known not to believe the simple description. This one is proving difficult by itself. If I modify the Unicode object to not support the buffer API I can't even launch the python interpreter. Any one with more time on their hands want this one? 
I'll still deal with adding the missing PyBytes methods. -g From jyasskin at gmail.com Wed Oct 10 08:02:19 2007 From: jyasskin at gmail.com (Jeffrey Yasskin) Date: Wed, 10 Oct 2007 01:02:19 -0500 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: <52dc1c820710092249h3e866cf4y8fb88b1cc6f31361@mail.gmail.com> References: <52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com> <52dc1c820710092249h3e866cf4y8fb88b1cc6f31361@mail.gmail.com> Message-ID: <5d44f72f0710092302s52be427fp19bfbae07a8d2700@mail.gmail.com> On 10/10/07, Gregory P. Smith wrote: > > > > > > > > > > > > - remove buffer API from PyUnicode > > > > > > I'll take these two with a goal of having them done by the end of the > week. > > > > -gps > > I should've known not to believe the simple description. This one is > proving difficult by itself. If I modify the Unicode object to not support > the buffer API I can't even launch the python interpreter. Any one with > more time on their hands want this one? > > I'll still deal with adding the missing PyBytes methods. I've got two plane flights coming up, so I can tackle removing the buffer API from PyUnicode (and perhaps removing the PyBUF_CHARACTER constant entirely if it's on the way). I'll hope to be done by Monday, with a status report of some sort by Friday. From alexandre at peadrop.com Wed Oct 10 15:27:09 2007 From: alexandre at peadrop.com (Alexandre Vassalotti) Date: Wed, 10 Oct 2007 09:27:09 -0400 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: <52dc1c820710092249h3e866cf4y8fb88b1cc6f31361@mail.gmail.com> References: <52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com> <52dc1c820710092249h3e866cf4y8fb88b1cc6f31361@mail.gmail.com> Message-ID: On 10/10/07, Gregory P.
Smith wrote: > > > - remove buffer API from PyUnicode > > > > I'll take these two with a goal of having them done by the end of the > week. > > > > I should've known not to believe the simple description. This one is > proving difficult by itself. If I modify the Unicode object to not support > the buffer API I can't even launch the python interpreter. Any one with > more time on their hands want this one? > I have a patch for this one. I just haven't tested it thoroughly. I attached the patch, so feel free to improve it. -- Alexandre -------------- next part -------------- A non-text attachment was scrubbed... Name: unicode_rm_buf_api.patch Type: text/x-diff Size: 1889 bytes Desc: not available Url : http://mail.python.org/pipermail/python-3000/attachments/20071010/e1ffa412/attachment.patch From lists at cheimes.de Wed Oct 10 20:01:21 2007 From: lists at cheimes.de (Christian Heimes) Date: Wed, 10 Oct 2007 20:01:21 +0200 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: Message-ID: <470D1371.3020309@cheimes.de> Guido van Rossum wrote: > > The tasks I can think of are: [...] (Resend, the first mail didn't make it and I forgot a point) While I was working on a patch for the renaming of bytes and str8 I found some open issues that need to be discussed and addressed: - Create an iterator view for PyBytes. The buffer object doesn't have a view for iteration like bytes has with PyStringIter_Type. Guido said he wants a view to play nice with the Sequence ABC. - Should bytes (PyString_Type) subclass from basestring? It doesn't feel quite right to me. I think we could remove basestring completely if bytes doesn't subclass from it. - Do we need a common base type for bytes and buffer like e.g. basebytes?
- The new bytes type (formally known as str8 / PyString_Type) still has a bunch of methods from its original Python 2.x parent: ['__add__', '__class__', '__contains__', '__delattr__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__str__', 'capitalize', 'center', 'count', 'decode', 'encode', 'endswith', 'expandtabs', 'find', 'index', 'isalnum', 'isalpha', 'isdigit', 'islower', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill'] Should any of these methods be removed? - PyString still excepts unicode in a lot of places and some important parts of Python still require it. The interpreter was f... up as I removed unicode support from functions like PyString_Size and PyString_AsString. I'm not sure which function is causing trouble. The error message was an exception bootstrapping error because PyImport_ImportModule("__builtin__") failed. Should these methods still accept unicode and convert it with the default encoding? Christian From guido at python.org Wed Oct 10 20:08:20 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 10 Oct 2007 11:08:20 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: <470D1371.3020309@cheimes.de> References: <470D1371.3020309@cheimes.de> Message-ID: On 10/10/07, Christian Heimes wrote: > Guido van Rossum wrote: > > > The tasks I can think of are: > [...] 
> > (Resend, the first mail didn't make it and I forgot a point) > > While I was working on a patch for the renaming of bytes and str8 I > found some open issues that need to be discussed and addressed: > > - Create an iterator view for PyBytes. The buffer object doesn't have a > view for iteration like bytes have with PyStringIter_Type. Guido said he > wants a view to play nice with the Sequence ABC. Right. Though it is a minor point and can be done later. > - Should bytes (PyString_Type) subclass from basestring? It doesn't feel > quite right to me. I think we could remove basestring completely if > bytes doesn't subclass from it. Definitely not. basestring is for text strings. We could even decide to remove it; we should instead have ABCs for this purpose. > - Do we need a common base type for bytes and buffer like e.g. basebytes? We can deal with that in abc.py as well, using virtual inheritance (the .register() method). > - The new bytes type (formally known as str8 / PyString_Type) still has You mean 'formerly', not 'formally' :-) I prefer to just call these by their C names (PyString) to be precise, as the C names aren't changing (at least not yet ;-). 
> a bunch of methods from its original Python 2.x parent: > > ['__add__', '__class__', '__contains__', '__delattr__', '__doc__', > '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', > '__getnewargs__', '__gt__', '__hash__', '__init__', '__iter__', > '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', > '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', > '__rmul__', '__setattr__', '__str__', 'capitalize', 'center', 'count', > 'decode', 'encode', 'endswith', 'expandtabs', 'find', 'index', > 'isalnum', 'isalpha', 'isdigit', 'islower', 'isspace', 'istitle', > 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'partition', 'replace', > 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', > 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', > 'upper', 'zfill'] > > Should any of these methods be removed? No, that's spelled out in the PEP. Those should all stay. (If you see a method that's not listed in the PEP, ask me about it before deleting it. :-) > - PyString still excepts unicode in a lot of places and some important > parts of Python still require it. The interpreter was f... up as I > removed unicode support from functions like PyString_Size and > PyString_AsString. I'm not sure which function is causing trouble. The > error message was an exception bootstrapping error because > PyImport_ImportModule("__builtin__") failed. Should these methods still > accept unicode and convert it with the default encoding? Several people have noted the same issue. My goal is to remove this behavior completely. I don't know how much it will take; these bootstrap issues are always hard to debug and sometimes hard to fix. I am looking into this a bit right now; I suspect it's got to do with some types that still return a PyString from their repr(). I noticed that even removing .encode() from PyString breaks about 5 tests. 
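
(Illustration for readers of this archive, not part of the original message.) The "virtual inheritance (the .register() method)" idea mentioned above for a common base of the two byte types can be sketched like this. It is a minimal sketch, assuming the `abc` module as shipped in released Python 3 and today's type names (`bytes`, `bytearray`). `ByteSequence` is a hypothetical name invented for the example, not a class from the PEP or the standard library.

```python
from abc import ABC


class ByteSequence(ABC):
    """Hypothetical common base for the immutable and mutable byte types."""


# Virtual inheritance: register existing types without touching their
# real base classes or their C implementation.
ByteSequence.register(bytes)
ByteSequence.register(bytearray)

is_bytes_member = isinstance(b"abc", ByteSequence)
is_bytearray_member = issubclass(bytearray, ByteSequence)
is_text_member = isinstance("abc", ByteSequence)  # text is deliberately left out
```

With this approach no `basebytes` built-in is needed: `isinstance()` checks go through the ABC, and text strings stay outside it.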
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From brett at python.org Wed Oct 10 20:10:43 2007 From: brett at python.org (Brett Cannon) Date: Wed, 10 Oct 2007 11:10:43 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: <52dc1c820710080932l49954bb1u82d31d195ed18593@mail.gmail.com> <52dc1c820710092249h3e866cf4y8fb88b1cc6f31361@mail.gmail.com> Message-ID: On 10/10/07, Alexandre Vassalotti wrote: > On 10/10/07, Gregory P. Smith wrote: > > > > - remove buffer API from PyUnicode > > > > > > I'll take these two with a goal of having them done by the end of the > > week. > > > > > > > I should've known not to believe the simple description. This one is > > proving difficult by itself. If I modify the Unicode object to not support > > the buffer API I can't even launch the python interpreter. Any one with > > more time on their hands want this one? > > > > I have a patch for this one. I just haven't tested it throughly. > I attached the patch, so free to improve it. It's best to toss all patches up on the issue tracker as then they don't get lost amongst the other emails in the mailing list. Plus it provides a more centralized history of what happens with the code and lets anyone searching for work on this exact topic have another place to find it. -Brett From brett at python.org Wed Oct 10 20:12:50 2007 From: brett at python.org (Brett Cannon) Date: Wed, 10 Oct 2007 11:12:50 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: <470D1371.3020309@cheimes.de> References: <470D1371.3020309@cheimes.de> Message-ID: On 10/10/07, Christian Heimes wrote: > Guido van Rossum wrote: > > > The tasks I can think of are: > [...] > > (Resend, the first mail didn't make it and I forgot a point) > > While I was working on a patch for the renaming of bytes and str8 I > found some open issues that need to be discussed and addressed: > > - Create an iterator view for PyBytes. 
The buffer object doesn't have a > view for iteration like bytes have with PyStringIter_Type. Guido said he > wants a view to play nice with the Sequence ABC. > > - Should bytes (PyString_Type) subclass from basestring? It doesn't feel > quite right to me. I think we could remove basestring completely if > bytes doesn't subclass from it. > > - Do we need a common base type for bytes and buffer like e.g. basebytes? > > - The new bytes type (formally known as str8 / PyString_Type) still has > a bunch of methods from its original Python 2.x parent: > > ['__add__', '__class__', '__contains__', '__delattr__', '__doc__', > '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', > '__getnewargs__', '__gt__', '__hash__', '__init__', '__iter__', > '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', > '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', > '__rmul__', '__setattr__', '__str__', 'capitalize', 'center', 'count', > 'decode', 'encode', 'endswith', 'expandtabs', 'find', 'index', > 'isalnum', 'isalpha', 'isdigit', 'islower', 'isspace', 'istitle', > 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'partition', 'replace', > 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', > 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', > 'upper', 'zfill'] > > Should any of these methods be removed? > See PEP 3137; http://www.python.org/dev/peps/pep-3137/#methods . -Brett From lists at cheimes.de Wed Oct 10 21:08:27 2007 From: lists at cheimes.de (Christian Heimes) Date: Wed, 10 Oct 2007 21:08:27 +0200 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: <470D1371.3020309@cheimes.de> Message-ID: <470D232B.80607@cheimes.de> Guido van Rossum wrote: > Definitely not. basestring is for text strings. We could even decide > to remove it; we should instead have ABCs for this purpose. I'm going to provide a patch which rips basestring out, k? 
Somebody has to write a fixer for 2to3 which replaces code like isinstance(egg, basestring) with isinstance(egg, str). > You mean 'formerly', not 'formally' :-) I prefer to just call these by > their C names (PyString) to be precise, as the C names aren't changing > (at least not yet ;-). Oh, formerly ... right. The current state of the names is very confusing. It's going to cost me some cups of coffee. str - PyUnicode bytes - PyString buffer - PyBytes > No, that's spelled out in the PEP. Those should all stay. (If you see > a method that's not listed in the PEP, ask me about it before deleting > it. :-) Doh, I should have read the PEP again before asking the question. I've a question about one point. The PEP states "They accept anything that implements the PEP 3118 buffer API for bytes arguments, and return the same type as the object whose method is called ("self")". Which types implement the buffer API? PyString, PyBytes but not PyUnicode? For now PyString takes PyUnicode objects as arguments and vice versa, but PyBytes doesn't take unicode. Do I understand correctly that PyString must not accept PyUnicode? >>> b"abc".count("b") 1 >>> "abc".count(b"b") 1 >>> buffer(b"abc").count("b") Traceback (most recent call last): File "<stdin>", line 1, in <module> SystemError: can't use str as char buffer >>> buffer(b"abc").count(b"b") 1 > Several people have noted the same issue. My goal is to remove this > behavior completely. I don't know how much it will take; these > bootstrap issues are always hard to debug and sometimes hard to fix. I tried to debug and fix it but I gave up after half an hour. > I am looking into this a bit right now; I suspect it's got to do with > some types that still return a PyString from their repr(). I noticed > that even removing .encode() from PyString breaks about 5 tests. Great! I've a patch that renames PyString -> bytes and PyBytes -> buffer while keeping str8 as an alias for bytes until str8 is removed.
It's based on Alexandres patch which itself is partly based on my patch. It breaks a hell of a lot but it could give you a head start. >>> b'' b'' >>> type(b'') >>> type(b'') is str8 True >>> type(b'') is bytes True >>> type(buffer(b'')) I'll keep working on the patch. Crys From g.brandl at gmx.net Wed Oct 10 21:33:24 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Wed, 10 Oct 2007 21:33:24 +0200 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: <470D232B.80607@cheimes.de> References: <470D1371.3020309@cheimes.de> <470D232B.80607@cheimes.de> Message-ID: Christian Heimes schrieb: >> You mean 'formerly', not 'formally' :-) I prefer to just call these by >> their C names (PyString) to be precise, as the C names aren't changing >> (at least not yet ;-). > > Oh, formerly ... right. The current state of the names is very > confusing. It's going to cost me some cups of coffee. > > str - PyUnicode > bytes - PyString > buffer - PyBytes I agree that this is quite confusing. The PyBytes functions can be changed without a thought since they aren't 2.x heritage. Since PyBuffer_* is already taken, what about a PyByteBuffer_ prefix? PyString_ could then be renamed to PyByteString_. PyUnicode might be allowed to stay... Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From lists at cheimes.de Wed Oct 10 21:58:19 2007 From: lists at cheimes.de (Christian Heimes) Date: Wed, 10 Oct 2007 21:58:19 +0200 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: <470D1371.3020309@cheimes.de> <470D232B.80607@cheimes.de> Message-ID: Georg Brandl wrote: > I agree that this is quite confusing. The PyBytes functions can be changed > without a thought since they aren't 2.x heritage. 
Since PyBuffer_* is already > taken, what about a PyByteBuffer_ prefix? PyString_ could then be renamed > to PyByteString_. PyUnicode might be allowed to stay... I like your idea! IMHO PyUnicode_ can stay. It reflects the intention and aim of the type and it's easy to remember. str() contains unicode data and it's C name is PyUnicode. That works for me. *g* For the other two names I find PyBytes_ for bytes() and PyBytesBuffer_ for buffer() easier to remember and more consistent. Christian From brett at python.org Wed Oct 10 22:30:36 2007 From: brett at python.org (Brett Cannon) Date: Wed, 10 Oct 2007 13:30:36 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: <470D1371.3020309@cheimes.de> <470D232B.80607@cheimes.de> Message-ID: On 10/10/07, Christian Heimes wrote: > Georg Brandl wrote: > > I agree that this is quite confusing. The PyBytes functions can be changed > > without a thought since they aren't 2.x heritage. Since PyBuffer_* is already > > taken, what about a PyByteBuffer_ prefix? PyString_ could then be renamed > > to PyByteString_. PyUnicode might be allowed to stay... > > I like your idea! > > IMHO PyUnicode_ can stay. It reflects the intention and aim of the type > and it's easy to remember. str() contains unicode data and it's C name > is PyUnicode. That works for me. *g* > > For the other two names I find PyBytes_ for bytes() and PyBytesBuffer_ > for buffer() easier to remember and more consistent. +1 from me. No need to have PyBytes_ be PyBytesString_ as the string tie-in will become historical. Plus PyBytes_ is shorter without losing any detail of what the functions work with. 
-Brett From guido at python.org Wed Oct 10 23:00:26 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 10 Oct 2007 14:00:26 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: References: <470D1371.3020309@cheimes.de> <470D232B.80607@cheimes.de> Message-ID: It's all fine to debate new names, but for 3.0a2, the existing C-level names will be used. Period. I am not going to review a change that touches every other line of code to do such a big rename. FWIW, I think the new names should be different from any existing names, otherwise merges from the trunk will be too much of a pain (and ditto for ports of 3rd party code). --Guido On 10/10/07, Brett Cannon wrote: > On 10/10/07, Christian Heimes wrote: > > Georg Brandl wrote: > > > I agree that this is quite confusing. The PyBytes functions can be changed > > > without a thought since they aren't 2.x heritage. Since PyBuffer_* is already > > > taken, what about a PyByteBuffer_ prefix? PyString_ could then be renamed > > > to PyByteString_. PyUnicode might be allowed to stay... > > > > I like your idea! > > > > IMHO PyUnicode_ can stay. It reflects the intention and aim of the type > > and it's easy to remember. str() contains unicode data and it's C name > > is PyUnicode. That works for me. *g* > > > > For the other two names I find PyBytes_ for bytes() and PyBytesBuffer_ > > for buffer() easier to remember and more consistent. > > +1 from me. No need to have PyBytes_ be PyBytesString_ as the string > tie-in will become historical. Plus PyBytes_ is shorter without > losing any detail of what the functions work with. 
> > -Brett > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Oct 10 23:06:33 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 10 Oct 2007 14:06:33 -0700 Subject: [Python-3000] PEP 3137 plan of attack In-Reply-To: <470D232B.80607@cheimes.de> References: