From tjreedy at udel.edu Thu Sep 1 00:02:53 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 31 Aug 2011 18:02:53 -0400 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 8/31/2011 1:10 PM, Guido van Rossum wrote: > This is why I find the issue of Python, the language (and stdlib), as > a whole "conforming to the Unicode standard" such a troublesome > concept -- I think it is something that an application may claim, but > the language should make much more modest claims, such as "the regular > expression syntax supports features X, Y and Z from the Unicode > recommendation XXX", or "the UTF-8 codec will never emit a sequence of > bytes that is invalid according to Unicode specification YYY". (As long > as the Unicode references are also versioned or dated.) This will be a great improvement. It was both embarrassing and frustrating to have to respond to Tom C.'s (and others') issue with "Our unicode type is too vaguely documented to tell whether you are reporting a bug or making a feature request." > But if you can observe (valid) surrogate pairs it is still UTF-16. ... > Ok, I dig this, to some extent. However saying it is UCS-2 is equally > bad. As I said on the tracker, our narrow builds are in-between (while moving closer to UTF-16), and both terms are deceptive, at least to some.
> At the same time I think it would be useful if certain string > operations like .lower() worked in such a way that *if* the input were > valid UTF-16, *then* the output would also be, while *if* the input > contained an invalid surrogate, the result would simply be something > that is no worse (in particular, those are all mapped to themselves). > We could even go further and have .lower() and friends look at > graphemes (multi-code-point characters) if the Unicode std has a > useful definition of e.g. lowercasing graphemes that differed from > lowercasing code points. > > An analogy is actually found in .lower() on 8-bit strings in Python 2: > it assumes the string contains ASCII, and non-ASCII characters are > mapped to themselves. If your string contains Latin-1 or EBCDIC or > UTF-8 it will not do the right thing. But that doesn't mean strings > cannot contain those encodings, it just means that the .lower() method > is not useful if they do. (Why ASCII? Because that is the system > encoding in Python 2.) Good analogy. > Let's call those things graphemes (Tom C's term, I quite like leaving > "character" ambiguous) -- they are sequences of multiple code points > that represent a single "visual squiggle" (the kind of thing that > you'd want to be swappable in vim with "xp" :-). I agree that APIs are > needed to manipulate (match, generate, validate, mutilate, etc.) > things at the grapheme level. I don't agree that this means a separate > data type is required. I presume by 'separate data type' you mean a base level builtin class like int or str and that you would allow for wrapper classes built on top of str, as such are not really 'separate'. For grapheme level and higher, we should certainly start with wrappers and probably with alternate versions based on different strategies. > There are ever-larger units of information > encoded in text strings, with ever farther-reaching (and more vague) > requirements on valid sequences. 
Do you want to have a data type that > can represent (only valid) words in a language? Sentences? Novels? ... > I think that at this point in time the best we can do is claim that > Python (the language standard) uses either 16-bit code units or 21-bit > code points in its string datatype, and that, thanks to PEP 393, > CPython 3.3 and further will always use 21-bit code points (but Jython > and IronPython may forever use their platform's native 16-bit code > unit representing string type). And then we add APIs that can be used > everywhere to look for code points (even if the string contains code > points), graphemes, or larger constructs. I'd like those APIs to be > designed using a garbage-in-garbage-out principle, where if the input > conforms to some Unicode requirement, the output does too, but if the > input doesn't, the output does what makes most sense. Validation is > then limited to codecs, and optional calls. > > If you index or slice a string, or create a string from chr() of a > surrogate or from some other value that the Unicode standard considers > an illegal code point, you better know what you are doing. I want > chr(i) to be valid for all values of i in range(2**21), Actually, it is range(0x110000) == range(1114112) so that UTF-8 uses at most 4 bytes per codepoint. 21 bits is 20.1 bits rounded up. > so it can be > used to create a lone surrogate, or (on systems with 16-bit > "characters") a surrogate pair. And also ord(chr(i)) == i for all i in > range(2**21).

for i in range(0x110000):  # 1114112
    if ord(chr(i)) != i:
        print(i)
# prints nothing (on Windows)

> I'm not sure about ord() on a 2-character string > containing a surrogate pair on systems where strings contain 21-bit > code points; I think it should be an error there, just as ord() on > other strings of length != 1. But on systems with 16-bit "characters", > ord() of strings of length 2 containing a valid surrogate pair should > work. 
And now does, thanks to whoever fixed this (within the last year, I think). -- Terry Jan Reedy From ncoghlan at gmail.com Thu Sep 1 00:44:59 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 1 Sep 2011 08:44:59 +1000 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Thu, Sep 1, 2011 at 8:02 AM, Terry Reedy wrote: > On 8/31/2011 1:10 PM, Guido van Rossum wrote: >> Ok, I dig this, to some extent. However saying it is UCS-2 is equally >> bad. > > As I said on the tracker, our narrow builds are in-between (while moving > closer to UTF-16), and both terms are deceptive, at least to some. We should probably just explicitly document that the internal representation in narrow builds is a UCS-2/UTF-16 hybrid - like UTF-16, it can handle the full code point space, but, like UCS-2, it allows code unit sequences (such as lone surrogates) that strict UTF-16 would reject. Perhaps we should also finally split strings out to a dedicated section on the same tier as Sequence types in the library reference. Yes, they're sequences, but they're also so much more than that (try as you might, you're unlikely to be successful in ducktyping strings the way you can sequences, mappings, files, numbers and other interfaces. Needing a "real string" is even more common than needing a "real dict", especially after the efforts to make most parts of the interpreter that previously cared about the latter distinction accept arbitrary mapping objects). 
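[Editor's note: the UCS-2/UTF-16 hybrid behaviour described above is easy to check in a modern CPython 3. A minimal sketch, assuming only stdlib behaviour: the str type happily stores a lone surrogate, while the strict UTF codecs reject it.]

```python
# A lone surrogate is a legal element of a str (a code point sequence),
# but the strict UTF-8 codec rejects it, as valid UTF data must.
s = "\ud800"  # lone high surrogate
assert len(s) == 1 and ord(s) == 0xD800

try:
    s.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
assert not strict_ok  # the strict codec refuses what the type allows

# The "surrogatepass" error handler deliberately smuggles it through:
assert s.encode("utf-16-le", "surrogatepass") == b"\x00\xd8"
```

So validation lives in the codecs, not in the string type itself.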
I've created http://bugs.python.org/issue12874, suggesting that the "Sequence Types" and "memoryview type" sections could be usefully rearranged as:
Sequence Types - list, tuple, range
Text Data - str
Binary Data - bytes, bytearray, memoryview
Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Thu Sep 1 01:49:18 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 1 Sep 2011 09:49:18 +1000 Subject: [Python-Dev] Python 3 optimizations continued... In-Reply-To: References: <20110829231420.20c3516a@pitrou.net> <20110830025510.638b41d9@pitrou.net> Message-ID: On Thu, Sep 1, 2011 at 3:28 AM, Guido van Rossum wrote: > On Tue, Aug 30, 2011 at 10:04 PM, Cesare Di Mauro > Cesare, I'm really sorry that you became so disillusioned that you > abandoned wordcode. I agree that we were too optimistic about Unladen > Swallow. Also that the existence of PyPy and its PR machine (:-) > should not stop us from improving CPython. Yep, and I'll try to do a better job of discouraging creeping complexity (without adequate payoffs) without the harmful side effect of discouraging experimentation with CPython performance improvements in general. It's massive "rewrite the world" changes, that don't adequately account for all the ways CPython gets used or the fact that core devs need to be able to effectively *review* the changes, that are unlikely to ever get anywhere. More localised changes, or those that are relatively easy to explain have a much better chance. So I'll switch my tone to just trying to make sure that portability and maintainability concerns are given due weight :) Cheers, Nick. P.S. 
I suspect a big part of my attitude stems from the fact that we're still trying to untangle some of the consequences of committing the PEP 3118 new buffer API implementation with inadequate review (it turns out the implementation didn't reflect the PEP and the PEP had deficiencies of its own), and I was one of the ones advocating in favour of that patch. Once bitten, twice shy, etc. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From nyamatongwe at gmail.com Thu Sep 1 02:58:57 2011 From: nyamatongwe at gmail.com (Neil Hodgson) Date: Thu, 1 Sep 2011 10:58:57 +1000 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: <4E5E8811.90600@g.nevcal.com> References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <4E5DEC35.4010404@g.nevcal.com> <4E5E82B0.4020302@g.nevcal.com> <4E5E8811.90600@g.nevcal.com> Message-ID: Glenn Linderman: > That said, regexp, or some sort of cursor on a string, might be a workable > solution. Will it have adequate performance? Perhaps, at least for some > applications. Will it be as conceptually simple as indexing an array of > graphemes? No. Will it ever reach the efficiency of indexing an array of > graphemes? No. Does that matter? Depends on the application. Using an iterator for cluster access is a common technique currently. For example, with the Pango text layout and drawing library, you may create a PangoLayoutIter over a text layout object (which contains a UTF-8 string along with formatting information) and iterate by clusters by calling pango_layout_iter_next_cluster. 
Direct access to clusters by index is not as useful in this domain as access by pixel positions - for example to examine the portion of a layout visible in a window. http://developer.gnome.org/pango/stable/pango-Layout-Objects.html#pango-layout-get-iter In this API, 'index' is used to refer to a byte index into UTF-8, not a character or cluster index. Rather than discuss functionality in the abstract, we need some use cases involving different levels of character and cluster access to see whether providing indexed access is worthwhile. I'll start with an example: some text drawing engines draw decomposed characters ("o" followed by " ̈" -> "ö") differently compared to their composite equivalents ("ö") and this may be perceived as better or worse. I'd like to offer an option to replace some decomposed characters with their composite equivalent before drawing but since other characters may look worse, I don't want to do a full normalization. The API style that appears most useful for this example is an iterator over the input string that yields composed and decomposed character strings (that is, it will yield both "ö" and "ö", the latter being "o" followed by a combining diaeresis), each character string is then converted if in a substitution dictionary and written to an output string. This is similar to an iterator over grapheme clusters although, since it is only aimed at composing sequences, the iterator could be simpler than a full grapheme cluster iterator. One of the benefits of iterator access to text is that many different iterators can be built without burdening the implementation object with extra memory costs as would be likely with techniques that build indexes into the representation. 
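[Editor's note: the substitution-dictionary iterator Neil describes can be sketched in a few lines of stdlib-only Python. This is a rough approximation - scanning base characters plus trailing combining marks is simpler than true grapheme cluster segmentation (UAX #29) - and the substitution table here is hypothetical.]

```python
import unicodedata

# Hypothetical table: only the sequences listed here are replaced with
# their precomposed forms; everything else passes through untouched
# (unlike a full NFC normalization, which would compose everything).
SUBST = {"o\u0308": "\u00f6", "a\u0308": "\u00e4"}

def selectively_compose(s):
    out = []
    i = 0
    while i < len(s):
        # greedily take one base character plus any following combining marks
        j = i + 1
        while j < len(s) and unicodedata.combining(s[j]):
            j += 1
        seq = s[i:j]
        out.append(SUBST.get(seq, seq))
        i = j
    return "".join(out)

# "o" + U+0308 is composed, but "u" + U+0308 (not in the table) is kept
assert selectively_compose("o\u0308u\u0308") == "\u00f6u\u0308"
```

The iterator-style scan means no index structures need to be built on the string, matching the memory point made above.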
Neil From guido at python.org Thu Sep 1 03:11:28 2011 From: guido at python.org (Guido van Rossum) Date: Wed, 31 Aug 2011 18:11:28 -0700 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <4E5DEC35.4010404@g.nevcal.com> <4E5E82B0.4020302@g.nevcal.com> <4E5E8811.90600@g.nevcal.com> Message-ID: On Wed, Aug 31, 2011 at 5:58 PM, Neil Hodgson wrote: > [...] some text drawing engines draw decomposed characters ("o" > followed by " ̈" -> "ö") differently compared to their composite > equivalents ("ö") and this may be perceived as better or worse. I'd > like to offer an option to replace some decomposed characters with > their composite equivalent before drawing but since other characters > may look worse, I don't want to do a full normalization. Isn't this an issue properly solved by various normal forms? 
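[Editor's note: for reference, the normal forms Guido mentions are exposed in the stdlib via unicodedata.normalize. NFC composes and NFD decomposes, but each applies to the whole string, which is exactly the all-or-nothing behaviour Neil wants to avoid.]

```python
import unicodedata

decomposed = "o\u0308"  # "o" + COMBINING DIAERESIS
composed = "\u00f6"     # precomposed LATIN SMALL LETTER O WITH DIAERESIS

# NFC composes, NFD decomposes -- both transform every character in the
# string, i.e. a "full normalization" with no per-character opt-out.
assert unicodedata.normalize("NFC", decomposed) == composed
assert unicodedata.normalize("NFD", composed) == decomposed
```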
-- --Guido van Rossum (python.org/~guido) From hagen at zhuliguan.net Thu Sep 1 03:27:28 2011 From: hagen at zhuliguan.net (Hagen Fürstenau) Date: Wed, 31 Aug 2011 21:27:28 -0400 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <4E5DEC35.4010404@g.nevcal.com> <4E5E82B0.4020302@g.nevcal.com> <4E5E8811.90600@g.nevcal.com> Message-ID: >> [...] some text drawing engines draw decomposed characters ("o" >> followed by " ̈" -> "ö") differently compared to their composite >> equivalents ("ö") and this may be perceived as better or worse. I'd >> like to offer an option to replace some decomposed characters with >> their composite equivalent before drawing but since other characters >> may look worse, I don't want to do a full normalization. > > Isn't this an issue properly solved by various normal forms? I think he's rather describing the need for custom "abnormal forms". 
- Hagen From nyamatongwe at gmail.com Thu Sep 1 03:29:39 2011 From: nyamatongwe at gmail.com (Neil Hodgson) Date: Thu, 1 Sep 2011 11:29:39 +1000 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <4E5DEC35.4010404@g.nevcal.com> <4E5E82B0.4020302@g.nevcal.com> <4E5E8811.90600@g.nevcal.com> Message-ID: Guido van Rossum: > On Wed, Aug 31, 2011 at 5:58 PM, Neil Hodgson wrote: >> [...] some text drawing engines draw decomposed characters ("o" >> followed by " ̈" -> "ö") differently compared to their composite >> equivalents ("ö") and this may be perceived as better or worse. I'd >> like to offer an option to replace some decomposed characters with >> their composite equivalent before drawing but since other characters >> may look worse, I don't want to do a full normalization. > > Isn't this an issue properly solved by various normal forms? No, since normalization of all cases may actually lead to worse visuals in some situations. A potential reason for drawing decomposed characters differently is that more room may be allocated for the generic condition where a character may be combined with a wide variety of accents compared with combining it with a specific accent. Here is an example on Windows drawing composite and decomposed forms to show the types of difference often encountered. 
http://scintilla.org/Composite.png Now, this particular example displays both forms quite reasonably so would not justify special processing but I have seen on other platforms and earlier versions of Windows where the umlaut in the decomposed form is displaced to the right even to the extent of disappearing under the next character. In the example, the decomposed 'o' is shorter and lighter and the umlauts are round instead of square. Neil From guido at python.org Thu Sep 1 04:51:35 2011 From: guido at python.org (Guido van Rossum) Date: Wed, 31 Aug 2011 19:51:35 -0700 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <4E5DEC35.4010404@g.nevcal.com> <4E5E82B0.4020302@g.nevcal.com> <4E5E8811.90600@g.nevcal.com> Message-ID: On Wed, Aug 31, 2011 at 6:29 PM, Neil Hodgson wrote: > Guido van Rossum: > >> On Wed, Aug 31, 2011 at 5:58 PM, Neil Hodgson wrote: >>> [...] some text drawing engines draw decomposed characters ("o" >>> followed by " ̈" -> "ö") differently compared to their composite >>> equivalents ("ö") and this may be perceived as better or worse. I'd >>> like to offer an option to replace some decomposed characters with >>> their composite equivalent before drawing but since other characters >>> may look worse, I don't want to do a full normalization. >> >> Isn't this an issue properly solved by various normal forms? > > No, since normalization of all cases may actually lead to worse > visuals in some situations. 
A potential reason for drawing decomposed > characters differently is that more room may be allocated for the > generic condition where a character may be combined with a wide > variety of accents compared with combining it with a specific accent. Ok, I thought there was also a form normalized (denormalized?) to decomposed form. But I'll take your word. > Here is an example on Windows drawing composite and decomposed > forms to show the types of difference often encountered. > http://scintilla.org/Composite.png > Now, this particular example displays both forms quite reasonably > so would not justify special processing but I have seen on other > platforms and earlier versions of Windows where the umlaut in the > decomposed form is displaced to the right even to the extent of > disappearing under the next character. In the example, the decomposed > 'o' is shorter and lighter and the umlauts are round instead of > square. I'm not sure it's a good idea to try and improve on the font using such a hack. But I won't deny you have the right. 
:-) -- --Guido van Rossum (python.org/~guido) From v+python at g.nevcal.com Thu Sep 1 06:40:55 2011 From: v+python at g.nevcal.com (Glenn Linderman) Date: Wed, 31 Aug 2011 21:40:55 -0700 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <4E5DEC35.4010404@g.nevcal.com> <4E5E82B0.4020302@g.nevcal.com> <4E5E8811.90600@g.nevcal.com> Message-ID: <4E5F0CD7.3030509@g.nevcal.com> On 8/31/2011 5:58 PM, Neil Hodgson wrote: > Glenn Linderman: > >> That said, regexp, or some sort of cursor on a string, might be a workable >> solution. Will it have adequate performance? Perhaps, at least for some >> applications. Will it be as conceptually simple as indexing an array of >> graphemes? No. Will it ever reach the efficiency of indexing an array of >> graphemes? No. Does that matter? Depends on the application. > Using an iterator for cluster access is a common technique > currently. For example, with the Pango text layout and drawing > library, you may create a PangoLayoutIter over a text layout object > (which contains a UTF-8 string along with formatting information) and > iterate by clusters by calling pango_layout_iter_next_cluster. Direct > access to clusters by index is not as useful in this domain as access > by pixel positions - for example to examine the portion of a layout > visible in a window. > > http://developer.gnome.org/pango/stable/pango-Layout-Objects.html#pango-layout-get-iter > In this API, 'index' is used to refer to a byte index into UTF-8, > not a character or cluster index. 
I agree that different applications may have different needs for different types of indexes to various starting points in a large string. Where a custom index is required, a standard index may not be needed. > One of the benefits of iterator access to text is that many > different iterators can be built without burdening the implementation > object with extra memory costs as would be likely with techniques that > build indexes into the representation. How many different iterators into the same text would be concurrently needed by an application? And why? Seems like if it is dealing with text at the level of grapheme clusters, it needs that type of iterator. Of course, if it does I/O it needs codec access, but that is by nature sequential from the starting point to the end point. From stephen at xemacs.org Thu Sep 1 09:13:03 2011 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 01 Sep 2011 16:13:03 +0900 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: References: <4E537EEC.1070602@v.loewis.de> <1314099542.3485.10.camel@localhost.localdomain> <4E53945E.1050102@v.loewis.de> <1314101745.3485.18.camel@localhost.localdomain> <4E53A5D1.2040808@v.loewis.de> <4E53A950.30005@haypocalc.com> <87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp> <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87r540ka68.fsf@uwakimon.sk.tsukuba.ac.jp> Where I cut your words, we are in 100% agreement. (FWIW :-) Guido van Rossum writes: > On Tue, Aug 30, 2011 at 11:03 PM, Stephen J. 
Turnbull > wrote: > > Well, that's why I wrote "intended to be suggestive". The Unicode > > Standard does not specify at all what the internal representation of > > characters may be, it only specifies what their external behavior must > > be when two processes communicate. (For "process" as used in the > > standard, think "Python modules" here, since we are concerned with the > > problems of folks who develop in Python.) When observing the behavior > > of a Unicode process, there are no UTF-16 arrays or UTF-8 arrays or > > even UTF-32 arrays; only arrays of characters. > > Hm, that's not how I would read "process". IMO that is an > intentionally vague term, I agree. I'm sorry that I didn't make myself clear. The reason I read "process" as "module" is that some modules of Python, and therefore Python as a whole, cannot conform to the Unicode standard. Eg, anything that inputs or outputs bytes. Therefore only "modules" and "types" can be asked to conform. (I don't think it makes sense to ask anything lower level to conform. See below where I comment on your .lower() example.) What I am advocating (for the long term) is provision of *one* module (or type) such that if the text processing done by the application is done entirely in terms of this module (type), it will conform (to some specified degree, chosen to balance user wants with implementation and support costs). It may be desirable to provide others for sufficiently important particular use cases, but at present I see a clear need for *one*. Unicode conformance is going to be a common requirement for apps used by global enterprises. I oppose trying to make str into that type. We need str, just as it is, for many reasons. > and we are free to decide how to interpret it. I don't think it > will work very well to define a process as a Python module; what > about Python modules that agree about passing along array of code > units (or streams of UTF-8, for that matter)? 
Certainly a group of cooperating modules could form a conforming process, just as you describe it for one example. The "one module" mentioned above need not implement everything internally, but it would take responsibility for providing guarantees (eg, unit tests) of whatever conformance claims it makes. > > Thus, according to the rules of handling a UTF-16 stream, it is an > > error to observe a lone surrogate or a surrogate pair that isn't a > > high-low pair (Unicode 6.0, Ch. 3 "Conformance", requirements C1 and > > C8-C10). That's what I mean by "can't tell it's UTF-16". > > But if you can observe (valid) surrogate pairs it is still UTF-16. In the concrete implementation I have in mind, surrogate pairs are represented by a str containing 2 code units. But in that case s[i][1] is an error, and s[i][0] == s[i]. print(s[i][0]) and print(s[i]) will print the same character to the screen. If you decode it to bytes, well, it's not a str any more so what have you proved? Ie, what you will see is *code points* not in the BMP. You don't have to agree that such "surrogate containment" behavior is so valuable as I think it is, but that's what I have in mind as one requirement for a "conforming implementation of UTF-16". > At the same time I think it would be useful if certain string > operations like .lower() worked in such a way that *if* the input were > valid UTF-16, *then* the output would also be, while *if* the input > contained an invalid surrogate, the result would simply be something > that is no worse (in particular, those are all mapped to > themselves). I don't think that it's a good idea to go for conformance at the method level. It would be a feature for apps that don't claim full conformance because they nevertheless give good results in more cases. The downside will be Python apps using str that will pass conformance tests written for, say Western Europe, but end users in Kuwait and Kuala Lumpur will report bugs. 
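[Editor's note: the method-level garbage-in-garbage-out behaviour under discussion can be observed directly in a current CPython 3: code points with no lowercase mapping, including lone surrogates, are mapped to themselves by str.lower().]

```python
# Cased characters get their Unicode lowercase mapping; a lone
# surrogate has no case mapping and simply passes through unchanged.
assert "A\u00c4".lower() == "a\u00e4"  # "AÄ" -> "aä"
assert "\ud800A".lower() == "\ud800a"  # surrogate maps to itself
```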
> An analogy is actually found in .lower() on 8-bit strings in Python 2: > it assumes the string contains ASCII, and non-ASCII characters are > mapped to themselves. If your string contains Latin-1 or EBCDIC or > UTF-8 it will not do the right thing. But that doesn't mean strings > cannot contain those encodings, it just means that the .lower() method > is not useful if they do. (Why ASCII? Because that is the system > encoding in Python 2.) Sure. I think that approach is fine for str, too, except that I would hope it looks up BMP base characters in the case-mapping database. The fact is that with very few exceptions non-BMP characters are going to be symbols (mathematical operators and emoticons, for example). This is good enough, except when it's not---but when it's not, only 100% conformance is really a reasonable target. IMO, of course. > I think we should just document how it behaves and not get hung up on > what it is called. Mentioning UTF-16 If you also say, "this type can represent all characters in Unicode, as well as certain non-characters", why mention UTF-16 at all? > Let's call those things graphemes (Tom C's term, I quite like leaving > "character" ambiguous) OK, but those definitions need to be made clear, as "grapheme cluster" and "combined character" are defined in the Unicode standard, and in fact mean slightly different things from each other. > -- they are sequences of multiple code points that represent a > single "visual squiggle" (the kind of thing that you'd want to be > swappable in vim with "xp" :-). I agree that APIs are needed to > manipulate (match, generate, validate, mutilate, etc.) things at > the grapheme level. I don't agree that this means a separate data > type is required. Clear enough. > There are ever-larger units of information encoded in text strings, > with ever farther-reaching (and more vague) requirements on valid > sequences. Do you want to have a data type that can represent (only > valid) words in a language? 
Sentences? Novels? No, and I can tell you why! The difference between characters and words is much more important than that between code point and grapheme cluster for most users and the developers who serve them. Even small children recognize typographical ligatures as being composite objects, while at least this Spanish-as-a-second-language learner was taught that `ñ' is an atomic character represented by a discontiguous glyph, like `i', and it is no more related to `n' than `m' is. Users really believe that characters are atomic. Even in the cases of Han characters and Hangul, users think of the characters as being "atomic," but in the sense of Bohr rather than that of Democritus. I think the situation for text processing is analogous to chemistry where the atom, with a few fairly gross properties (the outer electron orbitals) is the fundamental unit, not the elementary particles like electrons and protons and structures like inner orbitals. Sure, there are higher order structures like molecules, phases, and crystals, but it is elements that have the most regular and simply described behavior for the chemist, and it does not become any simpler for the chemist if you decompose the atom. The composed character or grapheme cluster is the analogue of the atom for most processing at the level of "text". The only real exceptions I can imagine are in the domain of linguistics. > I think that at this point in time the best we can do is claim that > Python (the language standard) uses either 16-bit code units or 21-bit > code points in its string datatype, and that, thanks to PEP 393, > CPython 3.3 and further will always use 21-bit code points (but Jython > and IronPython may forever use their platform's native 16-bit code > unit representing string type). And then we add APIs that can be used > everywhere to look for code points (even if the string contains code > points), graphemes, or larger constructs. 
I'd like those APIs to be > designed using a garbage-in-garbage-out principle, where if the input > conforms to some Unicode requirement, the output does too, but if the > input doesn't, the output does what makes most sense. Validation is > then limited to codecs, and optional calls. Clear enough. I disagree that that will be enough for constructing large-scale Unicode-conformant applications. Somebody is going to have to produce batteries for those applications, and I think they should be included in Python. I agree that it's proper that I and those who think the same way take responsibility for writing and implementing a PEP. > If you index or slice a string, or create a string from chr() of a > surrogate or from some other value that the Unicode standard considers > an illegal code point, you better know what you are doing. I think that's like asking a toddler to know that the stove is hot. The consequences for the toddler of her ignorance are much greater, but the informational requirement is equally stringent. Of course application writers are adults who could be asked to learn, but economically I think it makes a lot more sense to include those batteries. IMHO YMMV, obviously. > I want chr(i) to be valid for all values of i in range(2**21), I quite agree (ie, for str). Thus I perceive a need for another type. From stephen at xemacs.org Thu Sep 1 09:59:19 2011 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Thu, 01 Sep 2011 16:59:19 +0900 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: <4E5E882C.1050006@g.nevcal.com> References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <4E5DEC35.4010404@g.nevcal.com> <4E5E882C.1050006@g.nevcal.com> Message-ID: <87pqjkk814.fsf@uwakimon.sk.tsukuba.ac.jp> Glenn Linderman writes: > We can either artificially constrain ourselves to minor tweaks of > the legal conforming bytestreams, It's not artificial. Having the internal representation be the same as a standard encoding is very useful for a large number of minor usages (urgently saving buffers in a text editor that knows its internal state is inconsistent, viewing strings in the debugger, PEP 393-style space optimization is simpler if text properties are out-of-band, etc). > or we can invent a representation (whether called str or something > else) that is useful and efficient in practice. Bring on the practice, then. You say that a bit to identify lone surrogates might be useful or efficient. In what application? How much time or space does it save? You say that a bit to cache a property might be useful or efficient. In what application? Which properties? Are those properties a set fixed by the language, or would some bits be available for application-specific property caching? How much time or space does that save? What are the costs to applications that don't want the cache? How is the bit-cache affected by PEP 393? I know of no answers (none!) to those questions that favor introduction of a bit-cache representation now. 
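For concreteness, Python 3's str already stores lone surrogates without any flag bit; only the codecs police them. A sketch (the surrogatepass handler exists since Python 3.1):

```python
lone = chr(0xD800)               # a lone high surrogate: legal in str
assert len(lone) == 1

# The strict UTF-8 codec refuses to emit it...
try:
    lone.encode('utf-8')
except UnicodeEncodeError:
    pass
else:
    raise AssertionError('expected UnicodeEncodeError')

# ...but the surrogatepass error handler round-trips it when needed:
blob = lone.encode('utf-8', 'surrogatepass')
assert blob == b'\xed\xa0\x80'
assert blob.decode('utf-8', 'surrogatepass') == lone
```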
And those bits aren't going anywhere; it will always be possible to use a "wide" build and change the representation later, if the optimization is valuable enough. Now, I'm aware that my experience is limited to the implementations of one general-purpose language (Emacs Lisp) of restricted applicability. But its primary use *is* in text processing, so I'm moderately expert. *Moderately*. Always interested in learning more, though. If you know of relevant use cases, I'm listening! Even if Guido doesn't find them convincing for Python, we might find them interesting at XEmacs. From nyamatongwe at gmail.com Thu Sep 1 10:05:41 2011 From: nyamatongwe at gmail.com (Neil Hodgson) Date: Thu, 1 Sep 2011 18:05:41 +1000 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: <4E5F0CD7.3030509@g.nevcal.com> References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <4E5DEC35.4010404@g.nevcal.com> <4E5E82B0.4020302@g.nevcal.com> <4E5E8811.90600@g.nevcal.com> <4E5F0CD7.3030509@g.nevcal.com> Message-ID: Glenn Linderman: > How many different iterators into the same text would be concurrently needed > by an application? And why? Seems like if it is dealing with text at the > level of grapheme clusters, it needs that type of iterator. Of course, if > it does I/O it needs codec access, but that is by nature sequential from the > starting point to the end point. I would expect that there would mostly be a single iterator into a string but can imagine scenarios in which multiple iterators may be concurrently active and that these could be of different types. 
For example, say we wanted to search for each code point in a text that fails some test (such as being a member of a set of unwanted vowel diacritics) and then display that failure in context with its surrounding text of up to 30 graphemes either side. Neil From turnbull at sk.tsukuba.ac.jp Thu Sep 1 10:33:50 2011 From: turnbull at sk.tsukuba.ac.jp (Stephen J. Turnbull) Date: Thu, 01 Sep 2011 17:33:50 +0900 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: <4E5E8840.4080600@g.nevcal.com> References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <4E5DEC35.4010404@g.nevcal.com> <87vctdkbzh.fsf@uwakimon.sk.tsukuba.ac.jp> <4E5E8840.4080600@g.nevcal.com> Message-ID: <87obz4k6fl.fsf@uwakimon.sk.tsukuba.ac.jp> Glenn Linderman writes: > I found your discussion of streams versus arrays, as separate concepts > related to Unicode, along with Terry's bisect indexing implementation, > to rather inspiring. Just because Unicode defines streams of codeunits > of various sizes (UTF-8, UTF-16, UTF-32) to represent characters when > processes communicate and for storage (which is one way processes > communicate), that doesn't imply that the internal representation of > character strings in a programming language must use exactly that > representation. That is true, and Unicode is *very* careful to define its requirements so that is true. That doesn't mean using an alternative representation is an improvement, though. 
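Neil's search-and-report scenario can be sketched in a few lines, with the simplification that context is measured in code points rather than the graphemes his version calls for, and with `report_failures` being a made-up name:

```python
def report_failures(text, bad, width=30):
    """Collect (index, context) pairs for each code point in `bad`.

    Context is a slice of up to `width` code points on either side --
    a stand-in for the 30 graphemes of context Neil describes."""
    hits = []
    for i, ch in enumerate(text):
        if ch in bad:
            hits.append((i, text[max(0, i - width):i + width + 1]))
    return hits

# Flag an unwanted combining acute accent, with one code point of context:
assert report_failures('ab\u0301cd', {'\u0301'}, width=1) == [(2, 'b\u0301c')]
```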
> I'm unaware of any current Python implementation that has chosen to > use UTF-8 as the internal representation of character strings (I'm > also aware Perl has made that choice), yet UTF-8 is one of the > commonly recommended character representations on the Linux platform, > from what I read. There are two reasons for that. First, widechar representations are right out for anything related to the file system or OS, unless you are prepared to translate before passing to the OS. If you use UTF-8, then asking the user to use a UTF-8 locale to communicate with your app is a plausible way to eliminate any translation in your app. (The original moniker for UTF-8 was UTF-FSS, where FSS stands for "file system safe.") Second, much text processing is stream-oriented and one-pass. In those cases, the variable-width nature of UTF-8 doesn't cost you anything. Eg, this is why the common GUIs for Unix (X.org, GTK+, and Qt) either provide or require UTF-8 coding for their text. It costs *them* nothing and is file-system-safe. > So in that sense, Python has rejected the idea of using the > "native" or "OS configured" representation as its internal > representation. I can't agree with that characterization. POSIX defines the concept of *locale* precisely because the "native" representation of text in Unix is ASCII. Obviously that won't fly, so they solved the problem in the worst possible way: they made the representation variable! It is the *variability* of text representation that Python rejects, just as Emacs and Perl do. They happen to have chosen six different representations.[1] > So why, then, must one choose from a repertoire of Unicode-defined > stream representations if they don't meet the goal of efficient > length, indexing, or slicing operations on actual characters? One need not. But why do anything else? It's not like the authors of that standard paid no attention to various concerns about efficiency and backward compatibility! 
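The variable-width property of UTF-8 mentioned above is easy to demonstrate (a sketch):

```python
# UTF-8 spends one to four bytes per code point, so byte offsets and
# code-point indices diverge: cheap for one-pass streams, costly for
# random access by index.
samples = ['a', '\u00e9', '\u2113', '\U0001D11E']  # a, e-acute, script l, G clef
assert [len(c.encode('utf-8')) for c in samples] == [1, 2, 3, 4]
```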
That's the question that you have not answered, and I am presently lacking in any data that suggests I'll ever need the facilities you propose. Footnotes: [1] Emacs recently changed its mind. Originally it used the so-called MULE encoding, and now a different extension of UTF-8 from Perl. Of course, Python beats that, with narrow, wide, and now PEP-393 representations! From nyamatongwe at gmail.com Thu Sep 1 10:53:28 2011 From: nyamatongwe at gmail.com (Neil Hodgson) Date: Thu, 1 Sep 2011 18:53:28 +1000 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: <87obz4k6fl.fsf@uwakimon.sk.tsukuba.ac.jp> References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <4E5DEC35.4010404@g.nevcal.com> <87vctdkbzh.fsf@uwakimon.sk.tsukuba.ac.jp> <4E5E8840.4080600@g.nevcal.com> <87obz4k6fl.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: Stephen J. Turnbull: > ... Eg, this is why the common GUIs for Unix (X.org, GTK+, and > Qt) either provide or require UTF-8 coding for their text. Qt uses UTF-16 for its basic QString type. While QString is mostly treated as a black box which you can create from input buffers in any encoding, the only encoding allowed for a contents-by-reference QString (QString::fromRawData) is UTF-16. http://doc.qt.nokia.com/latest/qstring.html#fromRawData Neil From stephen at xemacs.org Thu Sep 1 11:15:50 2011 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Thu, 01 Sep 2011 18:15:50 +0900 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: <4E5F0CD7.3030509@g.nevcal.com> References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <4E5DEC35.4010404@g.nevcal.com> <4E5E82B0.4020302@g.nevcal.com> <4E5E8811.90600@g.nevcal.com> <4E5F0CD7.3030509@g.nevcal.com> Message-ID: <87k49sk4hl.fsf@uwakimon.sk.tsukuba.ac.jp> Glenn Linderman writes: > How many different iterators into the same text would be concurrently > needed by an application? And why? A WYSIWYG editor for structured text (TeX, HTML) might want two (at least), one for the "source" window and one for the "rendered" window. One might want to save the state of the iterators (if that's possible) and cache it as one moves the "window" forward to make short backward motion fast, giving you two (or four, etc) more. > Seems like if it is dealing with text at the level of grapheme > clusters, it needs that type of iterator. Of course, if it does > I/O it needs codec access, but that is by nature sequential from > the starting point to the end point. `save-region' ? `save-text-remove-markup' ? 
From v+python at g.nevcal.com Thu Sep 1 11:20:59 2011 From: v+python at g.nevcal.com (Glenn Linderman) Date: Thu, 01 Sep 2011 02:20:59 -0700 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: <87k49sk4hl.fsf@uwakimon.sk.tsukuba.ac.jp> References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <4E5DEC35.4010404@g.nevcal.com> <4E5E82B0.4020302@g.nevcal.com> <4E5E8811.90600@g.nevcal.com> <4E5F0CD7.3030509@g.nevcal.com> <87k49sk4hl.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4E5F4E7B.4070200@g.nevcal.com> On 9/1/2011 2:15 AM, Stephen J. Turnbull wrote: > Glenn Linderman writes: > > > How many different iterators into the same text would be concurrently > > needed by an application? And why? > > A WYSIWYG editor for structured text (TeX, HTML) might want two (at > least), one for the "source" window and one for the "rendered" window. > One might want to save the state of the iterators (if that's possible) > and cache it as one moves the "window" forward to make short backward > motion fast, giving you two (or four, etc) more. Sure. But those are probably all the same type of iterators -- probably (since they are WYSIWYG) dealing with multi-codepoint characters (Guido's recent definition of grapheme, which seems to subsume both grapheme clusters and composed characters). Hence all of them would be using/requiring the same sort of representation, index, analysis, or some combination of those. > > Seems like if it is dealing with text at the level of grapheme > > clusters, it needs that type of iterator. 
Of course, if it does > > I/O it needs codec access, but that is by nature sequential from > > the starting point to the end point. > > `save-region' ? `save-text-remove-markup' ? Yes, save-region sounds like exactly what I was speaking of. save-text-remove-markup I would infer needs to process the text to remove the markup characters... since you used TeX and HTML as examples, markup is text, not binary (which would be a different problem). Since the TeX and HTML markup is mostly ASCII, markup removal (or more likely, text extraction) could be performed via either a grapheme iterator, or a codepoint iterator, or even a code unit iterator. -------------- next part -------------- An HTML attachment was scrubbed... URL: From v+python at g.nevcal.com Thu Sep 1 11:55:22 2011 From: v+python at g.nevcal.com (Glenn Linderman) Date: Thu, 01 Sep 2011 02:55:22 -0700 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: <87pqjkk814.fsf@uwakimon.sk.tsukuba.ac.jp> References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <4E5DEC35.4010404@g.nevcal.com> <4E5E882C.1050006@g.nevcal.com> <87pqjkk814.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4E5F568A.4020301@g.nevcal.com> On 9/1/2011 12:59 AM, Stephen J. Turnbull wrote: > Glenn Linderman writes: > > > We can either artificially constrain ourselves to minor tweaks of > > the legal conforming bytestreams, > > It's not artificial. 
Having the internal representation be the same > as a standard encoding is very useful for a large number of minor > usages (urgently saving buffers in a text editor that knows its > internal state is inconsistent, viewing strings in the debugger, PEP > 393-style space optimization is simpler if text properties are > out-of-band, etc). saving buffers urgently when the internal state is inconsistent sounds like carefully preserving a bug. Windows 7 64-bit on one of my computers happily crashes several times a day when it detects inconsistent internal state... under the theory, I guess, that losing work is better than saving bad work. You sound the opposite. I'm actually very grateful that Firefox and emacs recover gracefully from Windows crashes, and I lose very little data from the crashes, but cannot recommend Windows 7 (this machine being my only experience with it) for stability. In any case, the operations you mention still require the data to be processed, if ever so slightly, and I'll admit that a more complex representation would require a bit more processing. Not clear that it would be huge or problematical for these cases. Except, I'm not sure how PEP 393 space optimization fits with the other operations. It may even be that an application-wide complex-grapheme cache would save significant space, although if it uses high-bits in a string representation to reference the cache, PEP 393 would jump immediately to something > 16 bits per grapheme... but likely would anyway, if complex-graphemes are in the data stream. > > or we can invent a representation (whether called str or something > > else) that is useful and efficient in practice. > > Bring on the practice, then. You say that a bit to identify lone > surrogates might be useful or efficient. In what application? How > much time or space does it save? I didn't attribute any efficiency to flagging lone surrogates (BI-5). 
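For comparison, the precedent that already exists in Python for round-tripping invalid byte input -- PEP 383's surrogateescape error handler -- can be sketched as:

```python
raw = b'ab\xff\xfecd'                     # not valid UTF-8
s = raw.decode('utf-8', 'surrogateescape')

# Each undecodable byte became a lone low surrogate in U+DC80..U+DCFF:
assert s == 'ab\udcff\udcfecd'
# ...and encoding with the same handler restores the bytes exactly:
assert s.encode('utf-8', 'surrogateescape') == raw
```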
Since Windows uses a non-validated UCS-2 or UTF-16 character type, any Python program that obtains data from Windows APIs may be confronted with lone surrogates or inappropriate combining characters at any time. Round-tripping that data seems useful, even though the data itself may not be as useful as validated Unicode characters would be. Accidentally combining the characters due to slicing and dicing the data, and doing normalizations, or what not, would not likely be appropriate. However, returning modified forms of it to Windows as UCS-2 or UTF-16 data may still cause other applications to later accidentally combine the characters, if the modifications juxtaposed things to make them look reasonable, even if accidentally. If intentionally, of course, the bit could be turned off. This exact sort of problem with non-validated UTF-8 bytes was addressed already in Python, mostly for Linux, allowing round-tripping of the byte stream, even though it is not valid. BI-6 suggests a different scheme for that, without introducing lone surrogates (which might accidentally get combined with other lone surrogates). > You say that a bit to cache a > property might be useful or efficient. In what application? Which > properties? Are those properties a set fixed by the language, or > would some bits be available for application-specific property > caching? How much time or space does that save? The brainstorming ideas I presented were just that... ideas. And they were independent. And the use of many high-order bits for properties was one of the independent ones. When I wrote that one, I was assuming a UTF-32 representation (which wastes 11 bits of each 32). One thing I did have in mind, with the high-order bits, for that representation, was to flag the start or end or middle of the codes that are included in a grapheme. That would be redundant with some of the Unicode codepoint property databases, if I understand them properly... 
whether it would make iterators enough more efficient to be worth the complexity would have to be benchmarked. After writing all those ideas down, I actually preferred some of the others, that achieved O(1) real grapheme indexing, rather than caching character properties. > What are the costs to applications that don't want the cache? How is > the bit-cache affected by PEP 393? If it is a separate type from str, then it costs nothing except the extra code space to implement the cache for those applications that do want it... most of which wouldn't be loaded for applications that don't, if done as a module or C extension. > I know of no answers (none!) to those questions that favor > introduction of a bit-cache representation now. And those bits aren't > going anywhere; it will always be possible to use a "wide" build and > change the representation later, if the optimization is valuable > enough. Now, I'm aware that my experience is limited to the > implementations of one general-purpose language (Emacs Lisp) of > restricted applicability. But its primary use *is* in text processing, > so I'm moderately expert. > > *Moderately*. Always interested in learning more, though. If you > know of relevant use cases, I'm listening! Even if Guido doesn't find > them convincing for Python, we might find them interesting at XEmacs. OK... ignore the bit-cache idea (BI-1), and reread the others without having your mind clogged with that one, and see if any of them make sense to you then. But you may be too biased by the "minor" needs of keeping the internal representation similar to the stream representation to see any value in them. I rather like BI-2, since it allows O(1) indexing of graphemes. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ned at nedbatchelder.com Thu Sep 1 13:29:45 2011 From: ned at nedbatchelder.com (Ned Batchelder) Date: Thu, 01 Sep 2011 07:29:45 -0400 Subject: [Python-Dev] Python 3 optimizations continued... 
In-Reply-To: References: <20110829231420.20c3516a@pitrou.net> <20110830025510.638b41d9@pitrou.net> <4E5CA1F0.2070005@v.loewis.de> <20110830193806.0d718a56@pitrou.net> Message-ID: <4E5F6CA9.1080501@nedbatchelder.com> On 8/30/2011 4:41 PM, stefan brunthaler wrote: >> Ok, there there's something else you haven't told us. Are you saying >> that the original (old) bytecode is still used (and hence written to >> and read from .pyc files)? >> > Short answer: yes. > Long answer: I added an invocation counter to the code object and keep > interpreting in the usual Python interpreter until this counter > reaches a configurable threshold. When it reaches this threshold, I > create the new instruction format and interpret with this optimized > representation. All the macros look exactly the same in the source > code, they are just redefined to use the different instruction format. > I am at no point serializing this representation or the runtime > information gathered by me, as any subsequent invocation might have > different characteristics. When the switchover to the new instruction format happens, what happens to sys.settrace() tracing? Will it report the same sequence of line numbers? For a small but important class of program executions, this is more important than speed. --Ned. > Best, > --stefan From cesare.di.mauro at gmail.com Thu Sep 1 14:23:04 2011 From: cesare.di.mauro at gmail.com (Cesare Di Mauro) Date: Thu, 1 Sep 2011 14:23:04 +0200 Subject: [Python-Dev] Python 3 optimizations continued... In-Reply-To: <4E5F6CA9.1080501@nedbatchelder.com> References: <20110829231420.20c3516a@pitrou.net> <20110830025510.638b41d9@pitrou.net> <4E5CA1F0.2070005@v.loewis.de> <20110830193806.0d718a56@pitrou.net> <4E5F6CA9.1080501@nedbatchelder.com> Message-ID: 2011/9/1 Ned Batchelder > When the switchover to the new instruction format happens, what happens to > sys.settrace() tracing? Will it report the same sequence of line numbers? 
> For a small but important class of program executions, this is more > important than speed. > > --Ned > A simple solution: when tracing is enabled, the new instruction format will never be executed (and information tracking disabled as well). Regards, Cesare -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark at hotpy.org Thu Sep 1 14:31:12 2011 From: mark at hotpy.org (Mark Shannon) Date: Thu, 01 Sep 2011 13:31:12 +0100 Subject: [Python-Dev] Python 3 optimizations continued... In-Reply-To: References: <20110829231420.20c3516a@pitrou.net> <20110830025510.638b41d9@pitrou.net> <4E5CA1F0.2070005@v.loewis.de> <20110830193806.0d718a56@pitrou.net> <4E5F6CA9.1080501@nedbatchelder.com> Message-ID: <4E5F7B10.3010001@hotpy.org> Cesare Di Mauro wrote: > 2011/9/1 Ned Batchelder > > > When the switchover to the new instruction format happens, what > happens to sys.settrace() tracing? Will it report the same sequence > of line numbers? For a small but important class of program > executions, this is more important than speed. > > --Ned > > > A simple solution: when tracing is enabled, the new instruction format > will never be executed (and information tracking disabled as well). > What happens if tracing is enabled *during* the execution of the new instruction format? Some sort of deoptimisation will be required in order to recover the correct VM state. Cheers, Mark. > Regards, > Cesare > > > ------------------------------------------------------------------------ > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/mark%40hotpy.org From cesare.di.mauro at gmail.com Thu Sep 1 14:38:19 2011 From: cesare.di.mauro at gmail.com (Cesare Di Mauro) Date: Thu, 1 Sep 2011 14:38:19 +0200 Subject: [Python-Dev] Python 3 optimizations continued... 
In-Reply-To: <4E5F7B10.3010001@hotpy.org> References: <20110829231420.20c3516a@pitrou.net> <20110830025510.638b41d9@pitrou.net> <4E5CA1F0.2070005@v.loewis.de> <20110830193806.0d718a56@pitrou.net> <4E5F6CA9.1080501@nedbatchelder.com> <4E5F7B10.3010001@hotpy.org> Message-ID: 2011/9/1 Mark Shannon > Cesare Di Mauro wrote: > >> 2011/9/1 Ned Batchelder > ned at nedbatchelder.com>> >> >> >> When the switchover to the new instruction format happens, what >> happens to sys.settrace() tracing? Will it report the same sequence >> of line numbers? For a small but important class of program >> executions, this is more important than speed. >> >> --Ned >> >> >> A simple solution: when tracing is enabled, the new instruction format >> will never be executed (and information tracking disabled as well). >> >> What happens if tracing is enabled *during* the execution of the new > instruction format? > Some sort of deoptimisation will be required in order to recover the > correct VM state. > > Cheers, > Mark. > Sure. I don't think that the regular ceval.c loop will be "dropped" when executing the new instruction format, so we can "intercept" a change like this using the "why" variable, for example, or something similar that is normally used to break the regular loop execution. Anyway, we need to take a look at the code. Cheers, Cesare -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From hagen at zhuliguan.net Thu Sep 1 17:30:10 2011 From: hagen at zhuliguan.net (=?UTF-8?B?SGFnZW4gRsO8cnN0ZW5hdQ==?=) Date: Thu, 01 Sep 2011 11:30:10 -0400 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: References: <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <4E5DEC35.4010404@g.nevcal.com> <4E5E82B0.4020302@g.nevcal.com> <4E5E8811.90600@g.nevcal.com> Message-ID: > Ok, I thought there was also a form normalized (denormalized?) to > decomposed form. But I'll take your word. If I understood the example correctly, he needs a mixed form, with some characters decomposed and some composed (depending on which one looks better in the given font). I agree that this sounds more like a font problem, but it's a widespread font problem and it may be necessary to address it in an application. But this is only one example of why an application-specific concept of graphemes different from the Unicode-defined normalized forms can be useful. I think the very concept of a grapheme is context, language, and culture specific. For example, in Chinese Pinyin it would be very natural to write tone marks with composing diacritics (i.e. in decomposed form). But then you have the vowel "ü" and it would be strange to decompose it into an "u" and combining diaeresis. So conceptually the most sensible representation of "lǜ" would be neither the composed nor the decomposed normal form, and depending on its needs an application might want to represent it in the mixed form (composing the diaeresis with the "u", but leaving the grave accent separate). 
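The three candidate spellings of "lǜ" described above can be checked with the stdlib's unicodedata (a sketch): the mixed form is a distinct code-point sequence, but either normalization erases the distinction.

```python
import unicodedata as ud

decomposed = 'lu\u0308\u0300'    # l, u, combining diaeresis, combining grave
composed   = 'l\u01DC'           # l, precomposed u-with-diaeresis-and-grave
mixed      = 'l\u00FC\u0300'     # l, precomposed u-with-diaeresis, grave kept separate

# All three render as the same grapheme and are canonically equivalent:
assert ud.normalize('NFC', decomposed) == composed
assert ud.normalize('NFD', composed) == decomposed

# But the mixed form is neither normal form -- normalizing destroys it:
assert mixed not in (composed, decomposed)
assert ud.normalize('NFC', mixed) == composed
assert ud.normalize('NFD', mixed) == decomposed
```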
There must be many more examples where the conceptual context determines the right composition, like for "ñ", which in Spanish is certainly a grapheme, but in mathematics might be better represented as n-tilde. The bottom line is that, while an array of Unicode code points is certainly a generally useful data type (and PEP 393 is a great improvement in this regard), an array of graphemes carries many subtleties and may not be nearly as universal. Support in the spirit of unicodedata's normalization function etc. is certainly a good thing, but we shouldn't assume that everyone will want Python to do their graphemes for them. - Hagen From guido at python.org Thu Sep 1 17:45:14 2011 From: guido at python.org (Guido van Rossum) Date: Thu, 1 Sep 2011 08:45:14 -0700 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: <87r540ka68.fsf@uwakimon.sk.tsukuba.ac.jp> References: <4E537EEC.1070602@v.loewis.de> <1314099542.3485.10.camel@localhost.localdomain> <4E53945E.1050102@v.loewis.de> <1314101745.3485.18.camel@localhost.localdomain> <4E53A5D1.2040808@v.loewis.de> <4E53A950.30005@haypocalc.com> <87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp> <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <87r540ka68.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Thu, Sep 1, 2011 at 12:13 AM, Stephen J. Turnbull wrote: > Where I cut your words, we are in 100% agreement. (FWIW :-) Not quite the same here, but I don't feel the need to have the last word. Most of what you say makes sense, in some cases we'll quibble later, but there are a few points where I have something to add: > No, and I can tell you why! The difference between characters and
?The difference between characters and > words is much more important than that between code point and grapheme > cluster for most users and the developers who serve them. ?Even small > children recognize typographical ligatures as being composite objects, True -- in fact I didn't know that ff and ffl ligatures *existed* until I learned about Unix troff. > while at least this Spanish-as-a-second-language learner was taught > that `?' is an atomic character represented by a discontiguous glyph, > like `i', and it is no more related to `n' than `m' is. ?Users really > believe that characters are atomic. ?Even in the cases of Han > characters and Hangul, users think of the characters as being > "atomic," but in the sense of Bohr rather than that of Democritus. Ah, I think this may very well be culture-dependent. In Holland there are no Dutch words that use accented letters, but the accents are known because there are a lot of words borrowed from French or German. We (the Dutch) think of these as letters with accents and in fact we think of the accents as modifiers that can be added to any letter (at least I know that's how I thought about it -- perhaps I was also influenced by the way one had to type those on a mechanical typewriter). Dutch does have one native use of the umlaut (though it has a different name, I forget which, maybe trema :-), when there are two consecutive vowels that would normally be read as a special sound (diphthong?). E.g. in "koe" (cow) the oe is two letters (not a single letter formed of two distict shapes!) that mean a special sound (roughly KOO). But in a word like "co?xistentie" (coexistence) the o and e do not form the oe-sound, and to emphasize this to Dutch readers (who believe their spelling is very logical :-), the official spelling puts the umlaut on the e. This is definitely thought of as a separate mark added to the e; ? is not a new letter. I have a feeling it's the same way for the French and Germans, but I really don't know. 
(Antoine? Georg?) Finally, my guess is that the Spanish emphasis on ñ as a separate letter has to do with teaching how it has a separate position in the localized collation sequence, doesn't it? I'm also curious if ñ occurs as a separate character on Spanish keyboards. -- --Guido van Rossum (python.org/~guido) From solipsis at pitrou.net Thu Sep 1 18:03:47 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 01 Sep 2011 18:03:47 +0200 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: References: <4E537EEC.1070602@v.loewis.de> <1314099542.3485.10.camel@localhost.localdomain> <4E53945E.1050102@v.loewis.de> <1314101745.3485.18.camel@localhost.localdomain> <4E53A5D1.2040808@v.loewis.de> <4E53A950.30005@haypocalc.com> <87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp> <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <87r540ka68.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <1314893027.3617.12.camel@localhost.localdomain> Le jeudi 01 septembre 2011 à 08:45 -0700, Guido van Rossum a écrit : > This is definitely thought of as a separate > mark added to the e; ë is not a new letter. I have a feeling it's the > same way for the French and Germans, but I really don't know. > (Antoine? Georg?) Indeed, they are not separate "letters" (they are considered the same in lexicographic order, and the French alphabet has 26 letters). But I'm not sure how it's relevant, because you can't remove an accent without most likely making a spelling error, or at least changing the meaning. Accents are very much part of the language (while ligatures like "ff" are not, they are a rendering detail). So I would consider "é", "à", "ç", etc. 
atomic characters for the purpose of processing French text. And I don't see how a decomposed form could help an application. Regards Antoine. From guido at python.org Thu Sep 1 18:31:53 2011 From: guido at python.org (Guido van Rossum) Date: Thu, 1 Sep 2011 09:31:53 -0700 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: <1314893027.3617.12.camel@localhost.localdomain> Message-ID: On Thu, Sep 1, 2011 at 9:03 AM, Antoine Pitrou wrote: > Le jeudi 01 septembre 2011 à 08:45 -0700, Guido van Rossum a écrit : >> This is definitely thought of as a separate >> mark added to the e; ë is not a new letter. I have a feeling it's the >> same way for the French and Germans, but I really don't know. >> (Antoine? Georg?) > > Indeed, they are not separate "letters" (they are considered the same in > lexicographic order, and the French alphabet has 26 letters). > > But I'm not sure how it's relevant, because you can't remove an accent > without most likely making a spelling error, or at least changing the > meaning. Accents are very much part of the language (while ligatures > like "ff" are not, they are a rendering detail). So I would consider > "é", "è", "à", etc.
atomic characters for the purpose of processing > French text. And I don't see how a decomposed form could help an > application. The example given was someone who didn't agree with how a particular font rendered those accented characters. I agree that's obscure though. I recall long ago that when the French wrote words in all caps they would drop the accents, e.g. ECOLE. I even recall (through the mists of time) observing this in Paris on public signs. Is this still the convention? -- --Guido van Rossum (python.org/~guido) From solipsis at pitrou.net Thu Sep 1 18:46:02 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 01 Sep 2011 18:46:02 +0200 Subject: [Python-Dev] PEP 393 Summer of Code Project Message-ID: <1314895562.3617.19.camel@localhost.localdomain> > The example given was someone who didn't agree with how a particular > font rendered those accented characters. I agree that's obscure > though. > > I recall long ago that when the French wrote words in all caps they > would drop the accents, e.g. ECOLE. I even recall (through the mists > of time) observing this in Paris on public signs. Is this still the > convention?
Maybe it only was a compromise in the time of Morse code? I think it is tolerated, partly because typing support (on computers and typewriters) has been weak. On a French keyboard, you have an "é" key, but shifting it gives you "2", not "É". The latter can be obtained using the Caps Lock key under Linux, but not under Windows. (so you could also write Éric's name "Eric", for example) That said, most typographies nowadays seem careful to keep the accents on uppercase letters (e.g. on book covers; AFAIR, road signs also keep the accents, but I'm no driver). Regards Antoine. From stefan_ml at behnel.de Thu Sep 1 19:04:34 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 01 Sep 2011 19:04:34 +0200 Subject: [Python-Dev] PEP 393 Summer of Code Project Message-ID: Guido van Rossum, 01.09.2011 18:31: > On Thu, Sep 1, 2011 at 9:03 AM, Antoine Pitrou wrote: >> Le jeudi 01 septembre 2011 à 08:45 -0700, Guido van Rossum a écrit : >>> This is definitely thought of as a separate >>> mark added to the e; ë is not a new letter. I have a feeling it's the >>> same way for the French and Germans, but I really don't know. >>> (Antoine? Georg?) >> >> Indeed, they are not separate "letters" (they are considered the same in >> lexicographic order, and the French alphabet has 26 letters).
So does the German alphabet, even though that does not include "ß", which basically descended from a ligature of the old German way of writing "sz", where "s" looked similar to an "f" and "z" had a low hanging tail. IIRC, German Umlaut letters are lexicographically sorted according to their emergency replacement spelling ("ä" -> "ae"), which is also sometimes used in all upper case words ("Glück" -> "GLUECK"). I guess that's because Umlaut dots are harder to see on top of upper case letters. So, Latin-1 byte value sorting always yields totally wrong results. That aside, Umlaut letters are commonly considered separate letters, different from the undotted letters and also different from the replacement spellings. I, for one, always found the replacements rather weird and never got used to using them in upper case words. In any case, it's wrong to always use them, and it makes text harder to read. >> But I'm not sure how it's relevant, because you can't remove an accent >> without most likely making a spelling error, or at least changing the >> meaning. Accents are very much part of the language (while ligatures >> like "ff" are not, they are a rendering detail). So I would consider >> "é", "è", "à", etc. atomic characters for the purpose of processing >> French text. And I don't see how a decomposed form could help an >> application. > > I recall long ago that when the French wrote words in all caps they > would drop the accents, e.g. ECOLE. I even recall (through the mists > of time) observing this in Paris on public signs. Is this still the > convention? Yes, and it's a huge problem when trying to pronounce last names. In French, you'd commonly write LASTNAME, Firstname and if LASTNAME happens to have accented letters, you'd miss them when reading that. I know a couple of French people who severely suffer from this, because the pronunciation of their name gets a totally different meaning without accents.
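The replacement-spelling collation described above ("ä" sorted as if it were "ae", roughly the German phone-book convention) can be sketched as a simple sort key; the word list below is purely illustrative:

```python
def replacement_key(word):
    """Sort key treating umlauts and ß as their replacement spellings."""
    subs = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss",
            "Ä": "Ae", "Ö": "Oe", "Ü": "Ue"}
    return "".join(subs.get(ch, ch) for ch in word).lower()

words = ["Glück", "Glatt", "Gluecksburg", "Glas"]

# Raw code-point sorting puts "Glück" last, since ü (U+00FC) is above ASCII.
print(sorted(words))
# With the replacement key, "Glück" sorts exactly like "Glueck".
print(sorted(words, key=replacement_key))
```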
Stefan From glyph at twistedmatrix.com Thu Sep 1 19:15:45 2011 From: glyph at twistedmatrix.com (Glyph Lefkowitz) Date: Thu, 1 Sep 2011 10:15:45 -0700 Subject: [Python-Dev] Python 3 optimizations continued... In-Reply-To: References: <20110829231420.20c3516a@pitrou.net> <20110830025510.638b41d9@pitrou.net> <4E5CA1F0.2070005@v.loewis.de> <20110830193806.0d718a56@pitrou.net> <4E5F6CA9.1080501@nedbatchelder.com> Message-ID: <112D04D2-DC83-4383-9C06-96208811507F@twistedmatrix.com> On Sep 1, 2011, at 5:23 AM, Cesare Di Mauro wrote: > A simple solution: when tracing is enabled, the new instruction format will never be executed (and information tracking disabled as well). Correct me if I'm wrong: doesn't this mean that no profiler will accurately be able to measure the performance impact of the new instruction format, and therefore one may get incorrect data when one is trying to make a CPU optimization for real-world performance? From stefan_ml at behnel.de Thu Sep 1 19:31:52 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 01 Sep 2011 19:31:52 +0200 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: <1314895562.3617.19.camel@localhost.localdomain> Message-ID: Antoine Pitrou, 01.09.2011 18:46: > AFAIR, road signs also keep the accents, but I'm no driver Right, I noticed that, too. That's certainly not uncommon.
I think it's mostly because of local pride (after all, the road signs are all that many drivers ever see of a city), but sometimes also because it can't be helped when the name gets a different meaning without accents. People just cause too many accidents when they burst out laughing while entering a city by car. Stefan From guido at python.org Thu Sep 1 19:40:00 2011 From: guido at python.org (Guido van Rossum) Date: Thu, 1 Sep 2011 10:40:00 -0700 Subject: [Python-Dev] Python 3 optimizations continued... In-Reply-To: <112D04D2-DC83-4383-9C06-96208811507F@twistedmatrix.com> References: <20110829231420.20c3516a@pitrou.net> <20110830025510.638b41d9@pitrou.net> <4E5CA1F0.2070005@v.loewis.de> <20110830193806.0d718a56@pitrou.net> <112D04D2-DC83-4383-9C06-96208811507F@twistedmatrix.com> Message-ID: On Thu, Sep 1, 2011 at 10:15 AM, Glyph Lefkowitz wrote: > > On Sep 1, 2011, at 5:23 AM, Cesare Di Mauro wrote: > > A simple solution: when tracing is enabled, the new instruction format will > never be executed (and information tracking disabled as well). > > Correct me if I'm wrong: doesn't this mean that no profiler will accurately > be able to measure the performance impact of the new instruction format, and > therefore one may get incorrect data when one is trying to make a CPU > optimization for real-world performance? Well, profilers already skew results by adding call overhead. But tracing for debugging and profiling don't do exactly the same thing: debug tracing stops at every line, but profiling only executes hooks at the start and end of a function(*). So I think the function body could still be executed using the new format (assuming this is turned on/off per code object anyway). (*) And whenever a generator yields or is resumed. I consider that an annoying bug though, just as the debugger doesn't do the right thing with yield -- there's no way to continue until the yielding generator is resumed short of setting a manual breakpoint on the next line.
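The line-versus-call granularity of the two hook types is directly observable with sys.settrace and sys.setprofile; a minimal sketch:

```python
import sys

events_trace, events_profile = [], []

def work():
    x = 1
    y = 2
    return x + y

def tracer(frame, event, arg):
    # Returning the tracer enables per-line tracing inside the frame.
    if frame.f_code.co_name == "work":
        events_trace.append(event)
    return tracer

def profiler(frame, event, arg):
    if frame.f_code.co_name == "work":
        events_profile.append(event)

sys.settrace(tracer)
work()
sys.settrace(None)

sys.setprofile(profiler)
work()
sys.setprofile(None)

# The trace hook fires for every line; the profile hook only at call/return.
print(events_trace)    # ['call', 'line', 'line', 'line', 'return']
print(events_profile)  # ['call', 'return']
```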
-- --Guido van Rossum (python.org/~guido) From drsalists at gmail.com Thu Sep 1 19:56:32 2011 From: drsalists at gmail.com (Dan Stromberg) Date: Thu, 1 Sep 2011 10:56:32 -0700 Subject: [Python-Dev] Ctypes and the stdlib (was Re: LZMA compression support in 3.3) In-Reply-To: References: <4E5951D5.5020200@v.loewis.de> <20110828002642.4765fc89@pitrou.net> <20110828012705.523e51d4@pitrou.net> <4E5C01E4.2050106@canterbury.ac.nz> <4E5C7B48.5080402@canterbury.ac.nz> <4E5CA35E.8000509@v.loewis.de> <4E5D148B.1060606@v.loewis.de> Message-ID: On Tue, Aug 30, 2011 at 10:05 AM, Guido van Rossum wrote: > On Tue, Aug 30, 2011 at 9:49 AM, "Martin v. Löwis" > wrote: > The problem lies with the PyPy backend -- there it generates ctypes > code, which means that the signature you declare to Cython/Pyrex must > match the *linker* level API, not the C compiler level API. Thus, if > in a system header a certain function is really a macro that invokes > another function with a permuted or augmented argument list, you'd > have to know what that macro does. I also don't see how this would > work for #defined constants: where does Cython/Pyrex get their value? > ctypes doesn't have their values. > > So, for PyPy, a solution based on Cython/Pyrex has many of the same > downsides as one based on ctypes where it comes to complying with an > API defined by a .h file. > It's certainly a harder problem. For most simple constants, Cython/Pyrex might be able to generate a series of tiny C programs with which to find CPP symbol values: #include "file1.h" ... #include "filen.h" main() { printf("%d", POSSIBLE_CPP_SYMBOL1); } ...and again with %f, %s, etc. The typing is quite a mess, and code fragments would probably be impractical. But since the C Preprocessor is supposedly Turing complete, maybe there's a pleasant surprise waiting there. But hopefully clang has something that'd make this easier.
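A rough sketch of that tiny-probe idea follows; the header and symbol names are just examples, and the compile step assumes a "cc" on PATH:

```python
import os
import shutil
import subprocess
import tempfile

def probe_source(headers, symbol, fmt="%d"):
    """Build a throwaway C program that prints the value of one CPP symbol."""
    lines = ['#include <%s>' % h for h in headers]
    lines.append('#include <stdio.h>')
    lines.append('int main(void) { printf("%s", %s); return 0; }' % (fmt, symbol))
    return "\n".join(lines) + "\n"

def probe_value(headers, symbol, fmt="%d"):
    """Compile and run the probe; returns the printed text, or None without a cc."""
    cc = shutil.which("cc")
    if cc is None:
        return None
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "probe.c")
        exe = os.path.join(tmp, "probe")
        with open(src, "w") as f:
            f.write(probe_source(headers, symbol, fmt))
        subprocess.check_call([cc, src, "-o", exe])
        return subprocess.check_output([exe]).decode()

print(probe_source(["limits.h"], "INT_MAX"))
```

As noted, the messy part is choosing the right printf format per symbol; this sketch just parameterizes it.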
SIP's approach of using something close to, but not identical to, the .h's sounds like it might be pretty productive - especially if the derivative of the .h's could be automatically derived using a Python script, with minor tweaks to the inputs on .h upgrades. But sip itself is apparently C++-only. From tjreedy at udel.edu Thu Sep 1 20:05:20 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 01 Sep 2011 14:05:20 -0400 Subject: [Python-Dev] PEP 393 Summer of Code Project Message-ID: On 9/1/2011 11:45 AM, Guido van Rossum wrote: > typewriter). Dutch does have one native use of the umlaut (though it > has a different name, I forget which, maybe trema :-), You remember correctly. According to https://secure.wikimedia.org/wikipedia/en/wiki/Trema_%28diacritic%29 'trema' (Greek 'hole') is the generic name of the double-dot vowel diacritic. It was originally used for 'diaeresis' (Greek, 'taking apart') when it shows "that a vowel letter is not part of a digraph or diphthong". (Note that 'ae' in diaeresis *is* a digraph ;-). Germans later used it to indicate umlaut, 'changed sound'. > when there are > two consecutive vowels that would normally be read as a special sound > (diphthong?). E.g. in "koe" (cow) the oe is two letters (not a single > letter formed of two distinct shapes!) that mean a special sound > (roughly KOO).
But in a word like "coëxistentie" (coexistence) the o > and e do not form the oe-sound, and to emphasize this to Dutch readers > (who believe their spelling is very logical :-), the official spelling > puts the umlaut on the e. This is definitely thought of as a separate > mark added to the e; ë is not a new letter. So the above is trema-diaeresis. "Dutch, French, and Spanish make regular use of the diaeresis." English uses such as 'coöperate' have become rare or archaic, perhaps because we cannot type them. Too bad, since people sometimes use '-' to serve the same purpose. -- Terry Jan Reedy From stefan_ml at behnel.de Thu Sep 1 20:11:33 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 01 Sep 2011 20:11:33 +0200 Subject: [Python-Dev] Cython, Ctypes and the stdlib Message-ID: Dan Stromberg, 01.09.2011 19:56: > On Tue, Aug 30, 2011 at 10:05 AM, Guido van Rossum wrote: >> The problem lies with the PyPy backend -- there it generates ctypes >> code, which means that the signature you declare to Cython/Pyrex must >> match the *linker* level API, not the C compiler level API. Thus, if >> in a system header a certain function is really a macro that invokes >> another function with a permuted or augmented argument list, you'd >> have to know what that macro does. I also don't see how this would >> work for #defined constants: where does Cython/Pyrex get their value? >> ctypes doesn't have their values. >> >> So, for PyPy, a solution based on Cython/Pyrex has many of the same >> downsides as one based on ctypes where it comes to complying with an >> API defined by a .h file. > > It's certainly a harder problem.
> > For most simple constants, Cython/Pyrex might be able to generate a series > of tiny C programs with which to find CPP symbol values: > > #include "file1.h" > ... > #include "filen.h" > > main() > { > printf("%d", POSSIBLE_CPP_SYMBOL1); > } > > ...and again with %f, %s, etc. The typing is quite a mess The user will commonly declare #defined values as typed external variables and callable macros as functions in .pxd files. These manually typed "macro" functions allow users to tell Cython what it should know about how the macros will be used. And that would allow it to generate C/C++ glue code for them that uses the declared types as a real function signature and calls the macro underneath. > and code fragments would probably be impractical. Not necessarily at the C level but certainly for a ctypes backend, yes. > But hopefully clang has something that'd make this easier. For figuring these things out, maybe. Not so much for solving the problems they introduce. Stefan From stephen at xemacs.org Thu Sep 1 20:28:06 2011 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 02 Sep 2011 03:28:06 +0900 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: <4E5F568A.4020301@g.nevcal.com> Message-ID: <87fwkgjex5.fsf@uwakimon.sk.tsukuba.ac.jp> Glenn Linderman writes: > Windows 7 64-bit on one of my computers happily crashes several > times a day when it detects inconsistent internal state...
under > the theory, I guess, that losing work is better than saving bad > work. You sound the opposite. Definitely. Windows apps habitually overwrite existing work; saving when inconsistent would be a bad idea. The apps I work on dump their unsaved buffers to new files, and give you a chance to look at them before instating them as the current version when you restart. > Except, I'm not sure how PEP 393 space optimization fits with the other > operations. It may even be that an application-wide complex-grapheme > cache would save significant space, although if it uses high-bits in a > string representation to reference the cache, PEP 393 would jump > immediately to something > 16 bits per grapheme... but likely would > anyway, if complex-graphemes are in the data stream. The only language I know of that uses thousands of complex graphemes is Korean ... and the precomposed forms are already in the BMP. I don't know how many accented forms you're likely to see in Vietnamese, but I suspect it's less than 6400 (the number of characters in private space in the BMP). So for most applications, I believe that mapping both non-BMP code points and grapheme clusters into that private space should be feasible. The only potential counterexample I can think of is display of Arabic, which I have heard has thousands of glyphs in good fonts because of the various ways ligatures form in that script. However AFAIK no apps encode these as characters; I'm just admitting that it *might* be useful. This will require some care in registering such characters and clusters because input text may already use private space according to some convention, which would need to be respected. Still, 6400 characters is a lot, even for the Japanese (IIRC the combined repertoire of "corporate characters" that for some reason never made it into the JIS sets is about 600, but almost all of them are already in the BMP). 
I believe the total number of Japanese emoticons is about 200, but I doubt that any given text is likely to use more than a few. So I think there's plenty of space there. This has a few advantages: (1) since these are real characters, all Unicode algorithms will apply as long as the appropriate properties are applied to the character in the database, and (2) it works with a narrow code unit (specifically, UCS-2, but it could also be used with UTF-8). If you really need more than 6400 grapheme clusters, promote to UTF-32, and get two more whole planes full (about 130,000 code points). > I didn't attribute any efficiency to flagging lone surrogates (BI-5). > Since Windows uses a non-validated UCS-2 or UTF-16 character type, any > Python program that obtains data from Windows APIs may be confronted > with lone surrogates or inappropriate combining characters at any > time. I don't think so. AFAIK all that data must pass through a codec, which will validate it unless you specifically tell it not to. > Round-tripping that data seems useful, The standard doesn't forbid that. (ISTR it did so in the past, but what is required in 6.0 is a specific algorithm for identifying well-formed portions of the text, basically "if you're currently in an invalid region, read individual code units and attempt to assemble a valid sequence -- as soon as you do, that is a valid code point, and you switch into valid state and return to the normal algorithm".) Specifically, since surrogates are not characters, leaving them in the data does not constitute "interpreting them as characters." I don't recall if any of the error handlers allow this, though. > However, returning modified forms of it to Windows as UCS-2 or > UTF-16 data may still cause other applications to later > accidentally combine the characters, if the modifications > juxtaposed things to make them look reasonably, even if > accidentally. 
In CPython AFAIK (I don't do Windows) this can only happen if you use a non-default error setting in the output codec. > After writing all those ideas down, I actually preferred some of > the others, that achieved O(1) real grapheme indexing, rather than > caching character properties. If you need O(1) grapheme indexing, use of private space seems a winner to me. It's just defining private precombined characters, and they won't bother any Unicode application, even if they leak out. > > What are the costs to applications that don't want the cache? > > How is the bit-cache affected by PEP 393? > > If it is a separate type from str, then it costs nothing except the > extra code space to implement the cache for those applications that > do want it... most of which wouldn't be loaded for applications > that don't, if done as a module or C extension. I'm talking about the bit-cache (which all of your BI-N referred to, at least indirectly). Many applications will want to work with fully composed characters, whether they're represented in a single code point or not. But they may not care about any of the bit-cache ideas. > OK... ignore the bit-cache idea (BI-1), and reread the others without > having your mind clogged with that one, and see if any of them make > sense to you then. But you may be too biased by the "minor" needs of > keeping the internal representation similar to the stream representation > to see any value in them. No, I'm biased by the fact that I already have good ways to do them without leaving the set of representations provided by Unicode (often ways which provide additional advantages), and by the fact that I myself don't know any use cases for the bit-cache yet. > I rather like BI-2, since it allows O(1) indexing of graphemes. I do too (without suggesting a non-standard representation, ie, using private space), but I'm sure that wheel has been reinvented quite frequently.
It's a very common trick in text processing, although I don't know of other applications where it's specifically used to turn data that "fails to be an array just a little bit" into a true array (although I suppose you could view fixed-width EUC encodings that way). From stephen at xemacs.org Thu Sep 1 20:54:56 2011 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 02 Sep 2011 03:54:56 +0900 Subject: [Python-Dev] PEP 393 Summer of Code Project Message-ID: <87ei00jdof.fsf@uwakimon.sk.tsukuba.ac.jp> Guido van Rossum writes: > On Thu, Sep 1, 2011 at 12:13 AM, Stephen J. Turnbull wrote: > > while at least this Spanish-as-a-second-language learner was taught > > that `ñ' is an atomic character represented by a discontiguous glyph, > > like `i', and it is no more related to `n' than `m' is. Users really > > believe that characters are atomic. Even in the cases of Han > > characters and Hangul, users think of the characters as being > > "atomic," but in the sense of Bohr rather than that of Democritus. > > Ah, I think this may very well be culture-dependent. I'm not an expert, but I'm fairly sure it is.
Specifically, I heard from a TeX-ie friend that the same accented letter is typeset (and collated) differently in different European languages because in some of them the accent is considered part of the letter (making a different character), while in others accents modify a single underlying character. The ones that consider the letter and accent to constitute a single character also prefer to leave less space, he said. > But in a word like "coëxistentie" (coexistence) the o and e do not > form the oe-sound, and to emphasize this to Dutch readers (who > believe their spelling is very logical :-), the official spelling > puts the umlaut on the e. American English has the same usage, but it's optional (in particular, you'll see naive, naif, and words like coordinate typeset that way occasionally, for the same reason I suppose). As Hagen Fürstenau points out, with multiple combining characters, there are even more complex possibilities than "the accent is part of the character" and "it's really not", and they may be application-dependent. > Finally, my guess is that the Spanish emphasis on ñ as a separate > letter has to do with teaching how it has a separate position in the > localized collation sequence, doesn't it? You'd have to ask Mr. Gonzalez. I suspect he may have taught that way less because of his Castellano upbringing, and more because of the infamous lack of sympathy of American high school students for the fine points of usage in foreign languages. > I'm also curious if ñ occurs as a separate character on Spanish > keyboards. If I'm reading /usr/share/X11/xkb/symbols/es correctly, it does in X.org: the key that for English users would map to ASCII tilde.
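The composed-versus-decomposed distinction under discussion is directly visible with the unicodedata module: NFC treats ñ as one code point, NFD as n plus a combining tilde, and the two forms compare equal only after normalization:

```python
import unicodedata

composed = "\u00f1"      # ñ as a single precomposed code point
decomposed = "n\u0303"   # 'n' followed by COMBINING TILDE

print(len(composed), len(decomposed))   # 1 2
print(composed == decomposed)           # False: plain code-point comparison
print(unicodedata.normalize("NFD", composed) == decomposed)  # True
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```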
From solipsis at pitrou.net Thu Sep 1 20:54:45 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 01 Sep 2011 20:54:45 +0200 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: <87ei00jdof.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <1314903285.3617.31.camel@localhost.localdomain> > > Finally, my guess is that the Spanish emphasis on ñ as a separate > > letter has to do with teaching how it has a separate position in the > > localized collation sequence, doesn't it? > > You'd have to ask Mr. Gonzalez. I suspect he may have taught that way > less because of his Castellano upbringing, and more because of the > infamous lack of sympathy of American high school students for the > fine points of usage in foreign languages. If you look at Wikipedia, it says: «El alfabeto español consta de 27 letras» ("the Spanish alphabet consists of 27 letters"). The Ñ is separate from the N (and so is it in my French-Spanish dictionary). The accented letters, however, are not considered separately. http://es.wikipedia.org/wiki/Alfabeto_espa%C3%B1ol (I can't tell you how annoying to type "ñ" is when the tilde is accessed using AltGr + 2 and you have to combine that with the Compose key and N to obtain the full character.
I'm sure Spanish keyboards have a better way than that :-)) Regards Antoine. From tseaver at palladion.com Thu Sep 1 21:13:08 2011 From: tseaver at palladion.com (Tres Seaver) Date: Thu, 01 Sep 2011 15:13:08 -0400 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: <1314903285.3617.31.camel@localhost.localdomain> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 09/01/2011 02:54 PM, Antoine Pitrou wrote: > > If you look at Wikipedia, it says: «El alfabeto español consta de 27 > letras». The Ñ is separate from the N (and so is it in my > French-Spanish dictionary). The accented letters, however, are not > considered separately. > http://es.wikipedia.org/wiki/Alfabeto_espa%C3%B1ol > > (I can't tell you how annoying to type "ñ" is when the tilde is > accessed using AltGr + 2 and you have to combine that with the > Compose key and N to obtain the full character. I'm sure Spanish > keyboards have a better way than that :-)) FWIW, I was taught that Spanish had 30 letters in the alfabeto: the 'ñ', plus 'ch', 'll', and 'rr' were all considered distinct characters. Kids-these-days'ly, Tres.
- -- =================================================================== Tres Seaver +1 540-429-0999 tseaver at palladion.com Palladion Software "Excellence by Design" http://palladion.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk5f2UQACgkQ+gerLs4ltQ4URACePSZzpoPAg2IIYZewsjbuplkK 0MgAoM7VfdQHzjBiU6Vr/MYPJ9U2qC3M =pvKn -----END PGP SIGNATURE----- From ethan at stoneleaf.us Thu Sep 1 21:38:07 2011 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 01 Sep 2011 12:38:07 -0700 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <87r540ka68.fsf@uwakimon.sk.tsukuba.ac.jp> <87ei00jdof.fsf@uwakimon.sk.tsukuba.ac.jp> <1314903285.3617.31.camel@localhost.localdomain> Message-ID: <4E5FDF1F.9010308@stoneleaf.us> Tres Seaver wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On 09/01/2011 02:54 PM, Antoine Pitrou wrote: >> If you look at Wikipedia, it says: «El alfabeto español consta de 27 >> letras». The Ñ is separate from the N (and so is it in my >> French-Spanish dictionary). The accented letters, however, are not >> considered separately. >> http://es.wikipedia.org/wiki/Alfabeto_espa%C3%B1ol >> >> (I can't tell you how annoying to type "ñ" is when the tilde is >> accessed using AltGr + 2 and you have to combine that with the >> Compose key and N to obtain the full character.
I'm sure Spanish >> keyboards have a better way than that :-)) > > FWIW, I was taught that Spanish had 30 letters in the alfabeto: the > 'ñ', plus 'ch', 'll', and 'rr' were all considered distinct characters. > > Kids-these-days'ly, Not sure what's going on, but according to the article Antoine linked to those aren't letters anymore... so much for the cultural awareness portion of UNESCO. ~Ethan~ From solipsis at pitrou.net Thu Sep 1 21:34:56 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 1 Sep 2011 21:34:56 +0200 Subject: [Python-Dev] PEP 393 Summer of Code Project References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <87r540ka68.fsf@uwakimon.sk.tsukuba.ac.jp> <87ei00jdof.fsf@uwakimon.sk.tsukuba.ac.jp> <1314903285.3617.31.camel@localhost.localdomain> <4E5FDF1F.9010308@stoneleaf.us> Message-ID: <20110901213456.4d38a240@pitrou.net> On Thu, 01 Sep 2011 12:38:07 -0700 Ethan Furman wrote: > > > > FWIW, I was taught that Spanish had 30 letters in the alfabeto: the > > 'ñ', plus 'ch', 'll', and 'rr' were all considered distinct characters. > > > > Kids-these-days'ly, > > Not sure what's going on, but according to the article Antoine linked to > those aren't letters anymore... so much for the cultural awareness > portion of UNESCO. That Wikipedia article also says: «Los dígrafos Ch y Ll tienen valores fonéticos específicos, y durante los siglos XIX y XX se ordenaron separadamente de C y L, aunque la práctica se abandonó en 1994 para homogeneizar el sistema con otras lenguas.»
-> roughly: «the "Ch" and "Ll" digraphs have specific phonetic values, and during the 19th and 20th centuries they were ordered separately from C and L, but this practice was abandoned in 1994 in order to make the system consistent with other languages.» And about "rr": «El dígrafo rr (llamado erre, /'ere/, y pronunciado /r/) nunca se consideró por separado, probablemente por no aparecer nunca en posición inicial.» -> «the "rr" digraph was never considered separate, probably because it never appears at the very beginning of a word.» Regards Antoine. From greg.ewing at canterbury.ac.nz Fri Sep 2 02:30:12 2011 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 02 Sep 2011 12:30:12 +1200 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <87r540ka68.fsf@uwakimon.sk.tsukuba.ac.jp> <1314893027.3617.12.camel@localhost.localdomain> Message-ID: <4E602394.70707@canterbury.ac.nz> Guido van Rossum wrote: > I recall long ago that when the french wrote words in all caps they > would drop the accents, e.g. ECOLE. I even recall (through the mists > of time) observing this in Paris on public signs. Is this still the > convention?
This page features a number of French street signs in all-caps, and some of them have accents: http://www.happymall.com/france/paris_street_signs.htm -- Greg From greg.ewing at canterbury.ac.nz Fri Sep 2 02:36:20 2011 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 02 Sep 2011 12:36:20 +1200 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <87r540ka68.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4E602504.9060505@canterbury.ac.nz> Guido van Rossum wrote: > But in a word like "coëxistentie" (coexistence) the o > and e do not form the oe-sound, and to emphasize this to Dutch readers > (who believe their spelling is very logical :-), the official spelling > puts the umlaut on the e. Sometimes this is done in English too -- occasionally you see words like "cooperation" spelled with a diaeresis over the second "o". But these days it's more common to use a hyphen, or not bother at all. Everyone knows how it's pronounced.
-- Greg From solipsis at pitrou.net Fri Sep 2 02:42:46 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 2 Sep 2011 02:42:46 +0200 Subject: [Python-Dev] PEP 393 Summer of Code Project References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <87r540ka68.fsf@uwakimon.sk.tsukuba.ac.jp> <1314893027.3617.12.camel@localhost.localdomain> <4E602394.70707@canterbury.ac.nz> Message-ID: <20110902024246.58217e77@pitrou.net> On Fri, 02 Sep 2011 12:30:12 +1200 Greg Ewing wrote: > Guido van Rossum wrote: > > > I recall long ago that when the french wrote words in all caps they > > would drop the accents, e.g. ECOLE. I even recall (through the mists > > of time) observing this in Paris on public signs. Is this still the > > convention? > > This page features a number of French street signs > in all-caps, and some of them have accents: > > http://www.happymall.com/france/paris_street_signs.htm I don't think some American souvenir shop is a good reference, though :) (for example, there's no Paris street named "château de Versailles") Regards Antoine.
From greg.ewing at canterbury.ac.nz Fri Sep 2 02:52:31 2011 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 02 Sep 2011 12:52:31 +1200 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <87r540ka68.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4E6028CF.9030801@canterbury.ac.nz> Terry Reedy wrote: > Too bad, since people sometimes use '-' to serve the same purpose. Which actually seems more logical to me -- a separating symbol is better placed between the things being separated, rather than over the top of one of them! Maybe we could compromise by turning the diaeresis on its side: co:operate -- Greg From steve at pearwood.info Fri Sep 2 03:30:44 2011 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 02 Sep 2011 11:30:44 +1000 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: <1314893027.3617.12.camel@localhost.localdomain> References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <87r540ka68.fsf@uwakimon.sk.tsukuba.ac.jp> <1314893027.3617.12.camel@localhost.localdomain> Message-ID: <4E6031C4.7030809@pearwood.info> Antoine Pitrou wrote: > Le jeudi 01 septembre 2011 à 08:45 -0700, Guido van Rossum a écrit : >> This is definitely thought of as a separate >> mark added to the e; ë
is not a new letter. I have a feeling it's the >> same way for the French and Germans, but I really don't know. >> (Antoine? Georg?) > > Indeed, they are not separate "letters" (they are considered the same in > lexicographic order, and the French alphabet has 26 letters). On the other hand, the same doesn't necessarily apply to other languages. (At least according to Wikipedia.) http://en.wikipedia.org/wiki/Diacritic#Languages_with_letters_containing_diacritics -- Steven From stephen at xemacs.org Fri Sep 2 05:59:01 2011 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 02 Sep 2011 12:59:01 +0900 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <87r540ka68.fsf@uwakimon.sk.tsukuba.ac.jp> <87ei00jdof.fsf@uwakimon.sk.tsukuba.ac.jp> <1314903285.3617.31.camel@localhost.localdomain> Message-ID: <87bov3k322.fsf@uwakimon.sk.tsukuba.ac.jp> Tres Seaver writes: > FWIW, I was taught that Spanish had 30 letters in the alfabeto: the > 'ñ', plus 'ch', 'll', and 'rr' were all considered distinct characters. That was always a Castellano vs. Americano issue, IIRC. As I wrote, Mr. Gonzalez was Castellano. I believe that the deprecation of the digraphs as separate letters occurred as the telephone became widely used in Spain, and the telephone company demanded an official proclamation from whatever Ministry is responsible for culture that it was OK to treat the digraphs as two letters (specifically, to collate them that way), so that they could use the programs that came with the OS.
So this stuff is not merely variant by culture, but also by economics and politics. :-/ From s.brunthaler at uci.edu Fri Sep 2 06:37:28 2011 From: s.brunthaler at uci.edu (stefan brunthaler) Date: Thu, 1 Sep 2011 21:37:28 -0700 Subject: [Python-Dev] Python 3 optimizations continued... In-Reply-To: References: <20110829231420.20c3516a@pitrou.net> <20110830025510.638b41d9@pitrou.net> <4E5CA1F0.2070005@v.loewis.de> <20110830193806.0d718a56@pitrou.net> <112D04D2-DC83-4383-9C06-96208811507F@twistedmatrix.com> Message-ID: Hi, as promised, I created a publicly available preview of an implementation with my optimizations, which is available under the following location: https://bitbucket.org/py3_pio/preview/wiki/Home I followed Nick's advice and added some valuable overview/introduction material at the wiki page the link points to; I am positive that spending 10mins reading this will provide you with valuable information regarding what's happening. In addition, as Guido already mentioned, this is more or less a direct copy of my research-branch without some of my private comments and *no* additional refactorings because of software-engineering issues (which I am very much aware of.) I hope this clarifies a *lot* and makes it easier to see what parts are involved and how all the pieces fit together.
I hope you'll like it, have fun, --stefan From greg.ewing at canterbury.ac.nz Fri Sep 2 07:45:04 2011 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 02 Sep 2011 17:45:04 +1200 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: <20110902024246.58217e77@pitrou.net> References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <87r540ka68.fsf@uwakimon.sk.tsukuba.ac.jp> <1314893027.3617.12.camel@localhost.localdomain> <4E602394.70707@canterbury.ac.nz> <20110902024246.58217e77@pitrou.net> Message-ID: <4E606D60.2040605@canterbury.ac.nz> Antoine Pitrou wrote: > I don't think some American souvenir shop is a good reference, though :) > (for example, there's no Paris street named "château de Versailles") Hmmm, I'd assumed they were reproductions of actual street signs found in Paris, but maybe not. :-( -- Greg From ncoghlan at gmail.com Fri Sep 2 07:55:01 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 2 Sep 2011 15:55:01 +1000 Subject: [Python-Dev] Python 3 optimizations continued... In-Reply-To: References: <20110829231420.20c3516a@pitrou.net> <20110830025510.638b41d9@pitrou.net> <4E5CA1F0.2070005@v.loewis.de> <20110830193806.0d718a56@pitrou.net> <112D04D2-DC83-4383-9C06-96208811507F@twistedmatrix.com> Message-ID: On Fri, Sep 2, 2011 at 2:37 PM, stefan brunthaler wrote: > I hope this clarifies a *lot* and makes it easier to see what parts > are involved and how all the pieces fit together. It does, thanks.
There are likely to be some fun corner cases relating to trace functions and use of the "locals()" builtin, but now the code has been published hopefully those interested will be able to dig in and provide some more detailed feedback. (Not me, though - I've already dropped some things from my original personal to-do list for 3.3, so I'm not keen to start adding any more). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From stefan_ml at behnel.de Fri Sep 2 08:01:09 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 02 Sep 2011 08:01:09 +0200 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: <4E602504.9060505@canterbury.ac.nz> References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <87r540ka68.fsf@uwakimon.sk.tsukuba.ac.jp> <4E602504.9060505@canterbury.ac.nz> Message-ID: Greg Ewing, 02.09.2011 02:36: > Guido van Rossum wrote: >> But in a word like "coëxistentie" (coexistence) the o >> and e do not form the oe-sound, and to emphasize this to Dutch readers >> (who believe their spelling is very logical :-), the official spelling >> puts the umlaut on the e. > > Sometimes this is done in English too -- occasionally > you see words like "cooperation" spelled with a diaeresis > over the second "o". But these days it's more common to > use a hyphen, or not bother at all. Everyone knows how > it's pronounced. Right. There are so many words in the English language that you can't pronounce without knowing them, that the few words that fall into the above category really don't matter.
Stefan From tjreedy at udel.edu Fri Sep 2 08:34:01 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 02 Sep 2011 02:34:01 -0400 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: <87bov3k322.fsf@uwakimon.sk.tsukuba.ac.jp> References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <87r540ka68.fsf@uwakimon.sk.tsukuba.ac.jp> <87ei00jdof.fsf@uwakimon.sk.tsukuba.ac.jp> <1314903285.3617.31.camel@localhost.localdomain> <87bov3k322.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 9/1/2011 11:59 PM, Stephen J. Turnbull wrote: > > I believe that the deprecation of the digraphs as separate letters > occurred as the telephone became widely used in Spain, and the > telephone company demanded an official proclamation from whatever > Ministry is responsible for culture that it was OK to treat the > digraphs as two letters (specifically, to collate them that way), so > that they could use the programs that came with the OS. The main 'standards body' for Spanish is the Real Academia Española in Madrid, which works with the 21 other members of the Asociación de Academias de la Lengua Española. wikimedia.org/wikipedia/en/wiki/Real_Academia_Española .wikimedia.org/wikipedia/en/wiki/Association_of_Spanish_Language_Academies While it has apparently been criticized as 'conservative' (which it well ought to be), it has been rather progressive in promoting changes such as 'ph' to 'f' (fisica, fone) and dropping silent 'p' in leading 'psi' (sicologia) and silent 's' in leading 'sci' (ciencia).
-- Terry Jan Reedy From jeremy at jeremysanders.net Fri Sep 2 10:55:32 2011 From: jeremy at jeremysanders.net (Jeremy Sanders) Date: Fri, 02 Sep 2011 09:55:32 +0100 Subject: [Python-Dev] Ctypes and the stdlib (was Re: LZMA compression support in 3.3) References: <4E5951D5.5020200@v.loewis.de> <20110828002642.4765fc89@pitrou.net> <20110828012705.523e51d4@pitrou.net> <4E5C01E4.2050106@canterbury.ac.nz> <4E5C7B48.5080402@canterbury.ac.nz> <4E5CA35E.8000509@v.loewis.de> <4E5D148B.1060606@v.loewis.de> Message-ID: Dan Stromberg wrote: > SIP's approach of using something close to, but not identical to, the .h's > sounds like it might be pretty productive - especially if the derivative > of the .h's could be automatically derived using a python script, with > minor > tweaks to the inputs on .h upgrades. But sip itself is apparently > C++-only. http://www.riverbankcomputing.co.uk/software/sip/intro "What is SIP? One of the features of Python that makes it so powerful is the ability to take existing libraries, written in C or C++, and make them available as Python extension modules. Such extension modules are often called bindings for the library. SIP is a tool that makes it very easy to create Python bindings for C and C++ libraries. It was originally developed to create PyQt, the Python bindings for the Qt toolkit, but can be used to create bindings for any C or C++ library. " It's not C++ only. The code for SIP is also in C. Jeremy From stefan_ml at behnel.de Fri Sep 2 11:13:19 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 02 Sep 2011 11:13:19 +0200 Subject: [Python-Dev] Python 3 optimizations continued... 
In-Reply-To: References: <4E5CA1F0.2070005@v.loewis.de> <20110830193806.0d718a56@pitrou.net> <112D04D2-DC83-4383-9C06-96208811507F@twistedmatrix.com> Message-ID: stefan brunthaler, 02.09.2011 06:37: > as promised, I created a publicly available preview of an > implementation with my optimizations, which is available under the > following location: > https://bitbucket.org/py3_pio/preview/wiki/Home > > I followed Nick's advice and added some valuable advice and > overview/introduction at the wiki page the link points to, I am > positive that spending 10mins reading this will provide you with a > valuable information regarding what's happening. It does, thanks. A couple of remarks: 1) The SFC optimisation is purely based on static code analysis, right? I assume it takes loops into account (and just multiplies scores for inner loops)? Is that what you mean with "nesting level"? Obviously, static analysis can sometimes be misleading, e.g. when there's a rare special case with lots of loops that needs to adapt input data in some way, but in general, I'd expect that this heuristic would tend to hit the important cases, especially for well structured code with short functions. 2) The RC elimination is tricky to get right and thus somewhat dangerous, but sounds worthwhile and should work particularly well on a stack based byte code interpreter like CPython. 3) Inline caching also sounds worthwhile, although I wonder how large the savings will be here. You'd save a couple of indirect jumps at the C-API level, sure, but apart from that, my guess is that it would highly depend on the type of instruction. Certain (repeated) calls to C implemented functions would likely benefit quite a bit, for example, which would be a nice optimisation by itself, e.g. for builtins. I would expect that the same applies to iterators, even a couple of percent faster iteration can make a great deal of a difference, and a substantial set of iterators are implemented in C, e.g. 
itertools, range, zip and friends. I'm not so sure about arithmetic operations. In Cython, we (currently?) do not optimistically replace these with more specific code (unless we know the types at compile time), because it complicates the generated C code and indirect jumps aren't all that slow that the benefit would be important. Savings are *much* higher when data can be unboxed, so much that the slight improvement for optimistic type guesses is totally dwarfed in Cython. I would expect that the return of investment is better when the types are actually known at runtime, as in your case. 4) Regarding inlined object references, I would expect that it's much more worthwhile to speed up LOAD_GLOBAL and LOAD_NAME than LOAD_CONST. I guess that this would be best helped by watching the module dict and the builtin dict internally and invalidating the interpreter state after changes (e.g. by providing a change counter in those dicts and checking that in the instructions that access them), and otherwise keeping the objects cached. Simply watching the dedicated instructions that change that state isn't enough as Python allows code to change these dicts directly through their dict interface. All in all, your list does sound like an interesting set of changes that are both understandable and worthwhile. Stefan From s.brunthaler at uci.edu Fri Sep 2 17:20:28 2011 From: s.brunthaler at uci.edu (stefan brunthaler) Date: Fri, 2 Sep 2011 08:20:28 -0700 Subject: [Python-Dev] Python 3 optimizations continued... 
In-Reply-To: References: <20110829231420.20c3516a@pitrou.net> <20110830025510.638b41d9@pitrou.net> <4E5CA1F0.2070005@v.loewis.de> <20110830193806.0d718a56@pitrou.net> <112D04D2-DC83-4383-9C06-96208811507F@twistedmatrix.com> Message-ID: > as promised, I created a publicly available preview of an > implementation with my optimizations, which is available under the > following location: > https://bitbucket.org/py3_pio/preview/wiki/Home > One very important thing that I forgot was to indicate that you have to use computed gotos (i.e., "configure --with-computed-gotos"), otherwise it won't work (though I think that most people can figure this out easily, knowing this a priori isn't too bad.) Regards, --stefan From s.brunthaler at uci.edu Fri Sep 2 17:55:03 2011 From: s.brunthaler at uci.edu (stefan brunthaler) Date: Fri, 2 Sep 2011 08:55:03 -0700 Subject: [Python-Dev] Python 3 optimizations continued... In-Reply-To: References: <4E5CA1F0.2070005@v.loewis.de> <20110830193806.0d718a56@pitrou.net> <112D04D2-DC83-4383-9C06-96208811507F@twistedmatrix.com> Message-ID: > 1) The SFC optimisation is purely based on static code analysis, right? I > assume it takes loops into account (and just multiplies scores for inner > loops)? Is that what you mean with "nesting level"? Obviously, static > analysis can sometimes be misleading, e.g. when there's a rare special case > with lots of loops that needs to adapt input data in some way, but in > general, I'd expect that this heuristic would tend to hit the important > cases, especially for well structured code with short functions. > Yes, currently I only use the heuristic to statically estimate utility of assigning an optimized slot to a local variable. And, another yes, nested blocks (like for-statements) is what I have in mind when using "nesting level". I was told that the algorithm itself is very similar to linear scan register allocation, modulo the ability to spill values, of course. 
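For illustration, the static scoring idea described here can be sketched in a few lines of Python. The weight of 100 per nesting level and the limit of 4 optimized slots are taken from this discussion; everything else (function and variable names, the input representation) is invented for the sketch and is not the actual implementation:

```python
# Toy model of the static slot-allocation heuristic: every occurrence of a
# local variable scores WEIGHT**nesting_level, and the highest-scoring
# variables win one of the NUM_SLOTS optimized stack slots.
WEIGHT = 100
NUM_SLOTS = 4

def score_locals(uses):
    """uses: iterable of (variable_name, nesting_level) pairs, one per occurrence."""
    scores = {}
    for name, level in uses:
        scores[name] = scores.get(name, 0) + WEIGHT ** level
    return scores

def assign_slots(uses, num_slots=NUM_SLOTS):
    """Rank variables by score and map the top ones to slot indices."""
    scores = score_locals(uses)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return {name: slot for slot, name in enumerate(ranked[:num_slots])}

# A variable used once inside a loop body (nesting level 1) outranks a
# variable used fifty times at the top level -- which is exactly the
# bm_django-style pitfall mentioned above:
uses = [("i", 1)] + [("x", 0)] * 50 + [("y", 0), ("z", 0), ("w", 0), ("q", 0)]
slots = assign_slots(uses)
```

The rarely-executed-loop problem follows directly: a single static occurrence at a deep nesting level dominates the score even if that loop almost never runs, which is what the proposed back-branch counting would correct.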
From my benchmarks and in-depth analysis of several programs, I found this to work very well. In fact, the only situation I found is (unfortunately) one of the top-most executed functions in US' bm_django.py: There is one loop that gets almost never executed but this loop gives precedence to local variables used inside. Because of this, I have already an idea for a better approach: first, use the static heuristic to compute stack slot score, then count back-branches (I would need this anyways, as the _Py_CheckInterval has gone and OSR/hot-swapping is in general a good idea) and record their frequency. Next, just replace the current static weight of 100 by the dynamically recorded weight. Consequently, you should get better allocations. (Please note that I did some quantitative analysis of Python functions to determine that using 4 SFC-slots covers a substantial amount of functions [IIRC >95%] with the trivial scenario when there are at most 4 local variables.) > 2) The RC elimination is tricky to get right and thus somewhat dangerous, > but sounds worthwhile and should work particularly well on a stack based > byte code interpreter like CPython. >
It would be interesting (research-wise, too) to be able to measure whether the reduction in memory operations makes Python programs use less energy, and if so, how much the difference is. > 3) Inline caching also sounds worthwhile, although I wonder how large the > savings will be here. You'd save a couple of indirect jumps at the C-API > level, sure, but apart from that, my guess is that it would highly depend on > the type of instruction. Certain (repeated) calls to C implemented functions > would likely benefit quite a bit, for example, which would be a nice > optimisation by itself, e.g. for builtins. I would expect that the same > applies to iterators, even a couple of percent faster iteration can make a > great deal of a difference, and a substantial set of iterators are > implemented in C, e.g. itertools, range, zip and friends. > > I'm not so sure about arithmetic operations. In Cython, we (currently?) do > not optimistically replace these with more specific code (unless we know the > types at compile time), because it complicates the generated C code and > indirect jumps aren't all that slow that the benefit would be important. > Savings are *much* higher when data can be unboxed, so much that the slight > improvement for optimistic type guesses is totally dwarfed in Cython. I > would expect that the return of investment is better when the types are > actually known at runtime, as in your case. > Well, in my thesis I already hint at another improvement of the existing design that can work on unboxed data as well (while still being an interpreter.) I am eager to try this, but don't know how much time I can spend on this (because there are several other research projects I am actively involved in.) 
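The inline caching discussed in this point can be illustrated with a toy "quickening" interpreter: a generic add instruction rewrites itself into a type-specialized variant after observing its operands, and the specialized variant carries a type guard that de-specializes on a miss. The instruction names and the dispatch structure here are invented for illustration and do not reflect CPython's actual evaluation loop:

```python
def run(code, stack):
    """Toy dispatch loop with one quickened instruction family."""
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op == "BINARY_ADD":
            b = stack.pop(); a = stack.pop()
            stack.append(a + b)               # generic, dynamically-typed path
            if type(a) is int and type(b) is int:
                # quicken: rewrite this instruction for the observed types
                code[pc] = "INT_ADD"
        elif op == "INT_ADD":
            b = stack.pop(); a = stack.pop()
            if type(a) is int and type(b) is int:  # inline-cache guard
                stack.append(a + b)                # specialized fast path
            else:
                stack.append(a + b)                # guard miss: generic path
                code[pc] = "BINARY_ADD"            # de-specialize
        pc += 1
    return stack

code = ["BINARY_ADD"]
run(code, [1, 2])
# code is now ["INT_ADD"]; subsequent integer adds take the guarded path
```

In a real interpreter the payoff comes from the specialized instruction skipping the generic C-API dispatch (and, with derivative instruction copies, from better indirect-branch prediction), not from this trivial Python model.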
In my experience, this works very well and you cannot actually report good speedups without inline-caching arithmetic operations, simply because that's where all JITs shine and most benchmarks don't reflect real world scenarios but mathematics-inclined microbenchmarks. Also, if in the future compilers (gcc and clang) are able to inline the invoked functions, higher speedups will be possible. > 4) Regarding inlined object references, I would expect that it's much more > worthwhile to speed up LOAD_GLOBAL and LOAD_NAME than LOAD_CONST. I guess > that this would be best helped by watching the module dict and the builtin > dict internally and invalidating the interpreter state after changes (e.g. > by providing a change counter in those dicts and checking that in the > instructions that access them), and otherwise keeping the objects cached. > Simply watching the dedicated instructions that change that state isn't > enough as Python allows code to change these dicts directly through their > dict interface. > Ok, I thought about something along these lines, too, but in the end, decided to go with the current design, as it is easy and language neutral (for my research I primarily chose Python as a demonstration vehicle and none of these techniques is specific to Python.) LOAD_GLOBAL pays off handsomely, and I think that I could easily make it correct for all cases, if I knew the places that need to call "invalidate_cache". Most of the LOAD_CONST instructions can be replaced with the inlined-version (INCA_LOAD_CONST), and while I did not do any benchmarks on this alone, simply because they are very frequently executed, even small optimizations pay off nicely. Another point is that you can slim down the activation record of PyEval_EvalFrameEx, because you don't need to use the "consts" field anymore (similarly, you could probably eliminate the "names" and "fastlocals" fields, if you find that most of the frequent and fast cases are covered by the optimized instructions.)
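The dict-watching scheme from point 4 (a change counter in the namespace dict, checked by the caching instruction before trusting its cached object) can be modeled in pure Python. This is an illustrative sketch, not the proposed C implementation; in this toy only `__setitem__`/`__delitem__` bump the counter, whereas a real version would have to instrument every mutating path of the dict:

```python
class VersionedDict(dict):
    """Dict that bumps a change counter on every (instrumented) mutation."""
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.version = 0
    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.version += 1
    def __delitem__(self, key):
        super().__delitem__(key)
        self.version += 1

class CachedGlobal:
    """Inline cache for one LOAD_GLOBAL-style lookup."""
    def __init__(self, namespace, name):
        self.namespace = namespace
        self.name = name
        self.cached_version = -1   # forces a lookup on first use
        self.cached_value = None
    def load(self):
        if self.cached_version != self.namespace.version:   # guard
            self.cached_value = self.namespace[self.name]    # slow path: real lookup
            self.cached_version = self.namespace.version
        return self.cached_value                             # fast path: cached object

ns = VersionedDict()
ns["answer"] = 42
cache = CachedGlobal(ns, "answer")
assert cache.load() == 42
ns["answer"] = 43           # a direct dict mutation bumps the counter...
assert cache.load() == 43   # ...so the stale cache entry is refilled
```

This addresses exactly the concern quoted above: because the guard sits on the dict itself rather than on the store instructions, mutations made through the plain dict interface also invalidate the cache.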
> All in all, your list does sound like an interesting set of changes that are > both understandable and worthwhile. > Thanks, I think so, too, which is why I wanted to integrate the optimizations with CPython in the first place. Thanks for the pointers to the dict stuff, I will take a look (IIRC, Antoine pointed me in the same direction last year, but I think the design was slightly different then), --stefan From jcea at jcea.es Fri Sep 2 17:57:10 2011 From: jcea at jcea.es (Jesus Cea) Date: Fri, 02 Sep 2011 17:57:10 +0200 Subject: [Python-Dev] Multigigabyte memory usage in the OpenIndiana Buildbot Message-ID: <4E60FCD6.3090005@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 A single instance of buildbot in the OpenIndiana buildbot is eating 1.4GB of RAM and 3.8GB of SWAP and growing. The build hangs or die with a "out of memory" error, eventually. This is 100% reproducible. Everytime I force a build thru the buildbot control page, I see this: takes huge memory and dies with an "out of memory" or hangs. I am allocating 4GB to the buildbots. I think this is not normal. I am the only one seen such a memory usage?. I haven't changed anything in my buildbots for months... - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ . 
_/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQCVAwUBTmD81plgi5GaxT1NAQJIfQP+LvxG8jGDcfdsKB3omkM8fE/pA3q3yVQL qVtSPQomCNB3hhhctEXnSFmDDekOTroCTpU9lYp6c9ZLmSCEGJx7bVW/53hk9ZJv oMNwSHvQbrZy/eWuJAlSUqIl2oAmMP75RiDhL2eqBu/alhavK8oXCeDV7iG9EvZq 0RH9Weqr788= =3jyf -----END PGP SIGNATURE----- From status at bugs.python.org Fri Sep 2 18:07:27 2011 From: status at bugs.python.org (Python tracker) Date: Fri, 2 Sep 2011 18:07:27 +0200 (CEST) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20110902160727.B2C881CFD5@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2011-08-26 - 2011-09-02) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message. 
Issues counts and deltas: open 2967 ( +4) closed 21701 (+36) total 24668 (+40) Open issues with patches: 1283 Issues opened (32) ================== #12837: Patch for issue #12810 removed a valid check on socket ancilla http://bugs.python.org/issue12837 reopened by brett.cannon #12848: pickle.py treats 32bit lengths as signed, but _pickle.c as uns http://bugs.python.org/issue12848 opened by pitrou #12849: urllib2 headers issue http://bugs.python.org/issue12849 opened by shubhojeet.ghosh #12850: [PATCH] stm.atomic http://bugs.python.org/issue12850 opened by arigo #12851: ctypes: getbuffer() never provides strides http://bugs.python.org/issue12851 opened by skrah #12852: POSIX level issues in posixmodule.c on OpenBSD 5.0 http://bugs.python.org/issue12852 opened by rpointel #12853: global name 'r' is not defined in upload.py http://bugs.python.org/issue12853 opened by reowen #12854: PyOS_Readline usage in tokenizer ignores sys.stdin/sys.stdout http://bugs.python.org/issue12854 opened by Albert.Zeyer #12855: linebreak sequences should be better documented http://bugs.python.org/issue12855 opened by Matthew.Boehm #12856: tempfile PRNG reuse between parent and child process http://bugs.python.org/issue12856 opened by ferringb #12857: Expose called function on frame object http://bugs.python.org/issue12857 opened by eric.snow #12858: crypt.mksalt: use ssl.RAND_pseudo_bytes() if available http://bugs.python.org/issue12858 opened by haypo #12860: http client attempts to send a readable object twice http://bugs.python.org/issue12860 opened by langmartin #12861: PyOS_Readline uses single lock http://bugs.python.org/issue12861 opened by Albert.Zeyer #12862: ConfigParser does not implement "comments need to be preceded http://bugs.python.org/issue12862 opened by DanielFortunov #12863: py32 > Lib > xml.minidom > usage feedback > overrides http://bugs.python.org/issue12863 opened by GPU.Group #12864: 2to3 creates illegal code on import a.b inside a package 
http://bugs.python.org/issue12864 opened by simohe #12866: Want to submit our Audioop.c patch for 24bit audio http://bugs.python.org/issue12866 opened by Peder.Jørgensen #12869: PyOS_StdioReadline is printing the prompt on stderr http://bugs.python.org/issue12869 opened by Albert.Zeyer #12870: Regex object should have introspection methods http://bugs.python.org/issue12870 opened by mattchaput #12871: Disable sched_get_priority_min/max if Python is compiled witho http://bugs.python.org/issue12871 opened by haypo #12872: --with-tsc crashes on ppc64 http://bugs.python.org/issue12872 opened by dmalcolm #12873: 2to3 incorrectly handles multi-line imports from __future__ http://bugs.python.org/issue12873 opened by Arfrever #12875: backport re.compile flags default value documentation http://bugs.python.org/issue12875 opened by eli.bendersky #12876: Make Test Error : ImportError: No module named _sha256 http://bugs.python.org/issue12876 opened by wah meng #12878: io.StringIO doesn't provide a __dict__ field http://bugs.python.org/issue12878 opened by ericp #12880: ctypes: clearly document how structure bit fields are allocate http://bugs.python.org/issue12880 opened by meadori #12881: ctypes: segfault with large structure field names http://bugs.python.org/issue12881 opened by meadori #12882: mmap crash on Windows http://bugs.python.org/issue12882 opened by itabhijitb #12883: xml.sax.xmlreader.AttributesImpl allows empty string as attrib http://bugs.python.org/issue12883 opened by Michael.Sulyaev #12885: distutils.filelist.findall() fails on broken symlink in Py2.x http://bugs.python.org/issue12885 opened by Alexander.Dutton #12886: datetime.strptime parses input wrong http://bugs.python.org/issue12886 opened by heidar.rafn Most recent 15 issues with no replies (15) ========================================== #12885: distutils.filelist.findall() fails on broken symlink in Py2.x http://bugs.python.org/issue12885 #12883: xml.sax.xmlreader.AttributesImpl allows empty string 
as attrib http://bugs.python.org/issue12883 #12881: ctypes: segfault with large structure field names http://bugs.python.org/issue12881 #12880: ctypes: clearly document how structure bit fields are allocate http://bugs.python.org/issue12880 #12873: 2to3 incorrectly handles multi-line imports from __future__ http://bugs.python.org/issue12873 #12872: --with-tsc crashes on ppc64 http://bugs.python.org/issue12872 #12869: PyOS_StdioReadline is printing the prompt on stderr http://bugs.python.org/issue12869 #12866: Want to submit our Audioop.c patch for 24bit audio http://bugs.python.org/issue12866 #12864: 2to3 creates illegal code on import a.b inside a package http://bugs.python.org/issue12864 #12863: py32 > Lib > xml.minidom > usage feedback > overrides http://bugs.python.org/issue12863 #12862: ConfigParser does not implement "comments need to be preceded http://bugs.python.org/issue12862 #12860: http client attempts to send a readable object twice http://bugs.python.org/issue12860 #12858: crypt.mksalt: use ssl.RAND_pseudo_bytes() if available http://bugs.python.org/issue12858 #12854: PyOS_Readline usage in tokenizer ignores sys.stdin/sys.stdout http://bugs.python.org/issue12854 #12851: ctypes: getbuffer() never provides strides http://bugs.python.org/issue12851 Most recent 15 issues waiting for review (15) ============================================= #12872: --with-tsc crashes on ppc64 http://bugs.python.org/issue12872 #12857: Expose called function on frame object http://bugs.python.org/issue12857 #12856: tempfile PRNG reuse between parent and child process http://bugs.python.org/issue12856 #12855: linebreak sequences should be better documented http://bugs.python.org/issue12855 #12852: POSIX level issues in posixmodule.c on OpenBSD 5.0 http://bugs.python.org/issue12852 #12850: [PATCH] stm.atomic http://bugs.python.org/issue12850 #12842: Docs: first parameter of tp_richcompare() always has the corre http://bugs.python.org/issue12842 #12841: Incorrect tarfile.py 
extraction http://bugs.python.org/issue12841 #12837: Patch for issue #12810 removed a valid check on socket ancilla http://bugs.python.org/issue12837 #12832: The documentation for the print function should explain/point http://bugs.python.org/issue12832 #12822: NewGIL should use CLOCK_MONOTONIC if possible. http://bugs.python.org/issue12822 #12820: Tests for Lib/xml/dom/minicompat.py http://bugs.python.org/issue12820 #12819: PEP 393 - Flexible Unicode String Representation http://bugs.python.org/issue12819 #12818: email.utils.formataddr incorrectly quotes parens inside quoted http://bugs.python.org/issue12818 #12817: test_multiprocessing: io.BytesIO() requires bytearray buffers http://bugs.python.org/issue12817 Top 10 most discussed issues (10) ================================= #12852: POSIX level issues in posixmodule.c on OpenBSD 5.0 http://bugs.python.org/issue12852 15 msgs #12736: Request for python casemapping functions to use full not simpl http://bugs.python.org/issue12736 15 msgs #2636: Adding a new regex module (compatible with re) http://bugs.python.org/issue2636 14 msgs #12855: linebreak sequences should be better documented http://bugs.python.org/issue12855 10 msgs #12729: Python lib re cannot handle Unicode properly due to narrow/wid http://bugs.python.org/issue12729 9 msgs #12850: [PATCH] stm.atomic http://bugs.python.org/issue12850 9 msgs #12735: request full Unicode collation support in std python library http://bugs.python.org/issue12735 7 msgs #12837: Patch for issue #12810 removed a valid check on socket ancilla http://bugs.python.org/issue12837 7 msgs #12841: Incorrect tarfile.py extraction http://bugs.python.org/issue12841 6 msgs #12876: Make Test Error : ImportError: No module named _sha256 http://bugs.python.org/issue12876 6 msgs Issues closed (36) ================== #6069: casting error from ctypes array to structure http://bugs.python.org/issue6069 closed by meadori #6980: fix ctypes build failure on armel-linux-gnueabi with -mfloat-a 
http://bugs.python.org/issue6980 closed by meadori #8296: multiprocessing.Pool hangs when issuing KeyboardInterrupt http://bugs.python.org/issue8296 closed by vinay.sajip #8409: gettext should honor $LOCPATH environment variable http://bugs.python.org/issue8409 closed by barry #9651: ctypes crash when writing zerolength string buffer to file http://bugs.python.org/issue9651 closed by amaury.forgeotdarc #9923: mailcap module may not work on non-POSIX platforms if MAILCAPS http://bugs.python.org/issue9923 closed by ncoghlan #10086: test_sysconfig failure when prefix matches /site http://bugs.python.org/issue10086 closed by eric.araujo #11241: ctypes: subclassing an already subclassed ArrayType generates http://bugs.python.org/issue11241 closed by amaury.forgeotdarc #11564: pickle not 64-bit ready http://bugs.python.org/issue11564 closed by pitrou #11879: TarFile.chown: should use TarInfo.uid if user lookup fails http://bugs.python.org/issue11879 closed by lars.gustaebel #11920: ctypes: Strange bitfield structure sizing issue http://bugs.python.org/issue11920 closed by meadori #12195: Little documentation of annotations http://bugs.python.org/issue12195 closed by rhettinger #12287: ossaudiodev: stack corruption with FD >= FD_SETSIZE http://bugs.python.org/issue12287 closed by neologix #12472: Build failure on IRIX http://bugs.python.org/issue12472 closed by neologix #12494: subprocess: check_output() doesn't close pipes on error http://bugs.python.org/issue12494 closed by haypo #12636: IDLE ignores -*- coding -*- with -r option http://bugs.python.org/issue12636 closed by haypo #12720: Expose linux extended filesystem attributes http://bugs.python.org/issue12720 closed by python-dev #12742: Add support for CESU-8 encoding http://bugs.python.org/issue12742 closed by ezio.melotti #12793: allow filters in os.walk http://bugs.python.org/issue12793 closed by rhettinger #12802: Windows error code 267 should be mapped to ENOTDIR, not EINVAL http://bugs.python.org/issue12802 
closed by pitrou #12829: pyexpat segmentation fault caused by multiple calls to Parse() http://bugs.python.org/issue12829 closed by ned.deily #12835: Missing SSLSocket.sendmsg() wrapper allows programs to send un http://bugs.python.org/issue12835 closed by ncoghlan #12839: zlibmodule cannot handle Z_VERSION_ERROR zlib error http://bugs.python.org/issue12839 closed by nadeem.vawda #12843: file object read* methods in append mode overflows http://bugs.python.org/issue12843 closed by amaury.forgeotdarc #12846: unicodedata.normalize turkish letter problem http://bugs.python.org/issue12846 closed by terry.reedy #12847: crash with negative PUT in pickle http://bugs.python.org/issue12847 closed by pitrou #12859: readline implementation doesn't release the GIL http://bugs.python.org/issue12859 closed by Albert.Zeyer #12865: import SimpleHTTPServer http://bugs.python.org/issue12865 closed by amaury.forgeotdarc #12867: linecache.getline() Returning Error http://bugs.python.org/issue12867 closed by ned.deily #12868: test_faulthandler.test_stack_overflow() failed on OpenBSD http://bugs.python.org/issue12868 closed by neologix #12874: Rearrange descriptions of builtin types in the Library referen http://bugs.python.org/issue12874 closed by ezio.melotti #12877: Popen(...).stdout.seek(...) 
throws "Illegal seek" http://bugs.python.org/issue12877 closed by haypo #12879: "method-wrapper" objects are difficult to inspect http://bugs.python.org/issue12879 closed by benjamin.peterson #12884: Re http://bugs.python.org/issue12884 closed by ezio.melotti #1462440: socket and threading: udp multicast setsockopt fails http://bugs.python.org/issue1462440 closed by neologix #10946: bdist doesn’t pass --skip-build on to subcommands http://bugs.python.org/issue10946 closed by eric.araujo From zvezdan at zope.com Fri Sep 2 18:01:56 2011 From: zvezdan at zope.com (Zvezdan Petkovic) Date: Fri, 2 Sep 2011 12:01:56 -0400 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: <4E6031C4.7030809@pearwood.info> References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <87r540ka68.fsf@uwakimon.sk.tsukuba.ac.jp> <1314893027.3617.12.camel@localhost.localdomain> <4E6031C4.7030809@pearwood.info> Message-ID: <42D586D2-A78B-4A84-98E0-5F43172A61D7@zope.com> On Sep 1, 2011, at 9:30 PM, Steven D'Aprano wrote: > Antoine Pitrou wrote: >> Le jeudi 01 septembre 2011 à 08:45 -0700, Guido van Rossum a écrit : >>> This is definitely thought of as a separate >>> mark added to the e; é is not a new letter. I have a feeling it's the same way for the French and Germans, but I really don't know. >>> (Antoine? Georg?) >> Indeed, they are not separate "letters" (they are considered the same in lexicographic order, and the French alphabet has 26 letters). > > > On the other hand, the same doesn't necessarily apply to other languages. (At least according to Wikipedia.) 
> > http://en.wikipedia.org/wiki/Diacritic#Languages_with_letters_containing_diacritics For example, in Serbo-Croatian (Serbian, Croatian, Bosnian, Montenegrin, if you want), each of the following letters represents one distinct sound of the language. In the Serbian Cyrillic alphabet, they are distinct symbols. In the Latin alphabet, the corresponding letters are formed with diacritics because the alphabet is shorter.

Letter  Approximate pronunciation  Cyrillic
------  -------------------------  --------
č       tch in butcher             ч
ć       ch in chapter, but softer  ћ
dž      j in jump                  џ
đ       j in juice                 ђ
š       sh in ship                 ш
ž       s in pleasure, measure     ж

The language has 30 sounds and the corresponding 30 letters. See the count of the letters in these tables: - http://hr.wikipedia.org/wiki/Hrvatska_abeceda - http://sr.wikipedia.org/wiki/?????? Diacritics are used in grammar books and in print (occasionally) to distinguish between four different accents of the language: - long rising: á, - short rising: à, - long falling: ȃ (inverted breve, *not* a circumflex â), and - short falling: ȁ, especially when the words that use the same sounds -- thus, spelled with the same letters -- are next to each other. The accents are used to change the intonation of the whole word, not to change the sound of the letter. For example: "Ja sam sȃm." -- "I am alone." Both words "sam" contain the "a" sound, but the first one is pronounced short. As a form of the verb "to be" it's an enclitic that takes the accent of the preceding word "I". The second one is pronounced with a long falling accent. The macron can be used to indicate the length of a *non-stressed* vowel, e.g. ā, but is usually unnecessary in standard print. Many languages use alphabets that are not suitable to their sound system. The speakers of these languages adapted alphabets to their sounds either by using letters with distinct shapes (Cyrillic letters above), or adding diacritics to an existing shape (Latin letters above). 
The new combined form is a distinct letter. These letters have separate sections in dictionaries and a sorting order. The diacritics that indicate an accent or length are used only above vowels and do *not* represent distinct letters. Best regards, Zvezdan Petković P.S. Since I live in the USA, the last letter of my surname is *wrongly* spelled (ć -> c) and pronounced (ch -> k) most of the time. :-) From stefan_ml at behnel.de Fri Sep 2 19:12:21 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 02 Sep 2011 19:12:21 +0200 Subject: [Python-Dev] Python 3 optimizations continued... In-Reply-To: References: <20110830193806.0d718a56@pitrou.net> <112D04D2-DC83-4383-9C06-96208811507F@twistedmatrix.com> Message-ID: stefan brunthaler, 02.09.2011 17:55: >> 4) Regarding inlined object references, I would expect that it's much more >> worthwhile to speed up LOAD_GLOBAL and LOAD_NAME than LOAD_CONST. I guess >> that this would be best helped by watching the module dict and the builtin >> dict internally and invalidating the interpreter state after changes (e.g. >> by providing a change counter in those dicts and checking that in the >> instructions that access them), and otherwise keeping the objects cached. >> Simply watching the dedicated instructions that change that state isn't >> enough as Python allows code to change these dicts directly through their >> dict interface. > [...] > Thanks for the pointers to the dict stuff, I will take a look (IIRC, > Antoine pointed me in the same direction last year, but I think the > design was slightly different then), Not unlikely, Antoine tends to know the internals pretty well. The Cython project has been (hand wavingly) thinking about this also: implement our own module type with its own __setattr__ (and dict proxy) in order to speed up access to the globals in the *very* likely case that they rarely or never change after module initialisation time and that most critical code accesses them read-only from within functions. 
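The module-level variant Stefan describes could look roughly like this in today's Python (hypothetical class name, not Cython's actual implementation): a module subclass whose __setattr__ bumps a change counter, so cached attribute lookups can be revalidated with one integer comparison.

```python
import types


class WatchedModule(types.ModuleType):
    """Module whose attribute writes bump a change counter, so
    cached references to its globals can be revalidated cheaply."""

    def __init__(self, name):
        super().__init__(name)
        super().__setattr__("_changes", 0)

    def __setattr__(self, name, value):
        # Every rebinding invalidates previously cached lookups.
        super().__setattr__("_changes", self._changes + 1)
        super().__setattr__(name, value)


mod = WatchedModule("demo")
mod.answer = 42
snapshot = (mod._changes, mod.answer)   # (version, cached value)
assert snapshot[0] == mod._changes      # still valid: skip the dict lookup
mod.answer = 43                         # a write bumps the counter
assert snapshot[0] != mod._changes      # cache must be refreshed
```

Since the common case is that module globals never change after import, the counter comparison almost always succeeds and the cached object can be used directly.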
If it turns out that this makes sense for CPython in general, it wouldn't be a bad idea to join forces at some point in order to make this readily usable for both sides. Stefan From jcea at jcea.es Fri Sep 2 19:53:37 2011 From: jcea at jcea.es (Jesus Cea) Date: Fri, 02 Sep 2011 19:53:37 +0200 Subject: [Python-Dev] Multigigabyte memory usage in the OpenIndiana Buildbot In-Reply-To: <4E60FCD6.3090005@jcea.es> References: <4E60FCD6.3090005@jcea.es> Message-ID: <4E611821.9050108@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 02/09/11 17:57, Jesus Cea wrote: > The build hangs or die with a "out of memory" error, eventually. A simple "make test" with python not compiled with "pydebug" and skipping all the optional tests (like zip64) is taking up to 300MB of RAM. Python 2.7 branch, current tip. - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ . _/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQCVAwUBTmEYIZlgi5GaxT1NAQK79gP/aRyMqgEE7uScYtrZzPqs0ZSpGnVM8sBi RbNEN3cB/s6Oe/UVIo4vinaDnXXYSOM5qtqghUl5Cnx+wiiK2cL8iIv/YzZbjT9s U8QELEkol8lpjAVPEO/rSylZ5kvsmdjkM2mU6NOwiLGw+mmbbgqpmdAU14p+sqSO 2xFJElgOHuM= =YA0J -----END PGP SIGNATURE----- From solipsis at pitrou.net Fri Sep 2 20:14:15 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 2 Sep 2011 20:14:15 +0200 Subject: [Python-Dev] Multigigabyte memory usage in the OpenIndiana Buildbot References: <4E60FCD6.3090005@jcea.es> <4E611821.9050108@jcea.es> Message-ID: <20110902201415.773da7d6@pitrou.net> On Fri, 02 Sep 2011 19:53:37 +0200 Jesus Cea wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On 02/09/11 17:57, Jesus 
Cea wrote: > > The build hangs or die with a "out of memory" error, eventually. > > A simple "make test" with python not compiled with "pydebug" and > skipping all the optional tests (like zip64) is taking up to 300MB of > RAM. Python 2.7 branch, current tip. Can you tell if it's something recent or it has always been like that? Regards Antoine. From tseaver at palladion.com Fri Sep 2 20:22:04 2011 From: tseaver at palladion.com (Tres Seaver) Date: Fri, 02 Sep 2011 14:22:04 -0400 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: <87bov3k322.fsf@uwakimon.sk.tsukuba.ac.jp> References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <87r540ka68.fsf@uwakimon.sk.tsukuba.ac.jp> <87ei00jdof.fsf@uwakimon.sk.tsukuba.ac.jp> <1314903285.3617.31.camel@localhost.localdomain> <87bov3k322.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 09/01/2011 11:59 PM, Stephen J. Turnbull wrote: > Tres Seaver writes: > >> FWIW, I was taught that Spanish had 30 letters in the alfabeto: >> the 'ñ', plus 'ch', 'll', and 'rr' were all considered distinct >> characters. > > That was always a Castellano vs. Americano issue, IIRC. As I wrote, > Mr. Gonzalez was Castellano. - From a casual web search, it looks as though the RAE didn't legislate "letterness" away from the digraphs (as I learned them) until 1994 (about 25 years after I learned the 30-letter alfabeto). 
> I believe that the deprecation of the digraphs as separate letters > occurred as the telephone became widely used in Spain, and the > telephone company demanded an official proclamation from whatever > Ministry is responsible for culture that it was OK to treat the > digraphs as two letters (specifically, to collate them that way), so > that they could use the programs that came with the OS. > > So this stuff is not merely variant by culture, but also by > economics and politics. :-/ Lovely. :) Tres. - -- =================================================================== Tres Seaver +1 540-429-0999 tseaver at palladion.com Palladion Software "Excellence by Design" http://palladion.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk5hHswACgkQ+gerLs4ltQ7m9ACeOJZRgjcm9pd0Rnry26zP0I3t 53cAoLv78VD5eIdbjvboLaysoeREIp1t =0PuR -----END PGP SIGNATURE----- From fijall at gmail.com Fri Sep 2 20:42:07 2011 From: fijall at gmail.com (Maciej Fijalkowski) Date: Fri, 2 Sep 2011 20:42:07 +0200 Subject: [Python-Dev] Python 3 optimizations continued... In-Reply-To: References: <201108292357.36628.victor.stinner@haypocalc.com> Message-ID: > > For a comparative real world benchmark I tested Martin von Loewis' > django port (there are not that many meaningful Python 3 real world > benchmarks) and got a speedup of 1.3 (without IIS). This is reasonably > well, US got a speedup of 1.35 on this benchmark. I just checked that > pypy-c-latest on 64 bit reports 1.5 (the pypy-c-jit-latest figures > seem to be not working currently or *really* fast...), but I cannot > tell directly how that relates to speedups (it just says "less is > better" and I did not quickly find an explanation). 
> Since I did this benchmark last year, I have spent more time > investigating this benchmark and found that I could do better, but I > would have to guess as to how much (An interesting aside though: on > this benchmark, the executable never grew on more than 5 megs of > memory usage, exactly like the vanilla Python 3 interpreter.) > PyPy is ~12x faster on the django benchmark FYI From stefan_ml at behnel.de Fri Sep 2 21:20:57 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 02 Sep 2011 21:20:57 +0200 Subject: [Python-Dev] Python 3 optimizations continued... In-Reply-To: References: <201108292357.36628.victor.stinner@haypocalc.com> Message-ID: Maciej Fijalkowski, 02.09.2011 20:42: >> For a comparative real world benchmark I tested Martin von Loewis' >> django port (there are not that many meaningful Python 3 real world >> benchmarks) and got a speedup of 1.3 (without IIS). This is reasonably >> well, US got a speedup of 1.35 on this benchmark. I just checked that >> pypy-c-latest on 64 bit reports 1.5 (the pypy-c-jit-latest figures >> seem to be not working currently or *really* fast...), but I cannot >> tell directly how that relates to speedups (it just says "less is >> better" and I did not quickly find an explanation). > > PyPy is ~12x faster on the django benchmark FYI FYI, there's a recent thread up on the pypy ML where someone is complaining about PyPy being substantially slower than CPython when running Django on top of SQLite. Also note that PyPy doesn't implement Py3 yet, so the benchmark results are not comparable anyway. As usual, benchmark results depend on what you do in your benchmarks. Stefan From fijall at gmail.com Fri Sep 2 21:59:21 2011 From: fijall at gmail.com (Maciej Fijalkowski) Date: Fri, 2 Sep 2011 21:59:21 +0200 Subject: [Python-Dev] Python 3 optimizations continued... 
In-Reply-To: References: <201108292357.36628.victor.stinner@haypocalc.com> Message-ID: On Fri, Sep 2, 2011 at 9:20 PM, Stefan Behnel wrote: > Maciej Fijalkowski, 02.09.2011 20:42: >>> >>> For a comparative real world benchmark I tested Martin von Loewis' >>> django port (there are not that many meaningful Python 3 real world >>> benchmarks) and got a speedup of 1.3 (without IIS). This is reasonably >>> well, US got a speedup of 1.35 on this benchmark. I just checked that >>> pypy-c-latest on 64 bit reports 1.5 (the pypy-c-jit-latest figures >>> seem to be not working currently or *really* fast...), but I cannot >>> tell directly how that relates to speedups (it just says "less is >>> better" and I did not quickly find an explanation). >> >> PyPy is ~12x faster on the django benchmark FYI > > FYI, there's a recent thread up on the pypy ML where someone is complaining > about PyPy being substantially slower than CPython when running Django on > top of SQLite. Also note that PyPy doesn't implement Py3 yet, so the > benchmark results are not comparable anyway. Yes, sqlite is slow. It's also much faster in trunk than in 1.6 and there is an open ticket about it :) The "django" benchmark is just templating, so it does not involve a database. 
From greg.ewing at canterbury.ac.nz Sat Sep 3 01:46:58 2011 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 03 Sep 2011 11:46:58 +1200 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <87r540ka68.fsf@uwakimon.sk.tsukuba.ac.jp> <87ei00jdof.fsf@uwakimon.sk.tsukuba.ac.jp> <1314903285.3617.31.camel@localhost.localdomain> <87bov3k322.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4E616AF2.3070408@canterbury.ac.nz> Terry Reedy wrote: > While it has apparently been criticized as 'conservative' (which is well > ought to be), it has been rather progressive in promoting changes such > as 'ph' to 'f' (fisica, fone) and dropping silent 'p' in leading 'psi' > (sicologia) and silent 's' in leading 'sci' (ciencia). I find it curious that pronunciation always seems to take precedence over spelling in campaigns like this. Nowadays, especially with the internet increasingly taking over from personal interaction, we probably see words written a lot more often than we hear them spoken. Why shouldn't we change the pronunciation to match the spelling rather than the other way around? -- Greg From stephen at xemacs.org Sat Sep 3 06:17:24 2011 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Sat, 03 Sep 2011 13:17:24 +0900 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: <4E616AF2.3070408@canterbury.ac.nz> References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <87r540ka68.fsf@uwakimon.sk.tsukuba.ac.jp> <87ei00jdof.fsf@uwakimon.sk.tsukuba.ac.jp> <1314903285.3617.31.camel@localhost.localdomain> <87bov3k322.fsf@uwakimon.sk.tsukuba.ac.jp> <4E616AF2.3070408@canterbury.ac.nz> Message-ID: <87ehzyjm3v.fsf@uwakimon.sk.tsukuba.ac.jp> Greg Ewing writes: > I find it curious that pronunciation always seems to take > precedence over spelling in campaigns like this. Nowadays, > especially with the internet increasingly taking over from > personal interaction, we probably see words written a lot > more often than we hear them spoken. Why shouldn't we > change the pronunciation to match the spelling rather than > the other way around? Because 90% of all people move their lips when reading. :-) More seriously, because almost nobody learns to read before learning to understand spoken language. Aural language is more primitive than written language. From georg at python.org Sun Sep 4 22:21:50 2011 From: georg at python.org (Georg Brandl) Date: Sun, 04 Sep 2011 22:21:50 +0200 Subject: [Python-Dev] [RELEASED] Python 3.2.2 Message-ID: <4E63DDDE.2040108@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On behalf of the Python development team, I'm happy to announce the Python 3.2.2 maintenance release. Python 3.2.2 mainly fixes `a regression `_ in the ``urllib.request`` module that prevented opening many HTTP resources correctly with Python 3.2.1. 
Python 3.2 is a continuation of the efforts to improve and stabilize the Python 3.x line. Since the final release of Python 2.7, the 2.x line will only receive bugfixes, and new features are developed for 3.x only. Since PEP 3003, the Moratorium on Language Changes, is in effect, there are no changes in Python's syntax and built-in types in Python 3.2. Development efforts concentrated on the standard library and support for porting code to Python 3. Highlights are: * numerous improvements to the unittest module * PEP 3147, support for .pyc repository directories * PEP 3149, support for version tagged dynamic libraries * PEP 3148, a new futures library for concurrent programming * PEP 384, a stable ABI for extension modules * PEP 391, dictionary-based logging configuration * an overhauled GIL implementation that reduces contention * an extended email package that handles bytes messages * a much improved ssl module with support for SSL contexts and certificate hostname matching * a sysconfig module to access configuration information * additions to the shutil module, among them archive file support * many enhancements to configparser, among them mapping protocol support * improvements to pdb, the Python debugger * countless fixes regarding bytes/string issues; among them full support for a bytes environment (filenames, environment variables) * many consistency and behavior fixes for numeric operations For a more extensive list of changes in 3.2, see http://docs.python.org/3.2/whatsnew/3.2.html To download Python 3.2 visit: http://www.python.org/download/releases/3.2/ Please consider trying Python 3.2 with your code and reporting any bugs you may notice to: http://bugs.python.org/ Enjoy! 
- -- Georg Brandl, Release Manager georg at python.org (on behalf of the entire python-dev team and 3.2's contributors) -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (GNU/Linux) iEYEARECAAYFAk5j3d4ACgkQN9GcIYhpnLA2BACeLZ8nSdVOoxlJw4DnbM42neeA fwAAoKTHetXsVxrEfvCWSorUhoJ083kZ =5Wm1 -----END PGP SIGNATURE----- From hagen at zhuliguan.net Mon Sep 5 04:19:04 2011 From: hagen at zhuliguan.net (=?ISO-8859-1?Q?Hagen_F=FCrstenau?=) Date: Sun, 04 Sep 2011 22:19:04 -0400 Subject: [Python-Dev] [RELEASED] Python 3.2.2 In-Reply-To: <4E63DDDE.2040108@python.org> References: <4E63DDDE.2040108@python.org> Message-ID: > To download Python 3.2 visit: > > http://www.python.org/download/releases/3.2/ It's a bit confusing that the download link is to 3.2 and not 3.2.2. Cheers, Hagen From tjreedy at udel.edu Mon Sep 5 05:41:20 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 04 Sep 2011 23:41:20 -0400 Subject: [Python-Dev] [RELEASED] Python 3.2.2 In-Reply-To: <4E63DDDE.2040108@python.org> References: <4E63DDDE.2040108@python.org> Message-ID: On 9/4/2011 4:21 PM, Georg Brandl wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On behalf of the Python development team, I'm happy to announce the > Python 3.2.2 maintenance release. > To download Python 3.2 visit: > > http://www.python.org/download/releases/3.2/ To download 3.2.2 visit: http://www.python.org/download/releases/3.2.2/ -- Terry Jan Reedy From g.brandl at gmx.net Mon Sep 5 08:36:44 2011 From: g.brandl at gmx.net (Georg Brandl) Date: Mon, 05 Sep 2011 08:36:44 +0200 Subject: [Python-Dev] [RELEASED] Python 3.2.2 In-Reply-To: References: <4E63DDDE.2040108@python.org> Message-ID: Am 05.09.2011 04:19, schrieb Hagen Fürstenau: >> To download Python 3.2 visit: >> >> http://www.python.org/download/releases/3.2/ > > It's a bit confusing that the download link is to 3.2 and not 3.2.2. Indeed, sorry. 
Georg From fuzzyman at voidspace.org.uk Mon Sep 5 13:56:09 2011 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Mon, 5 Sep 2011 12:56:09 +0100 Subject: [Python-Dev] Maintenance burden of str.swapcase Message-ID: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> Hey all, A while ago there was a discussion of the value of apis like str.swapcase, and it was suggested that even though it was acknowledged to be useless the effort of deprecating and removing it was thought to be more than the value in removing it. Earlier this year I was at a pypy sprint helping to work on Python 2.7 compatibility. The bytearray type has much of the string interface, including swapcase… So there was effort to implement this method with the correct semantics for pypy. Doubtless the same has been true for IronPython, and will also be true for Jython. Whilst it is too late for Python 2.x, it *is* (in my opinion) worth removing unused and unneeded APIs. Even if the effort to remove them is more than any effort saved on the part of users it helps other implementations down the road that no longer need to provide these APIs. All the best, Michael Foord -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From merwok at netwok.org Mon Sep 5 16:54:04 2011 From: merwok at netwok.org (=?UTF-8?B?w4lyaWMgQXJhdWpv?=) Date: Mon, 05 Sep 2011 16:54:04 +0200 Subject: [Python-Dev] [Python-checkins] cpython (3.2): #5301: add image/vnd.microsoft.icon (.ico) MIME type In-Reply-To: References: <4E50BF15.8020502@netwok.org> Message-ID: <4E64E28C.7000908@netwok.org> Hi, Le 21/08/2011 11:09, Sandro Tosi a écrit : > On Sun, Aug 21, 2011 at 10:17, Éric Araujo wrote: >> However small the commit was, I think it still was a feature request, so >> I wonder if it was appropriate for the stable versions. > > I can see your point: the reason I committed it also on the stable > branches is that .ico are already out there (since a long time) and > they were currently not recognized. I can call it a bug. > > Anyhow, if it was not appropriate, just tell me and I'll revert on 2.7 > and 3.2 . It should be reverted, yes, at least in 2.7. Apparently Georg has accepted and released the fix for 3.2.2. Regards From merwok at netwok.org Mon Sep 5 19:01:16 2011 From: merwok at netwok.org (=?UTF-8?B?w4lyaWMgQXJhdWpv?=) Date: Mon, 05 Sep 2011 19:01:16 +0200 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: <87bov3k322.fsf@uwakimon.sk.tsukuba.ac.jp> References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <87r540ka68.fsf@uwakimon.sk.tsukuba.ac.jp> <87ei00jdof.fsf@uwakimon.sk.tsukuba.ac.jp> <1314903285.3617.31.camel@localhost.localdomain> <87bov3k322.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4E65005C.5070104@netwok.org> Le 02/09/2011 05:59, Stephen J. 
Turnbull a écrit : > I believe that the deprecation of the digraphs as separate letters > occurred as the telephone became widely used in Spain, and the > telephone company demanded an official proclamation from whatever > Ministry is responsible for culture that it was OK to treat the > digraphs as two letters (specifically, to collate them that way), so > that they could use the programs that came with the OS. > > So this stuff is not merely variant by culture, but also by economics > and politics. :-/ That is a truth for language matters and linguistics, as well as in other domains and sciences. Cheers From ncoghlan at gmail.com Tue Sep 6 02:25:58 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 6 Sep 2011 10:25:58 +1000 Subject: [Python-Dev] [Python-checkins] cpython (3.2): Fix PyUnicode_AsWideCharString() doc: size doesn't contain the null character In-Reply-To: References: Message-ID: On Tue, Sep 6, 2011 at 10:01 AM, victor.stinner wrote: > Fix also spelling of the null character. While these cases are legitimately changed to 'null' (since they're lowercase descriptions of the character), I figure it's worth mentioning again that the ASCII name for '\0' actually *is* NUL (i.e. only one 'L'). Strange, but true [1]. Cheers, Nick. [1] https://secure.wikimedia.org/wikipedia/en/wiki/ASCII -- Nick Coghlan | ncoghlan at gmail.com | 
Brisbane, Australia From jcea at jcea.es Tue Sep 6 06:09:01 2011 From: jcea at jcea.es (Jesus Cea) Date: Tue, 06 Sep 2011 06:09:01 +0200 Subject: [Python-Dev] Multigigabyte memory usage in the OpenIndiana Buildbot In-Reply-To: <20110902201415.773da7d6@pitrou.net> References: <4E60FCD6.3090005@jcea.es> <4E611821.9050108@jcea.es> <20110902201415.773da7d6@pitrou.net> Message-ID: <4E659CDD.8090900@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 02/09/11 20:14, Antoine Pitrou wrote: > On Fri, 02 Sep 2011 19:53:37 +0200 Jesus Cea wrote: >> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 >> >> On 02/09/11 17:57, Jesus Cea wrote: >>> The build hangs or die with a "out of memory" error, >>> eventually. >> >> A simple "make test" with python not compiled with "pydebug" and >> skipping all the optional tests (like zip64) is taking up to >> 300MB of RAM. Python 2.7 branch, current tip. > > Can you tell if it's something recent or it has always been like > that? I can't tell. My host has restricted me recently to 4GB RAM max (no swap), and the buildbot is failing now, but I don't know if using so much memory is something recent or not. Previously I could use up to 32GB of RAM. - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ . 
_/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQCVAwUBTmWc3Zlgi5GaxT1NAQKhGgP/U8f/NEk2WeNdEngasEDFxX1xSEzJMddo qIv7XkGXc93LNdGpqaIzNgW2d5NX3i7es0U5NrDtJVa0BTDLorKFN+zV6RpInZUO eQR65ZYn6Ld1xioyrb74v5vZq7HXcONhyVPcmXufRHkzkZ+kTnybvyc60plZEN5n NyHJkl7gNcU= =iNH7 -----END PGP SIGNATURE----- From ncoghlan at gmail.com Tue Sep 6 06:19:27 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 6 Sep 2011 14:19:27 +1000 Subject: [Python-Dev] Multigigabyte memory usage in the OpenIndiana Buildbot In-Reply-To: <4E659CDD.8090900@jcea.es> References: <4E60FCD6.3090005@jcea.es> <4E611821.9050108@jcea.es> <20110902201415.773da7d6@pitrou.net> <4E659CDD.8090900@jcea.es> Message-ID: On Tue, Sep 6, 2011 at 2:09 PM, Jesus Cea wrote: >> Can you tell if it's something recent or it has always been like >> that? > > I can't tell. My host has restricted me recently to 4GB RAM max (no > swap), and the buildbot is failing now, but I don't know if using so > much memory is something recent or not. > > Previously I could use up to 32GB of RAM. Is it possible your buildbot is set up to run the bigmem tests? IIRC, those would work correctly with 32 GB, but die a horrible death with only 4 GB available. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | 
Brisbane, Australia From jcea at jcea.es Tue Sep 6 06:27:41 2011 From: jcea at jcea.es (Jesus Cea) Date: Tue, 06 Sep 2011 06:27:41 +0200 Subject: [Python-Dev] Multigigabyte memory usage in the OpenIndiana Buildbot In-Reply-To: References: <4E60FCD6.3090005@jcea.es> <4E611821.9050108@jcea.es> <20110902201415.773da7d6@pitrou.net> <4E659CDD.8090900@jcea.es> Message-ID: <4E65A13D.9010805@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 06/09/11 06:19, Nick Coghlan wrote: > Is it possible your buildbot is set up to run the bigmem tests? > IIRC, those would work correctly with 32 GB, but die a horrible > death with only 4 GB available. How can I check that?. I am seen multiple python processes, quite a few, each taking around 300MB of RAM. - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ . _/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQCVAwUBTmWhPZlgi5GaxT1NAQLkFAP/YBJ5owdNdl2yiJMc8kVi4Ndjt5WK5aRa DY24wZvQP/wY1gOjWKGceTm5Mkhds1Y3qWnP4nW8l1nQNxj+xAdqc5SUQcBHQRVo 5xtC+gQQ1HqDUS4FhAn+IgvlXtnoT0cTfgRO2G7k0ti89KN79aCR+q52TSOy0VCW 1Spv9ilP1Rk= =Ffmz -----END PGP SIGNATURE----- From ncoghlan at gmail.com Tue Sep 6 06:46:26 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 6 Sep 2011 14:46:26 +1000 Subject: [Python-Dev] Multigigabyte memory usage in the OpenIndiana Buildbot In-Reply-To: <4E65A13D.9010805@jcea.es> References: <4E60FCD6.3090005@jcea.es> <4E611821.9050108@jcea.es> <20110902201415.773da7d6@pitrou.net> <4E659CDD.8090900@jcea.es> <4E65A13D.9010805@jcea.es> Message-ID: On Tue, Sep 6, 2011 at 2:27 PM, Jesus Cea wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > 
> On 06/09/11 06:19, Nick Coghlan wrote: >> Is it possible your buildbot is set up to run the bigmem tests? >> IIRC, those would work correctly with 32 GB, but die a horrible >> death with only 4 GB available. > > How can I check that?. > > I am seen multiple python processes, quite a few, each taking around > 300MB of RAM. The test logs include the exact command that is executed: http://www.python.org/dev/buildbot/all/builders/AMD64%20OpenIndiana%203.x/builds/1731/steps/test/logs/stdio So it looks like you're just running the standard test resource (which makes sense, since the bigmem tests would saturate your system with a single process rather than multiple processes). The server actually looks it may be in a generally unhappy state, perhaps due to previous builds that failed without cleaning up after themselves properly. How many python processes do you see hanging around? Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From jcea at jcea.es Tue Sep 6 06:59:26 2011 From: jcea at jcea.es (Jesus Cea) Date: Tue, 06 Sep 2011 06:59:26 +0200 Subject: [Python-Dev] Multigigabyte memory usage in the OpenIndiana Buildbot In-Reply-To: References: <4E60FCD6.3090005@jcea.es> <4E611821.9050108@jcea.es> <20110902201415.773da7d6@pitrou.net> <4E659CDD.8090900@jcea.es> <4E65A13D.9010805@jcea.es> Message-ID: <4E65A8AE.10900@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 06/09/11 06:46, Nick Coghlan wrote: > The server actually looks it may be in a generally unhappy state, > perhaps due to previous builds that failed without cleaning up > after themselves properly. How many python processes do you see > hanging around? 
Just now: """ PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP 4340 buildbot 366M 344M sleep 1 0 0:11:23 0.0% python/2 10097 buildbot 366M 8096K sleep 25 0 0:00:00 0.0% python/1 10099 buildbot 366M 8100K sleep 25 0 0:00:00 0.0% python/1 10098 buildbot 366M 8108K sleep 26 0 0:00:00 0.0% python/1 27698 buildbot 251M 5244K sleep 1 0 0:00:00 0.0% python/1 27697 buildbot 251M 11M sleep 1 0 0:00:00 0.0% python/1 27695 buildbot 251M 5852K sleep 1 0 0:00:00 0.0% python/1 27694 buildbot 251M 5844K sleep 1 0 0:00:00 0.0% python/1 27696 buildbot 251M 5884K sleep 1 0 0:00:00 0.0% python/1 27693 buildbot 251M 5964K sleep 1 0 0:00:00 0.0% python/1 9893 buildbot 202M 198M sleep 1 1 0:09:32 0.0% python/2 14538 buildbot 200M 4700K sleep 1 1 0:00:00 0.0% python/1 25971 buildbot 194M 189M sleep 10 0 0:11:22 0.0% python/2 2616 buildbot 120M 114M sleep 1 0 0:06:38 0.0% python/47 11204 buildbot 118M 5612K sleep 1 0 0:00:00 0.0% python/2 ZONEID NPROC SWAP RSS MEMORY TIME CPU ZONE 23 56 4073M 1632M 40% 0:39:38 0.0% pythonbuildbot.uk.openindiana.org """ This particular build seems to have hang, usual result of running out of memory. Note the SWAP usage of 4073MB, when my limit seems to be 4096MB. The buildbot master process will kill this "hang" processes after the usual timeout. I have requested raising my memory limit to my host, with no effect so far. Anyway, eating >4GB of RAM seems quite overkill. Doing a "make test" manually I can see the python process doing the test to eat more than 200MB of RAM, but it only launch a python process, not a handful like the regular buildbot. I have verified that the memory use is atribuible to the buildbot, since if I kill the buildbot processes, my RAM+SWAP usage is negligible. Thanking for helping me with this. - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ . 
_/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQCVAwUBTmWorplgi5GaxT1NAQIb7QP/Y/Mr0RhhRTM1Rld7xKqNi77tcB0+p4CX EZ0fViNr/NF6NibKMzowi0pr42iZ3dXN4/yRQgNsvGhfzTrpi+J3Z1GCg5vnqox3 jOC+DQ5IrZylLV+zH46K9j2UJ+4hvU3PWBZcGAt6iB4EVK1h8mvBBW08VeDoN5Cj Nkqth694BcY= =KAwa -----END PGP SIGNATURE----- From jcea at jcea.es Tue Sep 6 07:02:13 2011 From: jcea at jcea.es (Jesus Cea) Date: Tue, 06 Sep 2011 07:02:13 +0200 Subject: [Python-Dev] Multigigabyte memory usage in the OpenIndiana Buildbot In-Reply-To: <4E65A8AE.10900@jcea.es> References: <4E60FCD6.3090005@jcea.es> <4E611821.9050108@jcea.es> <20110902201415.773da7d6@pitrou.net> <4E659CDD.8090900@jcea.es> <4E65A13D.9010805@jcea.es> <4E65A8AE.10900@jcea.es> Message-ID: <4E65A955.7000507@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 06/09/11 06:59, Jesus Cea wrote: > Thanking for helping me with this. BTW, it is 7AM in Spain now. I am going bed. I will check this thread again tomorrow. Thanks for your time and effort. This is very frustrating, moreover because it was working very well (with 32GB of RAM... :-). - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ . 
_/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQCVAwUBTmWpVZlgi5GaxT1NAQIz/wP5AYXGp6DYf0Fpl0tNHx8sLNJXR8XSQFjf YRoUvmo1Sh60eMU7yGsoyT2wvOTzU4rPgaWoFsaUELS/74rLMcmb567kKAJqpH7X 8BNmNSdRxYxMXixUrrwi25rYTEgz4ZenpV8tjkHR+wHhcCbBvKnDxcliJZkAxDAJ mzlhdQvdPgI= =9wQO -----END PGP SIGNATURE----- From jcea at jcea.es Tue Sep 6 07:19:20 2011 From: jcea at jcea.es (Jesus Cea) Date: Tue, 06 Sep 2011 07:19:20 +0200 Subject: [Python-Dev] Multigigabyte memory usage in the OpenIndiana Buildbot In-Reply-To: <4E65A955.7000507@jcea.es> References: <4E60FCD6.3090005@jcea.es> <4E611821.9050108@jcea.es> <20110902201415.773da7d6@pitrou.net> <4E659CDD.8090900@jcea.es> <4E65A13D.9010805@jcea.es> <4E65A8AE.10900@jcea.es> <4E65A955.7000507@jcea.es> Message-ID: <4E65AD58.6050106@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 06/09/11 07:02, Jesus Cea wrote: > On 06/09/11 06:59, Jesus Cea wrote: >> Thanking for helping me with this. > > BTW, it is 7AM in Spain now. I am going bed. I will check this > thread again tomorrow. Thanks for your time and effort. This is > very frustrating, moreover because it was working very well (with > 32GB of RAM... :-). I just deleted all the build directories and restarted the buildbots. Forcing a build now. Bedtime. Good night. At this moment, I have 3 Python processes, of sizes 230, 160 and 130 MB. And growing. Sleeping... Zzzzz... - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ . 
_/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQCVAwUBTmWtWJlgi5GaxT1NAQJ0uAQAhmOiXf6lxZeqiRldZcYvYXxnBDw4wNKJ ulADNvqJY7dxFPvuUZ8gv9zQcBjs+xTcY3IkDL4ZlSvubMZeR0O7mQ09zvBKXezd PI6vIK59PPeY+Znfw29TCDB8x5As2wqLVh388eLlYyJFsuUiZfOr4KuCwRughDns cJ7XJ4lb2+c= =oRzC -----END PGP SIGNATURE----- From ncoghlan at gmail.com Tue Sep 6 07:27:57 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 6 Sep 2011 15:27:57 +1000 Subject: [Python-Dev] Multigigabyte memory usage in the OpenIndiana Buildbot In-Reply-To: <4E65AD58.6050106@jcea.es> References: <4E60FCD6.3090005@jcea.es> <4E611821.9050108@jcea.es> <20110902201415.773da7d6@pitrou.net> <4E659CDD.8090900@jcea.es> <4E65A13D.9010805@jcea.es> <4E65A8AE.10900@jcea.es> <4E65A955.7000507@jcea.es> <4E65AD58.6050106@jcea.es> Message-ID: On Tue, Sep 6, 2011 at 3:19 PM, Jesus Cea wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On 06/09/11 07:02, Jesus Cea wrote: >> On 06/09/11 06:59, Jesus Cea wrote: >>> Thanking for helping me with this. >> >> BTW, it is 7AM in Spain now. I am going bed. I will check this >> thread again tomorrow. Thanks for your time and effort. This is >> very frustrating, moreover because it was working very well (with >> 32GB of RAM... :-). > > I just deleted all the build directories and restarted the buildbots. > Forcing a build now. Bedtime. Good night. > > At this moment, I have 3 Python processes, of sizes 230, 160 and 130 > MB. And growing. The memory usage per process seems reasonable to me, based on what I see on my own machine. That means it's the 15 processes that's problematic. It will be interesting to see how these current test runs go. 
It may be the case that with the reduced memory limit, your machine may not be able to run concurrent slaves for 2.7, 3.2 and 3.x as I believe it does now. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From solipsis at pitrou.net Tue Sep 6 07:50:19 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 6 Sep 2011 07:50:19 +0200 Subject: [Python-Dev] cpython: Issue #12567: Add curses.unget_wch() function References: Message-ID: <20110906075019.2d16f1b1@pitrou.net> On Tue, 06 Sep 2011 01:53:32 +0200 victor.stinner wrote: > http://hg.python.org/cpython/rev/b1e03d10391e > changeset: 72297:b1e03d10391e > user: Victor Stinner > date: Tue Sep 06 01:53:03 2011 +0200 > summary: > Issue #12567: Add curses.unget_wch() function > > Push a character so the next get_wch() will return it. Looks like you broke many buildbots. Regards Antoine. From solipsis at pitrou.net Tue Sep 6 09:33:59 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 6 Sep 2011 07:33:59 +0000 (UTC) Subject: [Python-Dev] Maintenance burden of str.swapcase References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> Message-ID: Michael Foord voidspace.org.uk> writes: > > Earlier this year I was at a pypy sprint helping to work on Python 2.7 compatibility. The bytearray type has much of the string interface, including swapcase… So there was effort to implement this method with the correct semantics for pypy. Doubtless the same has been true for IronPython, and will also be true for Jython. While I haven't used swapcase() a single time, I doubt there is much difficulty in implementing pure ASCII semantics, is there? Regards Antoine. 
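Antoine's point about pure ASCII semantics can be sketched in a few lines of Python. This is a hypothetical model for illustration only; CPython implements bytes/bytearray swapcase in C, and PyPy in RPython, not like this:

```python
# Hypothetical pure-Python model of ASCII-only swapcase semantics:
# swap the case of ASCII letters, map every other byte to itself.
def ascii_swapcase(data: bytes) -> bytes:
    out = bytearray()
    for b in data:
        if 0x61 <= b <= 0x7A:      # b'a'..b'z' -> b'A'..b'Z'
            out.append(b - 0x20)
        elif 0x41 <= b <= 0x5A:    # b'A'..b'Z' -> b'a'..b'z'
            out.append(b + 0x20)
        else:                      # non-letter bytes are unchanged
            out.append(b)
    return bytes(out)
```

On ASCII input this matches the built-in behaviour, e.g. `ascii_swapcase(b"Hello!")` gives `b"hELLO!"`, while bytes outside the ASCII letter ranges map to themselves.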
From victor.stinner at haypocalc.com Tue Sep 6 10:10:31 2011 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Tue, 06 Sep 2011 10:10:31 +0200 Subject: [Python-Dev] cpython: Issue #12567: Add curses.unget_wch() function In-Reply-To: <20110906075019.2d16f1b1@pitrou.net> References: <20110906075019.2d16f1b1@pitrou.net> Message-ID: <4E65D577.4060707@haypocalc.com> Le 06/09/2011 07:50, Antoine Pitrou a écrit : > On Tue, 06 Sep 2011 01:53:32 +0200 > victor.stinner wrote: >> http://hg.python.org/cpython/rev/b1e03d10391e >> changeset: 72297:b1e03d10391e >> user: Victor Stinner >> date: Tue Sep 06 01:53:03 2011 +0200 >> summary: >> Issue #12567: Add curses.unget_wch() function >> >> Push a character so the next get_wch() will return it. > > Looks like you broke many buildbots. Oh, thanks to notify me. I expected failures, but I also forgot the skip the test if the function is missing. I wrote an huge patch for this module to improve Unicode support, but I chose to split it into smaller patches. Because a single function broke most buildbots, it was a good idea :-) Victor From victor.stinner at haypocalc.com Tue Sep 6 10:04:57 2011 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Tue, 06 Sep 2011 10:04:57 +0200 Subject: [Python-Dev] [Python-checkins] cpython (3.2): Fix PyUnicode_AsWideCharString() doc: size doesn't contain the null character In-Reply-To: References: Message-ID: <4E65D429.3040908@haypocalc.com> Le 06/09/2011 02:25, Nick Coghlan a écrit : > On Tue, Sep 6, 2011 at 10:01 AM, victor.stinner > wrote: >> Fix also spelling of the null character. > > While these cases are legitimately changed to 'null' (since they're > lowercase descriptions of the character), I figure it's worth > mentioning again that the ASCII name for '\0' actually *is* NUL (i.e. > only one 'L'). Strange, but true [1]. > > Cheers, > Nick. 
> > [1] https://secure.wikimedia.org/wikipedia/en/wiki/ASCII "NUL" is an abbreviation used in tables when you don't have enough space to write the full name: "null character". Where do you want to mention this abbreviation? Victor From ncoghlan at gmail.com Tue Sep 6 11:16:38 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 6 Sep 2011 19:16:38 +1000 Subject: [Python-Dev] [Python-checkins] cpython (3.2): Fix PyUnicode_AsWideCharString() doc: size doesn't contain the null character In-Reply-To: <4E65D429.3040908@haypocalc.com> References: <4E65D429.3040908@haypocalc.com> Message-ID: On Tue, Sep 6, 2011 at 6:04 PM, Victor Stinner wrote: > "NUL" is an abbreviation used in tables when you don't have enough space to > write the full name: "null character". Yep, fair description. > Where do you want to mention this abbreviation? Sorry, I meant worth mentioning on the list, not anywhere particular in the docs - the topic came up recently when an instance of NUL was incorrectly changed to read 'NULL' instead and it took me a moment to figure out why the same reasoning *didn't* apply in this case. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From martin at v.loewis.de Tue Sep 6 15:03:32 2011 From: martin at v.loewis.de (martin at v.loewis.de) Date: Tue, 06 Sep 2011 15:03:32 +0200 Subject: [Python-Dev] bigmemtests for really big memory too slow Message-ID: <20110906150332.Horde.boB6BaGZi1VOZhok0Q6zPZA@webmail.df.eu> I benchmarked some of the bigmemtests when run with -M 80G. They run really slow, because they try to use all available memory, and then take a lot of time processing it. Here are some runtimes: test_capitalize (test.test_bigmem.StrTest) ... ok (420.490846s) test_center (test.test_bigmem.StrTest) ... ok (149.431523s) test_compare (test.test_bigmem.StrTest) ... ok (200.181986s) test_concat (test.test_bigmem.StrTest) ... ok (154.282903s) test_contains (test.test_bigmem.StrTest) ... 
ok (173.960073s) test_count (test.test_bigmem.StrTest) ... ok (186.799731s) test_encode (test.test_bigmem.StrTest) ... ok (53.752823s) test_encode_ascii (test.test_bigmem.StrTest) ... ok (8.421414s) test_encode_raw_unicode_escape (test.test_bigmem.StrTest) ... ok (3.752774s) test_encode_utf32 (test.test_bigmem.StrTest) ... ok (9.732829s) test_encode_utf7 (test.test_bigmem.StrTest) ... ok (4.998805s) test_endswith (test.test_bigmem.StrTest) ... ok (208.022452s) test_expandtabs (test.test_bigmem.StrTest) ... ok (614.490436s) test_find (test.test_bigmem.StrTest) ... ok (230.722848s) test_format (test.test_bigmem.StrTest) ... ok (407.471929s) test_hash (test.test_bigmem.StrTest) ... ok (325.906271s) In the test suite, we have the bigmemtest and precisionbigmemtest decorators. I think bigmemtest cases should all be changed to precisionbigmemtest, giving sizes of just above 2**31. With that change, the runtime for test_capitalize would go down to 42s. What do you think? Regards, Martin From solipsis at pitrou.net Tue Sep 6 15:27:38 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 6 Sep 2011 15:27:38 +0200 Subject: [Python-Dev] bigmemtests for really big memory too slow References: <20110906150332.Horde.boB6BaGZi1VOZhok0Q6zPZA@webmail.df.eu> Message-ID: <20110906152738.733f98cd@pitrou.net> Hello Martin, > In the test suite, we have the bigmemtest and precisionbigmemtest > decorators. I think bigmemtest cases should all be changed to > precisionbigmemtest, giving sizes of just above 2**31. With that > change, the runtime for test_capitalize would go down to 42s. I have started working on this and other things in http://hg.python.org/sandbox/antoine/, branch "bigmem". I was planning to propose the same thing, which indeed makes tests pass much more quickly, but I was waiting to try and solve some other crashes in test_bigmem. Regards Antoine. From jsbueno at python.org.br Tue Sep 6 16:32:13 2011 From: jsbueno at python.org.br (Joao S. O. 
Bueno) Date: Tue, 6 Sep 2011 11:32:13 -0300 Subject: [Python-Dev] Maintenance burden of str.swapcase In-Reply-To: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> Message-ID: On Mon, Sep 5, 2011 at 8:56 AM, Michael Foord wrote: > Hey all, > A while ago there was a discussion of the value of apis like str.swapcase, > and it was suggested that even though it was acknowledged to be useless the > effort of deprecating and removing it was thought to be more than the value > in removing it. > Earlier this year I was at a pypy sprint helping to work on Python 2.7 > compatibility. The bytearray type has much of the string interface, > including swapcase… So there was effort to implement this method with the > correct semantics for pypy. Doubtless the same has been true for IronPython, > and will also be true for Jython. > Whilst it is too late for Python 2.x, it *is* (in my opinion) worth removing > unused and unneeded APIs. Even if the effort to remove them is more than any > effort saved on the part of users it helps other implementations down the > road that no longer need to provide these APIs. > All the best, > Michael Foord > On the other hand, for any users wanting to use this in the future, if it is not there, they'd have to implement the logic for themselves. If it is a "burden" for someone in a sprint, looking at other implementations, and with all the unicode knowledge/documentation around, it would be pretty much undoable in the correct way by a casual user. Removing it would mean explicitly "batteries removal". If you get some traction on that, at least consider moving it to a pure python function on the string module. js -><- > -- > http://www.voidspace.org.uk/ > > May you do good and not evil > May you find forgiveness for yourself and forgive others > May you share freely, never taking more than you give. 
> -- the sqlite blessing http://www.sqlite.org/different.html > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/jsbueno%40python.org.br > > From solipsis at pitrou.net Tue Sep 6 16:46:37 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 6 Sep 2011 16:46:37 +0200 Subject: [Python-Dev] bigmemtests for really big memory too slow References: <20110906150332.Horde.boB6BaGZi1VOZhok0Q6zPZA@webmail.df.eu> Message-ID: <20110906164637.0aaa5e10@pitrou.net> For the record, I've disabled automatic builds on the bigmem buildbot until things get sorted out a bit (no need to eat huge amounts of RAM and eight hours of CPU each time a commit is pushed, only to have the process killed :-)). It's still possible to run custom builds, of course. Regards Antoine. On Tue, 06 Sep 2011 15:03:32 +0200 martin at v.loewis.de wrote: > I benchmarked some of the bigmemtests when run with -M 80G. They run really > slow, because they try to use all available memory, and then take a lot of > time processing it. Here are some runtimes: > > test_capitalize (test.test_bigmem.StrTest) ... ok (420.490846s) > test_center (test.test_bigmem.StrTest) ... ok (149.431523s) > test_compare (test.test_bigmem.StrTest) ... ok (200.181986s) > test_concat (test.test_bigmem.StrTest) ... ok (154.282903s) > test_contains (test.test_bigmem.StrTest) ... ok (173.960073s) > test_count (test.test_bigmem.StrTest) ... ok (186.799731s) > test_encode (test.test_bigmem.StrTest) ... ok (53.752823s) > test_encode_ascii (test.test_bigmem.StrTest) ... ok (8.421414s) > test_encode_raw_unicode_escape (test.test_bigmem.StrTest) ... ok (3.752774s) > test_encode_utf32 (test.test_bigmem.StrTest) ... ok (9.732829s) > test_encode_utf7 (test.test_bigmem.StrTest) ... ok (4.998805s) > test_endswith (test.test_bigmem.StrTest) ... 
ok (208.022452s) > test_expandtabs (test.test_bigmem.StrTest) ... ok (614.490436s) > test_find (test.test_bigmem.StrTest) ... ok (230.722848s) > test_format (test.test_bigmem.StrTest) ... ok (407.471929s) > test_hash (test.test_bigmem.StrTest) ... ok (325.906271s) > > In the test suite, we have the bigmemtest and precisionbigmemtest > decorators. I think bigmemtest cases should all be changed to > precisionbigmemtest, giving sizes of just above 2**31. With that > change, the runtime for test_capitalize would go down to 42s. > > What do you think? > > Regards, > Martin > > > From tseaver at palladion.com Tue Sep 6 17:11:51 2011 From: tseaver at palladion.com (Tres Seaver) Date: Tue, 06 Sep 2011 11:11:51 -0400 Subject: [Python-Dev] [Python-checkins] cpython (3.2): Fix PyUnicode_AsWideCharString() doc: size doesn't contain the null character In-Reply-To: <4E65D429.3040908@haypocalc.com> References: <4E65D429.3040908@haypocalc.com> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 09/06/2011 04:04 AM, Victor Stinner wrote: > Le 06/09/2011 02:25, Nick Coghlan a écrit : >> On Tue, Sep 6, 2011 at 10:01 AM, victor.stinner >> wrote: >>> Fix also spelling of the null character. >> >> While these cases are legitimately changed to 'null' (since >> they're lowercase descriptions of the character), I figure it's >> worth mentioning again that the ASCII name for '\0' actually *is* >> NUL (i.e. only one 'L'). Strange, but true [1]. >> >> Cheers, Nick. >> >> [1] https://secure.wikimedia.org/wikipedia/en/wiki/ASCII > > "NUL" is an abbreviation used in tables when you don't have enough > space to write the full name: "null character". > > Where do you want to mention this abbreviation? FWIW, the RFC 20 (the ASCII spec) really really defines 'NUL' as the *name* of the \0 character, not just an "abbreviation used in tables": http://tools.ietf.org/html/rfc20#section-5.2 Tres. 
-- =================================================================== Tres Seaver +1 540-429-0999 tseaver at palladion.com Palladion Software "Excellence by Design" http://palladion.com From merwok at netwok.org Tue Sep 6 17:17:47 2011 From: merwok at netwok.org (=?UTF-8?B?w4lyaWMgQXJhdWpv?=) Date: Tue, 06 Sep 2011 17:17:47 +0200 Subject: [Python-Dev] [Python-checkins] cpython: Issue #9561: packaging now writes egg-info files using UTF-8 In-Reply-To: References: Message-ID: <4E66399B.4030006@netwok.org> Le 06/09/2011 00:11, victor.stinner a écrit : > http://hg.python.org/cpython/rev/56ab3257ca13 > changeset: 72296:56ab3257ca13 > user: Victor Stinner > date: Tue Sep 06 00:11:13 2011 +0200 > summary: > Issue #9561: packaging now writes egg-info files using UTF-8 > > instead of the locale encoding > > def _distutils_pkg_info(self): > tmp = self._distutils_setup_py_pkg() > - self.write_file([tmp, 'PKG-INFO'], '') > + self.write_file([tmp, 'PKG-INFO'], '', encoding='UTF-8') This function is writing an empty string; isn't it the same bytes in UTF-8 or in the locale encoding? (Are there people that use encodings with BOMs as locale?
*shudders*) From victor.stinner at haypocalc.com Tue Sep 6 17:50:31 2011 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Tue, 06 Sep 2011 17:50:31 +0200 Subject: [Python-Dev] [Python-checkins] cpython: Issue #9561: packaging now writes egg-info files using UTF-8 In-Reply-To: <4E66399B.4030006@netwok.org> References: <4E66399B.4030006@netwok.org> Message-ID: <4E664147.8010407@haypocalc.com> Le 06/09/2011 17:17, Éric Araujo a écrit : > Le 06/09/2011 00:11, victor.stinner a écrit : >> http://hg.python.org/cpython/rev/56ab3257ca13 >> changeset: 72296:56ab3257ca13 >> user: Victor Stinner >> date: Tue Sep 06 00:11:13 2011 +0200 >> summary: >> Issue #9561: packaging now writes egg-info files using UTF-8 >> >> instead of the locale encoding > >> >> def _distutils_pkg_info(self): >> tmp = self._distutils_setup_py_pkg() >> - self.write_file([tmp, 'PKG-INFO'], '') >> + self.write_file([tmp, 'PKG-INFO'], '', encoding='UTF-8') > > This function is writing an empty string; isn't it the same bytes in > UTF-8 or in the locale encoding? This patch is just cosmetic: it doesn't change anything (except that TextIOWrapper doesn't have to temporarily change the locale to get the locale encoding). Victor From stephen at xemacs.org Tue Sep 6 18:59:56 2011 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 07 Sep 2011 01:59:56 +0900 Subject: [Python-Dev] Maintenance burden of str.swapcase In-Reply-To: References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> Message-ID: <87ehztip2r.fsf@uwakimon.sk.tsukuba.ac.jp> Joao S. O. Bueno writes: > Removing it would mean explicitly "batteries removal". That's what we usually do with a dead battery, no?
From tseaver at palladion.com Tue Sep 6 18:58:07 2011 From: tseaver at palladion.com (Tres Seaver) Date: Tue, 06 Sep 2011 12:58:07 -0400 Subject: [Python-Dev] Maintenance burden of str.swapcase In-Reply-To: <87ehztip2r.fsf@uwakimon.sk.tsukuba.ac.jp> References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> <87ehztip2r.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 09/06/2011 12:59 PM, Stephen J. Turnbull wrote: > Joao S. O. Bueno writes: > >> Removing it would mean explicitly "batteries removal". > > That's what we usually do with a dead battery, no? Normally one "replaces" dead batteries. :) Tres. -- =================================================================== Tres Seaver +1 540-429-0999 tseaver at palladion.com Palladion Software "Excellence by Design" http://palladion.com From tjreedy at udel.edu Tue Sep 6 19:55:24 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 06 Sep 2011 13:55:24 -0400 Subject: [Python-Dev] [Python-checkins] cpython (3.2): Fix PyUnicode_AsWideCharString() doc: size doesn't contain the null character In-Reply-To: References: <4E65D429.3040908@haypocalc.com> Message-ID: On 9/6/2011 11:11 AM, Tres Seaver wrote: > FWIW, the RFC 20 (the ASCII spec) really really defines 'NUL' as the > *name* of the \0 character, not just an "abbreviation used in tables": > > http://tools.ietf.org/html/rfc20#section-5.2 As I read the text, the 2 or 3 capital letter *symbols* are abbreviations of the names. Looking back up, I see ''' 4. Legend 4.1 Control Characters NUL Null DLE Data Link Escape (CC) ... 4.2 Graphic Characters Column/Row Symbol Name 2/0 SP Space (Normally Non-Printing) 2/1 !
Exclamation Point ''' 'NUL' and 'SP' are *symbols* that have the names 'Null' and 'Space', just as the symbol '!' is named 'Exclamation Point'. They just happen to be digraphs and trigraphs composed of 2 or 3 characters. I am sure that the symbol SP does not appear in the docs. The symbol 'LF' (for LineFeed) probably does not either. We just call it 'newline' or 'newline character' as that is how we use it. -- Terry Jan Reedy From tjreedy at udel.edu Tue Sep 6 21:05:16 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 06 Sep 2011 15:05:16 -0400 Subject: [Python-Dev] Maintenance burden of str.swapcase In-Reply-To: References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> <87ehztip2r.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 9/6/2011 12:58 PM, Tres Seaver wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On 09/06/2011 12:59 PM, Stephen J. Turnbull wrote: >> Joao S. O. Bueno writes: >> >>> Removing it would mean explicitly "batteries removal". >> >> That's what we usually do with a dead battery, no? > > Normally one "replaces" dead batteries. :) Not if it is dead and leaking because the device has been unused for years. https://www.google.com/codesearch#search/&q=lang:^python$%20swapcase%20case:yes&type=cs returns a mere 300 hits. At least half are definitions of the function, or tests thereof, or inclusions in lists. Some actual uses: 1.http://pytof.googlecode.com/svn/trunk/pytof/utils.py def ListCurrentDirFileFromExt(ext, path): """ list file matching extension from a list in the current directory emulate a `ls *.{(',').join(ext)` with ext in both upper and downcase}""" import glob extfiles = [] for e in ext: extfiles.extend(glob.glob(join(path,'*' + e))) extfiles.extend(glob.glob(join(path,'*' + e.swapcase()))) If e is all upper or lower, using e.upper() and e.lower() will do same. If e is mixed, using .upper and .lower is required to fulfill the spec. On *nix, where matching of letters is case sensitive, both will fail with '.Jpg'. 
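The behaviour described above is easy to check in a few lines (an illustrative sketch, not code from pytof; the extension values are made up):

```python
# For a single-case extension, swapcase() is equivalent to upper()/lower(),
# so the pytof code above only ever globs two case variants per extension.
ext = ".jpg"
assert ext.swapcase() == ext.upper() == ".JPG"        # all-lowercase input
assert ".JPG".swapcase() == ".JPG".lower() == ".jpg"  # all-uppercase input

# A mixed-case extension matches neither variant on a case-sensitive
# (*nix) filesystem: swapcase() flips every letter individually.
assert ".Jpg".swapcase() == ".jPG"
assert ".Jpg" not in {ext, ext.upper(), ext.swapcase()}
```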
On Windows, where letter matching ignores case, the above code will list everything twice. 2.http://ydict.googlecode.com/svn/trunk/ydict k is random word from database. result.replace(k, "####").replace(k.upper(), "####").replace(k[0].swapcase()+k[1:].lower(),"####") If k is lowercase, .lower() is redundant and k[0].swapcase()+k[1:].lower() == k.title(). If k is uppercase, previous .upper() is redundant. If k is mixed case, code may have problems. 3. http://migrid.googlecode.com/svn/trunk/mig/sftp-mount/migaccess.py # This is how we could add stub extended attribute handlers... # (We can't have ones which aptly delegate requests to the underlying fs # because Python lacks a standard xattr interface.) # # def getxattr(self, path, name, size): # val = name.swapcase() + '@' + path # if size == 0: # # We are asked for size of the value. # return len(val) # return val This is not actually used. Passing a name with all cases swapped from what they should be is a bit strange. 4. elif char >= 'A' and char <= 'Z': element = element + char.swapcase() uppercasechar.swapcase() == uppercasechar.lower() My perusal of the first 70 of 300 hits suggests that .swapcase is more of an attractive nuisance or redundant rather than actually useful. -- Terry Jan Reedy From steve at pearwood.info Tue Sep 6 21:36:27 2011 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 07 Sep 2011 05:36:27 +1000 Subject: [Python-Dev] Maintenance burden of str.swapcase In-Reply-To: References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> <87ehztip2r.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4E66763B.7080707@pearwood.info> Terry Reedy wrote: > On 9/6/2011 12:58 PM, Tres Seaver wrote: >> -----BEGIN PGP SIGNED MESSAGE----- >> Hash: SHA1 >> >> On 09/06/2011 12:59 PM, Stephen J. Turnbull wrote: >>> Joao S. O. Bueno writes: >>> >>>> Removing it would mean explicitly "batteries removal". >>> >>> That's what we usually do with a dead battery, no? >> >> Normally one "replaces" dead batteries. 
:) > > Not if it is dead and leaking because the device has been unused for years. Can we please not make decisions about what code should be removed based on dodgy analogies? :) Perhaps I missed something early on, but why are we proposing removing a function which (presumably) is stable and tested and works and is not broken? What maintenance is needed here? [...] > If k is lowercase, .lower() is redundant and > k[0].swapcase()+k[1:].lower() == k.title(). Not so. >>> k = 'aaaa bbbb' >>> k.title() 'Aaaa Bbbb' >>> k[0].swapcase()+k[1:].lower() 'Aaaa bbbb' > If k is uppercase, previous > .upper() is redundant. If k is mixed case, code may have problems. "May" have problems? pERSONNALLY, i THINK THAT A SWAPCASE COMMAND IS ESSENTIAL FOR TEXT EDITOR APPLICATIONS, TO AVOID THOSE LITTLE cAPS lOCK ACCIDENTS. -- Steven From fuzzyman at voidspace.org.uk Tue Sep 6 21:41:07 2011 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Tue, 6 Sep 2011 20:41:07 +0100 Subject: [Python-Dev] Maintenance burden of str.swapcase In-Reply-To: <4E66763B.7080707@pearwood.info> References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> <87ehztip2r.fsf@uwakimon.sk.tsukuba.ac.jp> <4E66763B.7080707@pearwood.info> Message-ID: On 6 Sep 2011, at 20:36, Steven D'Aprano wrote: > Terry Reedy wrote: >> On 9/6/2011 12:58 PM, Tres Seaver wrote: >>> -----BEGIN PGP SIGNED MESSAGE----- >>> Hash: SHA1 >>> >>> On 09/06/2011 12:59 PM, Stephen J. Turnbull wrote: >>>> Joao S. O. Bueno writes: >>>> >>>>> Removing it would mean explicitly "batteries removal". >>>> >>>> That's what we usually do with a dead battery, no? >>> >>> Normally one "replaces" dead batteries. :) >> Not if it is dead and leaking because the device has been unused for years. > > > Can we please not make decisions about what code should be removed based on dodgy analogies? :) > > Perhaps I missed something early on, but why are we proposing removing a function which (presumably) is stable and tested and works and is not broken? 
What maintenance is needed here? The maintenance burden is on other implementations. Even if there is no maintenance burden for CPython, having useless methods simply because it is less effort to leave them in place creates work for new implementations wanting to be fully compatible. > > > [...] >> If k is lowercase, .lower() is redundant and k[0].swapcase()+k[1:].lower() == k.title(). > > Not so. > > >>> k = 'aaaa bbbb' > >>> k.title() > 'Aaaa Bbbb' > >>> k[0].swapcase()+k[1:].lower() > 'Aaaa bbbb' > > >> If k is uppercase, previous .upper() is redundant. If k is mixed case, code may have problems. > > "May" have problems? > > > pERSONNALLY, i THINK THAT A SWAPCASE COMMAND IS ESSENTIAL FOR TEXT EDITOR APPLICATIONS, TO AVOID THOSE LITTLE cAPS lOCK ACCIDENTS. Have you ever used str.swapcase for that purpose? Michael -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html From fdrake at acm.org Tue Sep 6 21:42:03 2011 From: fdrake at acm.org (Fred Drake) Date: Tue, 6 Sep 2011 15:42:03 -0400 Subject: [Python-Dev] Maintenance burden of str.swapcase In-Reply-To: <4E66763B.7080707@pearwood.info> References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> <87ehztip2r.fsf@uwakimon.sk.tsukuba.ac.jp> <4E66763B.7080707@pearwood.info> Message-ID: On Tue, Sep 6, 2011 at 3:36 PM, Steven D'Aprano wrote: > pERSONNALLY, i THINK THAT A SWAPCASE COMMAND IS ESSENTIAL FOR TEXT EDITOR > APPLICATIONS, TO AVOID THOSE LITTLE cAPS lOCK ACCIDENTS. There's a better solution to that, but the caps lock lobby has a stranglehold on keyboard manufacturers. -- Fred L. Drake, Jr. "A person who won't read has no advantage over one who can't read."
--Samuel Langhorne Clemens From barry at python.org Tue Sep 6 22:03:50 2011 From: barry at python.org (Barry Warsaw) Date: Tue, 6 Sep 2011 16:03:50 -0400 Subject: [Python-Dev] Maintenance burden of str.swapcase In-Reply-To: References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> <87ehztip2r.fsf@uwakimon.sk.tsukuba.ac.jp> <4E66763B.7080707@pearwood.info> Message-ID: <20110906160350.26237e64@resist.wooz.org> On Sep 06, 2011, at 03:42 PM, Fred Drake wrote: >On Tue, Sep 6, 2011 at 3:36 PM, Steven D'Aprano wrote: >> pERSONNALLY, i THINK THAT A SWAPCASE COMMAND IS ESSENTIAL FOR TEXT EDITOR >> APPLICATIONS, TO AVOID THOSE LITTLE cAPS lOCK ACCIDENTS. > >There's a better solution to that, but the caps lock lobby has a stranglehold >on keyboard manufacturers. Fight The Man with xmodmap! -Barry From martin at v.loewis.de Tue Sep 6 22:18:49 2011 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 06 Sep 2011 22:18:49 +0200 Subject: [Python-Dev] Maintenance burden of str.swapcase In-Reply-To: References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> <87ehztip2r.fsf@uwakimon.sk.tsukuba.ac.jp> <4E66763B.7080707@pearwood.info> Message-ID: <4E668029.6080106@v.loewis.de> >> Perhaps I missed something early on, but why are we proposing >> removing a function which (presumably) is stable and tested and >> works and is not broken? What maintenance is needed here? > > > The maintenance burden is on other implementations. It's not a maintenance burden (at least not in the sense in which I understand the word "maintenance" - as an ongoing effort). When they implement it once, the implementation can likely stay forever, unmodified. > Even if there is > no maintenance burden for CPython having useless methods simply > because it is less effort to leave them in place creates work for > new implementations wanting to be fully compatible. That's true. However, that alone is not enough reason to remove the feature, IMO. 
The effort that is saved is not only on the developers of CPython, but also on users of the feature. My claim is that for any little-used feature, removing it costs more time world-wide than re-implementing it in 10 alternative Python implementations (with the number 10 drawn out of blue air), because of the cost of changing the applications that actually do use the feature. With the switch to Python 3, there would have been a chance to remove little-used features. IMO, the next such chance is with Python 4. It could be useful to start collecting little-used features that might be removed with Python 4 - which I don't expect until 2020. Regards, Martin From fuzzyman at voidspace.org.uk Tue Sep 6 22:23:26 2011 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Tue, 6 Sep 2011 21:23:26 +0100 Subject: [Python-Dev] Maintenance burden of str.swapcase In-Reply-To: <4E668029.6080106@v.loewis.de> References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> <87ehztip2r.fsf@uwakimon.sk.tsukuba.ac.jp> <4E66763B.7080707@pearwood.info> <4E668029.6080106@v.loewis.de> Message-ID: <7EA8F302-1DC4-43D9-B124-09A51172E9BF@voidspace.org.uk> On 6 Sep 2011, at 21:18, Martin v. Löwis wrote: >>> Perhaps I missed something early on, but why are we proposing >>> removing a function which (presumably) is stable and tested and >>> works and is not broken? What maintenance is needed here? >> >> >> The maintenance burden is on other implementations. > > It's not a maintenance burden (at least not in the sense in which > I understand the word "maintenance" - as an ongoing effort). When > they implement it once, the implementation can likely stay forever, > unmodified. Ok, burden rather than "maintenance" burden. > >> Even if there is >> no maintenance burden for CPython having useless methods simply >> because it is less effort to leave them in place creates work for >> new implementations wanting to be fully compatible. > > That's true.
> > However, that alone is not enough reason to remove the feature, IMO. > The effort that is saved is not only on the developers of CPython, > but also on users of the feature. My claim is that for any little-used > feature, removing it costs more time world-wide than re-implementing > it in 10 alternative Python implementations (with the number 10 drawn > out of blue air), because of the cost of changing the applications that > actually do use the feature. > Which applications? I'm not sure the number of applications using str.swapcase gets even as high as ten. > With the switch to Python 3, there would have been a chance to remove > little-used features. IMO, the next such chance is with Python 4. > It could be useful to start collecting little-used features that might > be removed with Python 4 - which I don't expect until 2020. We still have our standard deprecation policy that we can follow in Python 3. We don't have to wait until Python 4 to remove things. Changing semantics or syntax is harder because you can't really deprecate. Just removing methods is straightforward. Michael > > Regards, > Martin > -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html From martin at v.loewis.de Tue Sep 6 22:36:47 2011 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 06 Sep 2011 22:36:47 +0200 Subject: [Python-Dev] Maintenance burden of str.swapcase In-Reply-To: <7EA8F302-1DC4-43D9-B124-09A51172E9BF@voidspace.org.uk> References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> <87ehztip2r.fsf@uwakimon.sk.tsukuba.ac.jp> <4E66763B.7080707@pearwood.info> <4E668029.6080106@v.loewis.de> <7EA8F302-1DC4-43D9-B124-09A51172E9BF@voidspace.org.uk> Message-ID: <4E66845F.3060708@v.loewis.de> > Which applications?
I'm not sure the number of applications using > str.swapcase gets even as high as ten. I think this is what people underestimate. I can't name applications either - but that doesn't mean they don't exist. I'm deeply convinced that the majority of Python code (and I mean *large* majority) is unpublished. I expect thousands of uses world-wide. > We still have our standard deprecation policy that we can follow in > Python 3. We don't have to wait until Python 4 to remove things. That's true. However, part of the deprecation procedure is also that there should be a rationale for removing it. In the past, things have been removed that had been superseded with something new, or things that had been flawed in their design so that fixing it wasn't really possible, or that did indeed cause ongoing maintenance effort for a minority of users (such as the support for little-used platforms). None of these motivations hold for str.swapcase, and I think the "other implementations will have to implement it" is not sufficient motivation. If the other implementations believe that the feature is truly useless and also not used, they can just declare it a deliberate deviation from CPython, and refuse to implement it. If I had to pick a truly useless feature, I'd kill complex numbers, not str.swapcase. Regards, Martin From raymond.hettinger at gmail.com Tue Sep 6 23:23:37 2011 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Tue, 6 Sep 2011 14:23:37 -0700 Subject: [Python-Dev] Maintenance burden of str.swapcase In-Reply-To: <4E66845F.3060708@v.loewis.de> References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> <87ehztip2r.fsf@uwakimon.sk.tsukuba.ac.jp> <4E66763B.7080707@pearwood.info> <4E668029.6080106@v.loewis.de> <7EA8F302-1DC4-43D9-B124-09A51172E9BF@voidspace.org.uk> <4E66845F.3060708@v.loewis.de> Message-ID: <4271E434-0E35-4D7E-98D0-097402B8C3FD@gmail.com> On Sep 6, 2011, at 1:36 PM, Martin v. Löwis wrote: > I think this is what people underestimate.
I can't name > applications either - but that doesn't mean they don't exist. Google code search is a pretty good indicator that this method has near zero uptake. If it dies, I don't think anyone will cry. Raymond From greg.ewing at canterbury.ac.nz Wed Sep 7 02:24:18 2011 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 07 Sep 2011 12:24:18 +1200 Subject: [Python-Dev] [Python-checkins] cpython (3.2): Fix PyUnicode_AsWideCharString() doc: size doesn't contain the null character In-Reply-To: <4E65D429.3040908@haypocalc.com> References: <4E65D429.3040908@haypocalc.com> Message-ID: <4E66B9B2.2070508@canterbury.ac.nz> Victor Stinner wrote: > "NUL" is an abbreviation used in tables when you don't have enough space > to write the full name: "null character". It's also the official name of the character, for when you want to be unambiguous about what you mean (e.g. "null character" as opposed to "empty string" or "null pointer"). I expect it's 3 chars for consistency with all the other control character names. -- Greg From ncoghlan at gmail.com Wed Sep 7 02:47:16 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 7 Sep 2011 10:47:16 +1000 Subject: [Python-Dev] Maintenance burden of str.swapcase In-Reply-To: <4271E434-0E35-4D7E-98D0-097402B8C3FD@gmail.com> References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> <87ehztip2r.fsf@uwakimon.sk.tsukuba.ac.jp> <4E66763B.7080707@pearwood.info> <4E668029.6080106@v.loewis.de> <7EA8F302-1DC4-43D9-B124-09A51172E9BF@voidspace.org.uk> <4E66845F.3060708@v.loewis.de> <4271E434-0E35-4D7E-98D0-097402B8C3FD@gmail.com> Message-ID: On Wed, Sep 7, 2011 at 7:23 AM, Raymond Hettinger wrote: > > On Sep 6, 2011, at 1:36 PM, Martin v. Löwis wrote: > > I think this is what people underestimate. I can't name > applications either - but that doesn't mean they don't exist.
> > Google code search is a pretty good indicator that this method > has near zero uptake. If it dies, I don't think anyone will cry. For str itself, I'm -0 on removing it - the Unicode implications mean implementation isn't completely trivial and there's at least one legitimate use case (i.e. providing, or deliberately reversing, Caps Lock style functionality). However, a big +1 for deprecation in the case of bytes and bytearray. That's nothing to do with the maintenance burden though, it's to do with the semantic confusion between binary data and ASCII-encoded text implied by the retention of methods like upper(), lower() and swapcase(). Specifically, the methods I consider particularly problematic on that front are: 'capitalize' 'islower' 'istitle' 'isupper' 'lower' 'swapcase' 'title' 'upper' These are all text operations, not something you do with binary data. There are some other methods that make ASCII specific default assumptions regarding whitespace and line separators, but ASCII whitespace is often used as a delimiter in wire protocols so losing those would be genuinely annoying. I've also left out the methods for identifying ASCII letters and digits, since again, those are useful for interpreting various wire encodings. The case-related methods, though, have no place in sane wire protocol handling. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | 
Brisbane, Australia From steve at pearwood.info Wed Sep 7 03:02:05 2011 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 07 Sep 2011 11:02:05 +1000 Subject: [Python-Dev] Maintenance burden of str.swapcase In-Reply-To: <4271E434-0E35-4D7E-98D0-097402B8C3FD@gmail.com> References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> <87ehztip2r.fsf@uwakimon.sk.tsukuba.ac.jp> <4E66763B.7080707@pearwood.info> <4E668029.6080106@v.loewis.de> <7EA8F302-1DC4-43D9-B124-09A51172E9BF@voidspace.org.uk> <4E66845F.3060708@v.loewis.de> <4271E434-0E35-4D7E-98D0-097402B8C3FD@gmail.com> Message-ID: <4E66C28D.2010703@pearwood.info> Raymond Hettinger wrote: > On Sep 6, 2011, at 1:36 PM, Martin v. Löwis wrote: > >> I think this is what people underestimate. I can't name >> applications either - but that doesn't mean they don't exist. > > Google code search is a pretty good indicator that this method > has near zero uptake. If it dies, I don't think anyone will cry. Near-zero is not zero, and Terry has already shown some examples of code which use, or misuse, swapcase. In any case (pun intended *wink*) this was discussed in December and Guido expressed little enthusiasm for the idea: http://mail.python.org/pipermail/python-dev/2010-December/106650.html I can't exactly defend the existence of swapcase, it does seem to be a fairly specialised function. But given that it exists, I'm -0.5 on removal on the basis of "if it ain't broke, don't fix it".
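The "caps lock accident" scenario raised earlier in the thread is the one thing the method handles directly — a minimal illustration (not code from the original messages):

```python
# swapcase() undoes a "caps lock accident": every cased character is flipped.
accident = "pERSONNALLY, i THINK"
assert accident.swapcase() == "Personnally, I think"

# For ASCII text the method is its own inverse. Note this does NOT hold for
# all of Unicode: 'ß'.swapcase() gives 'SS', which swaps back to 'ss'.
assert accident.swapcase().swapcase() == accident
```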
-- Steven From solipsis at pitrou.net Wed Sep 7 03:07:58 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 7 Sep 2011 03:07:58 +0200 Subject: [Python-Dev] Maintenance burden of str.swapcase References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> <87ehztip2r.fsf@uwakimon.sk.tsukuba.ac.jp> <4E66763B.7080707@pearwood.info> <4E668029.6080106@v.loewis.de> <7EA8F302-1DC4-43D9-B124-09A51172E9BF@voidspace.org.uk> <4E66845F.3060708@v.loewis.de> <4271E434-0E35-4D7E-98D0-097402B8C3FD@gmail.com> Message-ID: <20110907030758.58caa4ed@pitrou.net> On Wed, 7 Sep 2011 10:47:16 +1000 Nick Coghlan wrote: > > However, a big +1 for deprecation in the case of bytes and bytearray. > That's nothing to do with the maintenance burden though, it's to do > with the semantic confusion between binary data and ASCII-encoded text > implied by the retention of methods like upper(), lower() and > swapcase(). A big -1 on that. Bytes objects are often used for partly ASCII strings, not arbitrary "arrays of bytes". And making indexing of bytes objects return ints was IMHO a mistake. Besides, if you want an array of ints, there's already array.array() with your typecode of choice. Not sure why other types should conform. Regards Antoine. From stephen at xemacs.org Wed Sep 7 03:53:27 2011 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 07 Sep 2011 10:53:27 +0900 Subject: [Python-Dev] Maintenance burden of str.swapcase In-Reply-To: References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> <87ehztip2r.fsf@uwakimon.sk.tsukuba.ac.jp> <4E66763B.7080707@pearwood.info> <4E668029.6080106@v.loewis.de> <7EA8F302-1DC4-43D9-B124-09A51172E9BF@voidspace.org.uk> <4E66845F.3060708@v.loewis.de> <4271E434-0E35-4D7E-98D0-097402B8C3FD@gmail.com> Message-ID: <877h5li0dk.fsf@uwakimon.sk.tsukuba.ac.jp> Nick Coghlan writes: > However, a big +1 for deprecation in the case of bytes and bytearray. 
> That's nothing to do with the maintenance burden though, it's to do > with the semantic confusion between binary data and ASCII-encoded text > implied by the retention of methods like upper(), lower() and > swapcase(). [...] > These are all text operations, not something you do with binary data. "Yea, Brother, Amen!" I like the taste of this Kool-Aid. But.... > The case-related methods, though, have no place in sane wire > protocol handling. RFC 822 headers are a somewhat insane but venerable (isn't that true of anything that's reached age 350 in dog-years?), and venerated, counterexample. Specifically, field names are case-insensitive (RFC 5322, section 1.2.2). I'll bet you can find plenty of others if you look. You can call that "text" and say it should be processed in Unicode, if you like, but you're not even going to convince me (and as I say, I like the Kool-Aid). Specifically, SMTP processes can (and even MUST, under some circumstances IIRC) manipulate the RFC 822 header. Sorry, Nick, no can do. -1 From stephen at xemacs.org Wed Sep 7 04:15:04 2011 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 07 Sep 2011 11:15:04 +0900 Subject: [Python-Dev] Maintenance burden of str.swapcase In-Reply-To: <20110907030758.58caa4ed@pitrou.net> References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> <87ehztip2r.fsf@uwakimon.sk.tsukuba.ac.jp> <4E66763B.7080707@pearwood.info> <4E668029.6080106@v.loewis.de> <7EA8F302-1DC4-43D9-B124-09A51172E9BF@voidspace.org.uk> <4E66845F.3060708@v.loewis.de> <4271E434-0E35-4D7E-98D0-097402B8C3FD@gmail.com> <20110907030758.58caa4ed@pitrou.net> Message-ID: <8762l5hzdj.fsf@uwakimon.sk.tsukuba.ac.jp> Antoine Pitrou writes: > Bytes objects are often used for partly ASCII strings, All I can say to that phrase is, "urk, ISO 2022 anyone?" > not arbitrary "arrays of bytes". And making indexing of bytes > objects return ints was IMHO a mistake. Bytes objects are not ASCII strings, even though they can be used to represent them.
The practice of using magic numbers that look like English words is a useful one, but by the same token, it should not be too easy to use bytes to represent *text* just because the programmer doesn't know any words that don't fit into 7*N bits. With PEP 393, there isn't even really a space excuse. AFAICS, anything that should be done with ASCII-punned magic numbers ("protocol tokens", if you prefer) can be done with slices and (ta-da!) case conversion. (Sorry, Nick!) But the components of a bytes object are just numbers; they are not characters until you've run them through a codec. From ncoghlan at gmail.com Wed Sep 7 05:00:16 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 7 Sep 2011 13:00:16 +1000 Subject: [Python-Dev] Maintenance burden of str.swapcase In-Reply-To: <877h5li0dk.fsf@uwakimon.sk.tsukuba.ac.jp> References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> <87ehztip2r.fsf@uwakimon.sk.tsukuba.ac.jp> <4E66763B.7080707@pearwood.info> <4E668029.6080106@v.loewis.de> <7EA8F302-1DC4-43D9-B124-09A51172E9BF@voidspace.org.uk> <4E66845F.3060708@v.loewis.de> <4271E434-0E35-4D7E-98D0-097402B8C3FD@gmail.com> <877h5li0dk.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Wed, Sep 7, 2011 at 11:53 AM, Stephen J. Turnbull wrote: > Nick Coghlan writes: > > The case-related methods, though, have no place in sane wire > > protocol handling. > > RFC 822 headers are a somewhat insane but venerable (isn't that true > of anything that's reached age 350 in dog-years?), and venerated, > counterexample. Specifically, field names are case-insensitive (RFC > 5322, section 1.2.2). I'll bet you can find plenty of others if you > look. You can call that "text" and say it should be processed in > Unicode, if you like, but you're not even going to convince me (and as > I say, I like the Kool-Aid). Specifically, SMTP processes can (and > even MUST, under some circumstances IIRC) manipulate the RFC 822 header. > > Sorry, Nick, no can do.
> > -1 Heh, I knew as soon as I sent that message that someone would be able to point out a counter example. I agree that RFC 822 (and case-insensitive ASCII comparison in general) is enough to save lower() and upper() and co, but what about this even further reduced list of text-specific methods: 'capitalize' 'istitle' 'swapcase' 'title' While case-insensitive comparison makes sense for wire level data, where do these methods fit in, even when embedded ASCII text fragments are involved? Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From stephen at xemacs.org Wed Sep 7 06:36:26 2011 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 07 Sep 2011 13:36:26 +0900 Subject: [Python-Dev] Deprecating bytes.swapcase and friends [was: Maintenance burden of str.swapcase] In-Reply-To: References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> <87ehztip2r.fsf@uwakimon.sk.tsukuba.ac.jp> <4E66763B.7080707@pearwood.info> <4E668029.6080106@v.loewis.de> <7EA8F302-1DC4-43D9-B124-09A51172E9BF@voidspace.org.uk> <4E66845F.3060708@v.loewis.de> <4271E434-0E35-4D7E-98D0-097402B8C3FD@gmail.com> <877h5li0dk.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <874o0phstx.fsf@uwakimon.sk.tsukuba.ac.jp> This is all speculation and no hint of implementation at this point ... redirecting this subthread to Python-Ideas. Reply-To set accordingly. Nick Coghlan writes: > Heh, I knew as soon as I sent that message that someone would be able > to point out a counter example. I agree that RFC 822 (and > case-insensitive ASCII comparison in general) is enough to save > lower() and upper() and co, but what about this even further reduced > list of text-specific methods: > > 'capitalize' > 'istitle' > 'swapcase' > 'title' > > While case-insensitive comparison makes sense for wire level data, > where do these methods fit in, even when embedded ASCII text fragments > are involved?
Well, 'capitalize' could theoretically be used to "beautify" RFC 822 field names, but realistically, to me they're a litmus test for packages I probably don't want on my system.<0.5 wink> I don't know if it's worth the effort to deprecate them, though. There is a school of thought (represented on python-dev by Philip Eby and Antoine Pitrou, among others, I would say) that says that text with an implicit encoding is still text if you can figure out what the encoding is, and the syntactically important tokens are invariably ASCII, which often is enough information to do the work. So if you can do some operation without first converting to str, let's save the cycles and the bytes (especially in bit-shoveling applications like WSGI)! I disagree, but "consenting adults" and all that. It occurs to me that the bit-shoveling applications would generally be sufficiently well-served with a special "codec" that just stuffs the data pointer in a bytes object into the latin1 member of the data pointer union in a PEP 393 Unicode object, and marks the Unicode object as "ascii-compatible", ie, anything ASCII can be manipulated as text, but anything non-ASCII is like a private character that Python doesn't know anything about, and can't do anything useful with, except delete or pass through verbatim (perhaps as a slice). This may be nonsense; I don't know enough about Python internals to be sure. And it would be a change to PEP 393, since the encoding of the 8-bit representation would no longer be Unicode. I wouldn't blame Martin one bit if he hated the idea in principle! On the other hand, the "Latin-1 can be used to decode any binary content" end-around makes that point moot IMO. This would give a somewhat safer way of doing that. But if feasible and a Pythonic implementation could be devised, that would take much of the wind out of the sails of the "implicitly it's ASCII text" crowd. 
The whole "it's inefficient in time and space to work with 'str'" argument goes away, leaving them with "it's verbose" as the only reason for not doing the conversion. I don't know if there would be any use case left for bytes at that point ... but that's clearly a py4k discussion. From jcea at jcea.es Wed Sep 7 13:38:23 2011 From: jcea at jcea.es (Jesus Cea) Date: Wed, 07 Sep 2011 13:38:23 +0200 Subject: [Python-Dev] Multigigabyte memory usage in the OpenIndiana Buildbot In-Reply-To: References: <4E60FCD6.3090005@jcea.es> <4E611821.9050108@jcea.es> <20110902201415.773da7d6@pitrou.net> <4E659CDD.8090900@jcea.es> <4E65A13D.9010805@jcea.es> <4E65A8AE.10900@jcea.es> <4E65A955.7000507@jcea.es> <4E65AD58.6050106@jcea.es> Message-ID: <4E6757AF.4050007@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 06/09/11 07:27, Nick Coghlan wrote: > It may be the case that with the reduced memory limit, your > machine may not be able to run concurrent slaves for 2.7, 3.2 and > 3.x as I believe it does now. Antoine has changed the buildmaster configuration to send me only one build at a time. It doesn't solve the issue. I don't have enough resources even for a single build. I just sent this email to the owner of the machine: """ XXXXXXX, I know you are very busy, but I would like to formally request the removal of the SWAP capping for my zone. After investigating the issue, I learned this: 1. Python "make test" launches a python process that can consume >300MB of RAM. 2. Under Solaris, a 300MB process doing a "fork()" will consume 600MB. That is, Solaris reserves this much memory just in case the processes modify their memory (to avoid an "out of memory" condition simply because a process writes to its own memory space). 3. So, if a 300MB process is forked 10 times, it is going to "virtually" use 3GB. The real memory used is actually far less in the buildbot case, because the forked processes don't modify their own memory so much (forked processes use Copy On Write). 4.
So, the required memory to run the buildbots is actually "modest" compared with the "virtual" memory used. 5. A 4GB SWAP is not enough to run a single buildbot instance. I can have up to 6 instances, but 4GB is not enough for 1. Python devs have modified the buildbot master to send me at most two builds simultaneously, trying to help. It is not helping because 4GB of swap is not enough even for a single instance. 6. With an uncapped SWAP, the actual swapping would be quite low, because the swap is used to ensure memory reservation for the forked processes in the worst case (that the forked processes mess with their own copy of the 300MB address space, COW (Copy On Write)). In practice 4GB of RAM and uncapped SWAP would be enough, with no (or little) actual swapping. For these reasons I formally request a reconfiguration of my zone to uncap my SWAP usage. The proof is actually very simple:
"""
import time, os
a = "a"*1024*1024*512
os.fork()  # 2 processes
os.fork()  # 4 processes
os.fork()  # 8 processes
time.sleep(10)
"""
Running the previous program does this to my swap (Solaris 10 Update 9):
"""
[root at buffy /]# swap -s
total: 684704k bytes allocated + 3732892k reserved = 4417596k used, 31829688k available
"""
After the processes die, I have this:
"""
[root at buffy /]# swap -s
total: 156680k bytes allocated + 43284k reserved = 199964k used, 36118796k available
"""
In this machine, I have 4GB of RAM, 32GB of swap. So, this trivial test requires >4GB of RAM+SWAP even if it is actually using only ~512MB of RAM. Solaris is (rightly) playing safe, making sure the program can actually play with/modify its memory space. XXXXX, if you can't/don't want to modify my zone configuration, let me know, so I can think about what to do next. If I have to talk to somebody else, please let me know. Sorry to bother you with these details. I really appreciate the effort you and your team are doing with OpenIndiana in general and supporting the Python buildbots under OI in particular.
I hope we can solve this situation. Thanks for your time and effort. PS: I think that such memory+swap requirements are quite high, anyway, and I will pursue it. But in the meantime I need the buildbot online, as it was a couple of weeks ago :-) Thanks! """ So, the problem is that a) "make test" takes quite a bit of RAM and b) the buildbot forks some "big" processes, so the virtual memory needed is BIG. Linux is known for "overcommitting" memory. That is, playing fast and risky by not actually reserving memory, hoping the process will not actually use it or will do an "exec" immediately, so this problem may not be apparent under Linux, but it is there. So I have two questions: 1. Can we reduce the memory footprint of the tests? I can't understand why the python test process is taking so much memory. 2. Why is buildbot "forking()" big processes? Can we do something to change this? I will wait a few days for the OpenIndiana team to reply. If the result is not satisfactory, I will try to set up a virtual machine with the required resources myself. Crossing fingers... - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ .
_/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQCVAwUBTmdXr5lgi5GaxT1NAQKmRwP/dyg4qEs+oWt4r365D797+ItbHluuEVJ+ mWTZw5HVeDajrN7faGH6WuA/J+dJuBp2H4rB8WIM1U/DytL7aZDdDHCeXS79IlUw SEb5kMA4ENSB6N6bhKmOWpKlwtMQWmw/CtB6//ZX29UZD6ys3UsbO8KslT+M/1EG P2zmn3PSzo8= =WE+9 -----END PGP SIGNATURE----- From solipsis at pitrou.net Wed Sep 7 14:32:59 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 7 Sep 2011 14:32:59 +0200 Subject: [Python-Dev] Multigigabyte memory usage in the OpenIndiana Buildbot References: <4E60FCD6.3090005@jcea.es> <4E611821.9050108@jcea.es> <20110902201415.773da7d6@pitrou.net> <4E659CDD.8090900@jcea.es> <4E65A13D.9010805@jcea.es> <4E65A8AE.10900@jcea.es> <4E65A955.7000507@jcea.es> <4E65AD58.6050106@jcea.es> <4E6757AF.4050007@jcea.es> Message-ID: <20110907143259.2bcff454@pitrou.net> On Wed, 07 Sep 2011 13:38:23 +0200 Jesus Cea wrote: > > So, the problem is that a) "make test" takes quite a bit of RAM and b) > the buildbot forks some "big" processes, so the virtual memory needed > is BIG. Note that buildbots run "make buildbottest", not "make test". > So I have two questions: > > 1. Can we reduce the memory footprint of the tests?. I can't > understand why the python test process is taking so much memory. Because the test suite will by construction load all the stdlib (minus the few modules which don't have a test suite), and creates numerous test scenarios. Depending on the memory allocator, fragmentation can make it difficult to reclaim memory that has been formally freed after a test is run. If "-j" is used, tests get run in a separate process each, so that approach might be an answer. > 2. Why buildbot is "forking()" big processes?. 
Can we do something to > change this?. Because we need to test for various functionalities, such as os.fork() and os.exec*(), but also the command-line behaviour of the interpreter, the distutils module, the packaging module, the subprocess module, the multiprocessing module... (this list is not exhaustive). Regards Antoine. From solipsis at pitrou.net Wed Sep 7 14:47:49 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 7 Sep 2011 14:47:49 +0200 Subject: [Python-Dev] Maintenance burden of str.swapcase In-Reply-To: <8762l5hzdj.fsf@uwakimon.sk.tsukuba.ac.jp> References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> <87ehztip2r.fsf@uwakimon.sk.tsukuba.ac.jp> <4E66763B.7080707@pearwood.info> <4E668029.6080106@v.loewis.de> <7EA8F302-1DC4-43D9-B124-09A51172E9BF@voidspace.org.uk> <4E66845F.3060708@v.loewis.de> <4271E434-0E35-4D7E-98D0-097402B8C3FD@gmail.com> <20110907030758.58caa4ed@pitrou.net> <8762l5hzdj.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20110907144749.7c1a9d50@pitrou.net> On Wed, 07 Sep 2011 11:15:04 +0900 "Stephen J. Turnbull" wrote: > Antoine Pitrou writes: > > > Bytes objects are often used for partly ASCII strings, > > All I can say to that phrase is, "urk, ISO 2022 anyone?" You could also point out UTF-16 or EBCDIC, but I fail to see how that's relevant. Do you have problems with ISO 2022 when parsing, say, e-mail headers? > > not arbitrary "arrays of bytes". And making indexing of bytes > > objects return ints was IMHO a mistake. > > Bytes objects are not ASCII strings, even though they can be used to > represent them. I'm talking about practice, not some idealistic view of the world. In many use cases (XML, HTML, e-mail headers, many other text-based protocols), you can get a mixture of ASCII "commands", and opaque binary stuff (which will or will not, depending on these "commands", have a meaningful unicode decoding).
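The mixture described above can be sketched with a small, hypothetical example (the blob and field name below are invented for illustration; only bytes methods that exist in Python 3 are used): the ASCII "commands" are picked apart with the retained str-like methods, while the payload stays as opaque bytes.

```python
# A header-like blob: an ASCII "command" part, then opaque binary payload.
blob = b"Content-Length: 4\r\n\r\n\xde\xad\xbe\xef"

head, _, payload = blob.partition(b"\r\n\r\n")
name, _, value = head.partition(b":")

# Case-insensitive handling of the ASCII field name, without decoding:
assert name.lower() == b"content-length"

# int() accepts ASCII digits held in a bytes object directly:
length = int(value.strip())
assert payload[:length] == b"\xde\xad\xbe\xef"
```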
In the stdlib, bytes objects are accessed far more often to poke at some text-like data, than to poke at arbitrary numbers. > With PEP 393, > there isn't even really a space excuse. Of course there is. Any single non-ASCII byte of data mingled with aforementioned ASCII "commands" will make it switch to a less efficient representation. And "surrogateescape" will be a performance problem in itself, when used on large binary data; if you use "latin1" instead, you are risking far greater confusion; ask David about that dilemma. :-) > AFAICS, anything that should be done with ASCII-punned magic numbers > ("protocol tokens", if you prefer) can be done with slices and (ta-da!) > case conversion. So, basically, you're saying that we should remove useful functionality and tell people to reimplement an adhoc version of it when they need it. That sounds obnoxious. Regards Antoine. From hodgestar+pythondev at gmail.com Wed Sep 7 18:31:08 2011 From: hodgestar+pythondev at gmail.com (Simon Cross) Date: Wed, 7 Sep 2011 18:31:08 +0200 Subject: [Python-Dev] Maintenance burden of str.swapcase In-Reply-To: <4E66845F.3060708@v.loewis.de> References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> <87ehztip2r.fsf@uwakimon.sk.tsukuba.ac.jp> <4E66763B.7080707@pearwood.info> <4E668029.6080106@v.loewis.de> <7EA8F302-1DC4-43D9-B124-09A51172E9BF@voidspace.org.uk> <4E66845F.3060708@v.loewis.de> Message-ID: On Tue, Sep 6, 2011 at 10:36 PM, "Martin v. Löwis" wrote: >> Which applications? I'm not sure the number of applications using >> str.swapcase gets even as high as ten. > > I think this is what people underestimate. I can't name > applications either - but that doesn't mean they don't exist. > I'm deeply convinced that the majority of Python code (and > I mean *large* majority) is unpublished. > > I expect thousands of uses world-wide.
http://www.google.com/codesearch#search/&q=swapcase%20lang:%5Epython$&type=cs There are quite a few hits but more people appear to be re-implementing it than using it (I haven't gone to the trouble of mining the search results to get an accurate picture though). From hodgestar+pythondev at gmail.com Wed Sep 7 18:33:46 2011 From: hodgestar+pythondev at gmail.com (Simon Cross) Date: Wed, 7 Sep 2011 18:33:46 +0200 Subject: [Python-Dev] Maintenance burden of str.swapcase In-Reply-To: References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> <87ehztip2r.fsf@uwakimon.sk.tsukuba.ac.jp> <4E66763B.7080707@pearwood.info> <4E668029.6080106@v.loewis.de> <7EA8F302-1DC4-43D9-B124-09A51172E9BF@voidspace.org.uk> <4E66845F.3060708@v.loewis.de> Message-ID: On Wed, Sep 7, 2011 at 6:31 PM, Simon Cross wrote: > http://www.google.com/codesearch#search/&q=swapcase%20lang:%5Epython$&type=cs > > There are quite a few hits but more people appear to be > re-implementing it than using it (I haven't gone to the trouble of > mining the search results to get an accurate picture though). Scratch that -- I should gloss over search results less. It looks like the most common use case is to provide a consistent string-like API somewhere else. So removing it is likely to cause headaches (e.g. test failures) for the people who are wrapping it. From stephen at xemacs.org Wed Sep 7 19:26:00 2011 From: stephen at xemacs.org (Stephen J.
Turnbull) Date: Thu, 08 Sep 2011 02:26:00 +0900 Subject: [Python-Dev] Maintenance burden of str.swapcase In-Reply-To: <20110907144749.7c1a9d50@pitrou.net> References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> <87ehztip2r.fsf@uwakimon.sk.tsukuba.ac.jp> <4E66763B.7080707@pearwood.info> <4E668029.6080106@v.loewis.de> <7EA8F302-1DC4-43D9-B124-09A51172E9BF@voidspace.org.uk> <4E66845F.3060708@v.loewis.de> <4271E434-0E35-4D7E-98D0-097402B8C3FD@gmail.com> <20110907030758.58caa4ed@pitrou.net> <8762l5hzdj.fsf@uwakimon.sk.tsukuba.ac.jp> <20110907144749.7c1a9d50@pitrou.net> Message-ID: <87zkiggt7b.fsf@uwakimon.sk.tsukuba.ac.jp> Antoine Pitrou writes: > You could also point out UTF-16 or EBCDIC, but I fail to see how that's > relevant. Do you have problems with ISO 2022 when parsing, say, e-mail > headers? Yes, of course! Especially when it's say, packed EUC not encapsulated in MIME words. I think Mailman now handles that without crashing, but it took 10 years. Most Emacs MUAs still blow chunks on that. My procmail recipes and my employer's virus checker both occasionally punt. The point about ISO 2022 is that it allows arbitrary binary crap in the stream, delimited by appropriate well-defined constructs. Just like the ASCII-like tokens in the protocols you talk about. But parsing full-bore ISO 2022 is non-trivial, especially if you're going to try to provide error-handling that's useful to the user. Nobody ever really took it seriously as a solution to the problem of internationalization in the 15 years or so when it was the only solution, and even less so once it became clear that UCSes were going to get traction. > > > not arbitrary "arrays of bytes". And making indexing of bytes > > > objects return ints was IMHO a mistake. > > > > Bytes objects are not ASCII strings, even though they can be used to > > represent them. > > I'm talking about practice, So am I, and so is Nick. > not some idealistic view of the world. 
> In many use cases (XML, HTML, e-mail headers, many other test-based > protocols), you can get a mixture of ASCII "commands", and opaque > binary stuff (which will or will not, depending on these "commands", > have a meaningful unicode decoding). Yeah, so what? Those protocol tokens are deliberately chosen to resemble ASCII text, but you need to parse them out of the binary sludge somehow, and the surrounding content remains binary sludge until deserialized or (for text) decoded. How is having b[0] return a bytes object, rather than an integer, going to help in that? Especially if the value is not in the ASCII range? > > AFAICS, anything that should be done with ASCII-punned magic numbers > > ("protocol tokens", if you prefer) can be done with slices and (ta-da!) > > case conversion. > > So, basically, you're saying that we should remove useful functionality No, that *was* Nick's position; I specifically opposed the suggestion that "lower" and "upper" be removed, and he concurred after a bit of thought. And remember, he's talking about removing "swapcase". Which RFC defines a protocol where that would be useful? How about "title"? > and tell people to reimplement an adhoc version of it when they > need it. Of course not; I'm with Michael Foord on that: nobody should ever be asked to reimplement swapcase! My position is simply that bytes are not text, and the occasional reminder (such as b[0] returning an integer, not a bytes object) is good. My experience has been that it makes a lot of sense to layer these things, for example transforming a protocol stream serialized as octets into a more structured object composed of protocol tokens and payloads. It's *not* text, and the relevant techniques are different. It's like the old saw about "aha, I'll use regexps to solve this problem!" and now you have *two* problems. I don't advocate getting rid of regexps, and I don't advocate removing methods from bytes (although I do dream about it occasionally). 
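The b[0] behaviour argued over here is easy to check directly in Python 3 (a minimal illustration, not part of the original message):

```python
token = b"GET / HTTP/1.1"

# Indexing a bytes object yields an int -- a reminder that bytes are numbers:
assert token[0] == 71          # ord('G') == 71
assert isinstance(token[0], int)

# Slicing, by contrast, yields a length-1 bytes object:
assert token[0:1] == b"G"

# Membership tests work with both ints and byte sequences:
assert 71 in token
assert b"GET" in token
```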
I do advocate that people think twice before implementing complex text-like algorithms on binary protocol streams. If the stream really is text-like, then transform it into text of a known, well-behaved encoding, and then apply the powerful text-processing facilities provided for str. If it's not, then transform to a token stream or whatever makes sense. In both cases, do as little "text processing" on bytes objects as possible, and put more structure on the content as soon as possible. If you really need the efficiency, then do what you need to do. As I say, I don't have any practical objection to keeping your tools for that case. But such applications, although important (I guess), are a minority. > That sounds obnoxious. Good advice almost always sounds obnoxious to the recipient. From glyph at twistedmatrix.com Wed Sep 7 19:51:50 2011 From: glyph at twistedmatrix.com (Glyph Lefkowitz) Date: Wed, 7 Sep 2011 10:51:50 -0700 Subject: [Python-Dev] Maintenance burden of str.swapcase In-Reply-To: <87zkiggt7b.fsf@uwakimon.sk.tsukuba.ac.jp> References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> <87ehztip2r.fsf@uwakimon.sk.tsukuba.ac.jp> <4E66763B.7080707@pearwood.info> <4E668029.6080106@v.loewis.de> <7EA8F302-1DC4-43D9-B124-09A51172E9BF@voidspace.org.uk> <4E66845F.3060708@v.loewis.de> <4271E434-0E35-4D7E-98D0-097402B8C3FD@gmail.com> <20110907030758.58caa4ed@pitrou.net> <8762l5hzdj.fsf@uwakimon.sk.tsukuba.ac.jp> <20110907144749.7c1a9d50@pitrou.net> <87zkiggt7b.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4D7C92E4-1F39-4D39-9E99-8E323A6E1282@twistedmatrix.com> On Sep 7, 2011, at 10:26 AM, Stephen J. Turnbull wrote: > How about "title"? >>> 'content-length'.title() 'Content-Length' You might say that the protocol "has" to be case-insensitive so this is a silly frill: there are definitely enough case-sensitive crappy bits of network middleware out there that this function is critically important for an HTTP server.
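The same idiom works on raw bytes in Python 3, since title() is among the retained bytes methods (a minimal illustration):

```python
# Canonical mixed-case form of HTTP header names, computed on raw bytes:
assert b"content-length".title() == b"Content-Length"
assert b"x-forwarded-for".title() == b"X-Forwarded-For"

# The same method exists on str for already-decoded header names:
assert "content-length".title() == "Content-Length"
```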
In general I'd like to defend keeping as many of these methods as possible for compatibility (porting to Py3 is already hard enough). Although even I might have a hard time defending 'swapcase', which is never used _at all_ within Twisted, on text or bytes. The only use-case I can think of for that method is goofy joke text filters, and it wouldn't be very good at that either. -glyph -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Thu Sep 8 00:29:33 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 8 Sep 2011 08:29:33 +1000 Subject: [Python-Dev] Maintenance burden of str.swapcase In-Reply-To: <4D7C92E4-1F39-4D39-9E99-8E323A6E1282@twistedmatrix.com> References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> <87ehztip2r.fsf@uwakimon.sk.tsukuba.ac.jp> <4E66763B.7080707@pearwood.info> <4E668029.6080106@v.loewis.de> <7EA8F302-1DC4-43D9-B124-09A51172E9BF@voidspace.org.uk> <4E66845F.3060708@v.loewis.de> <4271E434-0E35-4D7E-98D0-097402B8C3FD@gmail.com> <20110907030758.58caa4ed@pitrou.net> <8762l5hzdj.fsf@uwakimon.sk.tsukuba.ac.jp> <20110907144749.7c1a9d50@pitrou.net> <87zkiggt7b.fsf@uwakimon.sk.tsukuba.ac.jp> <4D7C92E4-1F39-4D39-9E99-8E323A6E1282@twistedmatrix.com> Message-ID: On Thu, Sep 8, 2011 at 3:51 AM, Glyph Lefkowitz wrote: > On Sep 7, 2011, at 10:26 AM, Stephen J. Turnbull wrote: > > How about "title"? > >>>> 'content-length'.title() > 'Content-Length' > You might say that the protocol "has" to be case-insensitive so this is a > silly frill: there are definitely enough case-sensitive crappy bits of > network middleware out there that this function is critically important for > an HTTP server. Actually, the HTTP header case occurred to me as well shortly after sending my last message, so I think it's a legitimate reason to keep the methods around on bytes and bytearray. So, putting my "practicality beats purity" hat back on, I would describe the status quo as follows: 1.
Binary data is not text, so bytes and bytearray are deliberately conceptualised as arrays of arbitrary integers in the range 0-255 rather than as arrays of 8-bit 'characters'. This distinction is one of the core design principles separating Python 3 from Python 2.
2. However, the use of ASCII words and characters is a common feature of many existing wire protocols, so it is useful to be able to manipulate binary sequences that contain data in an ASCII-compatible format without having to convert them to text first. Retaining additional ASCII-based methods also eases the transition to Python 3 for code that manipulates binary data using the 2.x str type.
3. ASCII whitespace characters are used as delimiters in many formats. Thus, various methods such as split(), partition(), strip() and their variants retain their "ASCII whitespace" default arguments, and expandtabs() is also retained.
4. Padding values out to fill fields of a certain size is needed for some formats. Thus, center(), ljust(), rjust(), zfill() are retained (again retaining their ASCII space default fill character in the case of the first 3 methods).
5. Identifying ASCII alphanumeric data is important for some formats. Thus, isalnum(), isalpha() and isdigit() are retained.
6. Case-insensitive ASCII comparisons are important for some formats (e.g. RFC 822 headers, HTTP headers). Thus, upper(), lower(), isupper() and islower() are retained.
7. Even correct mixed-case ASCII can be important for some formats (e.g. HTTP headers). Thus, capitalize(), title() and istitle() are retained.
8. A valid use for swapcase() on binary data has not been identified, but once all the other ASCII-based methods are being kept around for the various reasons given above, it doesn't seem worth the effort to get rid of this one (despite the additional implementation effort needed for alternate implementations).
9.
Algorithms that operate purely on binary data or purely on text can just use literals of the appropriate type (if they use literals at all). Algorithms that are designed to operate on either kind of data may want to adopt an implicit decode/encode approach to handle binary inputs (this allows assumptions regarding the input encoding to be made explicit). I'm actually fairly happy with that rationalisation for the current Python 3 set up. I'd been thinking recently that we would have been better off if more of the methods that rely on the data using an ASCII compatible encoding scheme had been removed from bytes and bytearray, but swapcase() is really the only one we can't give a decent justification for beyond "it was there in 2.x". Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From jcea at jcea.es Thu Sep 8 03:12:33 2011 From: jcea at jcea.es (Jesus Cea) Date: Thu, 08 Sep 2011 03:12:33 +0200 Subject: [Python-Dev] Multigigabyte memory usage in the OpenIndiana Buildbot In-Reply-To: <20110907143259.2bcff454@pitrou.net> References: <4E60FCD6.3090005@jcea.es> <4E611821.9050108@jcea.es> <20110902201415.773da7d6@pitrou.net> <4E659CDD.8090900@jcea.es> <4E65A13D.9010805@jcea.es> <4E65A8AE.10900@jcea.es> <4E65A955.7000507@jcea.es> <4E65AD58.6050106@jcea.es> <4E6757AF.4050007@jcea.es> <20110907143259.2bcff454@pitrou.net> Message-ID: <4E681681.6060405@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 07/09/11 14:32, Antoine Pitrou wrote: > If "-j" is used, tests get run in a separate process each, so that > approach might be an answer. Antoine, I think this would be the answer. Each test would be a bit slower, because I would launch a new python process per test, but I could run 16 tests in parallel (I have 16 CPUs and, actually, most tests are not CPU intensive).
I'm sorry to bother you with these details and waste your time, but could you possibly change my buildbot configuration to launch, let's say, 4 test processes in parallel, just for testing? Another option would be to have a single Python process and "fork" for each test. That would launch each test in a separate process without requiring a full python interpreter launching each time. Is this the way "-j" is implemented, or is "-j" something external, like "make -j"? BTW, the (nice and helpful) OpenIndiana folks have told me a few hours ago that they would increase my swap limit to 16GB. I am now waiting for this change to be done. I want my six builds in parallel (2.7, 3.2, 3.x, in 32 and 64 bits) back! Sorry for wasting your time with these mundane details... - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ . _/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQCVAwUBTmgWgZlgi5GaxT1NAQI/eAP/anenlTjt7NxIzMLK+ME+f84zLurb8MS/ XiLRpVSNDn6TzKnqXtDLfOc6sua81h+ZlpHvuFNHOkK9u/PkmeUKidgoDvASj5Ti ITUmUxigX1j9ZbD1ITkn53msm1xfug3rw/8+Rh//4ONhhbmhSm8ChZ0iNwtntToG 5SwL3BL2iSI= =fCJe -----END PGP SIGNATURE----- From meadori at gmail.com Thu Sep 8 04:06:29 2011 From: meadori at gmail.com (Meador Inge) Date: Wed, 7 Sep 2011 21:06:29 -0500 Subject: [Python-Dev] python -m tokenize in 3.x ? Message-ID: Hi All, I have been investigating some 'tokenize' bugs recently.
As a part of that investigation I was trying to use '-m tokenize', which works great in 2.x:

[meadori at motherbrain cpython]$ python2.7 -m tokenize test.py
1,0-1,5:        NAME    'print'
1,6-1,21:       STRING  '"Hello, World!"'
1,21-1,22:      NEWLINE '\n'
2,0-2,0:        ENDMARKER       ''

In 3.x, however, the functionality has been removed and replaced with some hard-wired test code:

[meadori at motherbrain cpython]$ python3 -m tokenize test.py
TokenInfo(type=57 (ENCODING), string='utf-8', start=(0, 0), end=(0, 0), line='')
TokenInfo(type=1 (NAME), string='def', start=(1, 0), end=(1, 3), line='def parseline(self, line):')
TokenInfo(type=1 (NAME), string='parseline', start=(1, 4), end=(1, 13), line='def parseline(self, line):')
TokenInfo(type=53 (OP), string='(', start=(1, 13), end=(1, 14), line='def parseline(self, line):')
...

Why is this? I found the commit where the functionality was removed [1], but no explanation. Any objection to adding this feature back? [1] http://hg.python.org/cpython/rev/51e24512e305/ -- # Meador From stephen at xemacs.org Thu Sep 8 04:46:42 2011 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 08 Sep 2011 11:46:42 +0900 Subject: [Python-Dev] Maintenance burden of str.swapcase In-Reply-To: <4D7C92E4-1F39-4D39-9E99-8E323A6E1282@twistedmatrix.com> References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> <87ehztip2r.fsf@uwakimon.sk.tsukuba.ac.jp> <4E66763B.7080707@pearwood.info> <4E668029.6080106@v.loewis.de> <7EA8F302-1DC4-43D9-B124-09A51172E9BF@voidspace.org.uk> <4E66845F.3060708@v.loewis.de> <4271E434-0E35-4D7E-98D0-097402B8C3FD@gmail.com> <20110907030758.58caa4ed@pitrou.net> <8762l5hzdj.fsf@uwakimon.sk.tsukuba.ac.jp> <20110907144749.7c1a9d50@pitrou.net> <87zkiggt7b.fsf@uwakimon.sk.tsukuba.ac.jp> <4D7C92E4-1F39-4D39-9E99-8E323A6E1282@twistedmatrix.com> Message-ID: <87pqjbhht9.fsf@uwakimon.sk.tsukuba.ac.jp> Glyph Lefkowitz writes: > On Sep 7, 2011, at 10:26 AM, Stephen J. Turnbull wrote: > > > How about "title"?
> > >>> 'content-length'.title() > 'Content-Length' > > You might say that the protocol "has" to be case-insensitive so > this is a silly frill: Not me, sir. My whole point about the "bytes should be more like str" controversy is the dual of that: you don't know what will be coming at you, so the regularities and (normally allowable) fuzziness of text processing are inadmissible. > there are definitely enough case-sensitive crappy bits of network > middleware out there that this function is critically important for > an HTTP server. "Critically important" is surely an overstatement. You could always title-case the literal strings containing field names in the source. The problem with having lots of str-like features on bytes is that you lose TOOWDTI, or worse, to many performance-happy coders, use of bytes becomes TOOWDTI "because none of the characters[sic] I'm planning to process myself are non-ASCII". This is the road to Babel; it's workable for one-off scripts but it's asking for long-term trouble in multi-module applications. The choice of decoding to str and processing in that form should be made as attractive as possible. On the other hand, it is undeniably useful for protocol tokens to have mnemonic representations even in binary protocols. Textual manipulations on those tokens should be convenient. It seems to me that what might be an improvement over the current situation (maybe for Py4k only, though) is for bytes and (PEP-393-style) str to share representation, and have a "cast" method which would convert from one to the other, validating that the range constraints on the representation are satisfied. The problem I see is that this either sanctions the practice of using latin-1 as "ASCII plus anything", which is an unpleasant hack, or you'd need to check in text methods that nothing is done with non-ASCII values other than checks for set membership (including equality comparison, of course).
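The latin-1 "ASCII plus anything" hack mentioned above rests on a simple property, sketched below (a minimal illustration, not part of the original message): latin-1 maps each byte 0-255 to the code point of equal value, so the conversion is lossless in both directions.

```python
# Every possible byte value survives a latin-1 round trip unchanged:
data = bytes(range(256))
as_text = data.decode("latin-1")
assert as_text.encode("latin-1") == data

# ASCII tokens embedded in binary sludge become searchable text,
# while non-ASCII bytes ride along as opaque code points:
sludge = b"\x00\xffHTTP/1.1\xfe"
assert "HTTP/1.1" in sludge.decode("latin-1")
```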
OTOH, AFAICS, Antoine's claim that inserting a non-latin-1 character in a str that happens to contain only ASCII values would convert the representation to multioctets (true), and therefore this doesn't give the desired efficiency properties, is beside the point. Just don't do that! You *can't* do that in a bytes object, anyway; use of str in this way is a "consenting adults" issue. You trade off the convenience of the full suite of text tools vs. the possibility that somebody might insert such a character -- but for the algorithms they're going to be using, they shouldn't be doing that anyway. From guido at python.org Thu Sep 8 05:48:19 2011 From: guido at python.org (Guido van Rossum) Date: Wed, 7 Sep 2011 20:48:19 -0700 Subject: [Python-Dev] python -m tokenize in 3.x ? In-Reply-To: References: Message-ID: My guess is that there was no specific intent -- most likely it occurred to nobody that the main() functionality was actually useful. I'd say it's fine to put it back, and then document it (so it won't be removed again :-). --Guido On Wed, Sep 7, 2011 at 7:06 PM, Meador Inge wrote: > Hi All, > > I have been investing some 'tokenize' bugs recently. As a part of > that investigation I was trying to use '-m tokenize', which works > great in 2.x: > > [meadori at motherbrain cpython]$ python2.7 -m tokenize test.py > 1,0-1,5:        NAME    'print' > 1,6-1,21:       STRING  '"Hello, World!"' > 1,21-1,22:      NEWLINE '\n' > 2,0-2,0:        ENDMARKER
'' > > In 3.x, however, the functionality has been removed and replaced with > some hard-wired test code: > > [meadori at motherbrain cpython]$ python3 -m tokenize test.py > TokenInfo(type=57 (ENCODING), string='utf-8', start=(0, 0), end=(0, 0), line='') > TokenInfo(type=1 (NAME), string='def', start=(1, 0), end=(1, 3), > line='def parseline(self, line):') > TokenInfo(type=1 (NAME), string='parseline', start=(1, 4), end=(1, > 13), line='def parseline(self, line):') > TokenInfo(type=53 (OP), string='(', start=(1, 13), end=(1, 14), > line='def parseline(self, line):') > ... > > Why is this? I found the commit where the functionality was removed > [1], but no explanation. Any objection to adding this feature back? > > [1] http://hg.python.org/cpython/rev/51e24512e305/ > > -- > # Meador > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (python.org/~guido) From solipsis at pitrou.net Thu Sep 8 09:18:05 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 8 Sep 2011 09:18:05 +0200 Subject: [Python-Dev] Multigigabyte memory usage in the OpenIndiana Buildbot References: <4E60FCD6.3090005@jcea.es> <4E611821.9050108@jcea.es> <20110902201415.773da7d6@pitrou.net> <4E659CDD.8090900@jcea.es> <4E65A13D.9010805@jcea.es> <4E65A8AE.10900@jcea.es> <4E65A955.7000507@jcea.es> <4E65AD58.6050106@jcea.es> <4E6757AF.4050007@jcea.es> <20110907143259.2bcff454@pitrou.net> <4E681681.6060405@jcea.es> Message-ID: <20110908091805.3f1e9141@pitrou.net> Hello Jesus, > I'm sorry to bother you with these details > and waste of time, but could you possibly change my buildbot > configuration to launch, let's say, 4 test processes in parallel, just > for testing? Ok, I've added "-j4", let's see how that works.
> Another option would be to have a single Python process and "fork" for > each test. That would launch each test in a separate process without > requiring a full python interpreter launching each time. Is this the > way "-j" is implemented It uses subprocess actually, so fork() + exec() is used. > BTW, the (nice and helpful) OpenIndiana folks have told me a few hours > ago that they would increase my swap limit to 16GB. I am now waiting > for this change to be done. Good news :) Regards Antoine. From ezio.melotti at gmail.com Thu Sep 8 11:11:52 2011 From: ezio.melotti at gmail.com (Ezio Melotti) Date: Thu, 8 Sep 2011 12:11:52 +0300 Subject: [Python-Dev] [Python-checkins] cpython: Issue #12567: Fix curses.unget_wch() tests In-Reply-To: References: Message-ID: Hi, On Tue, Sep 6, 2011 at 11:08 AM, victor.stinner wrote: > http://hg.python.org/cpython/rev/786668a4fb6b > changeset: 72301:786668a4fb6b > user: Victor Stinner > date: Tue Sep 06 10:08:28 2011 +0200 > summary: > Issue #12567: Fix curses.unget_wch() tests > > Skip the test if the function is missing. Use U+0061 (a) instead of U+00E9 > (?) > because U+00E9 raises a _curses.error('unget_wch() returned ERR') on some > buildbots. It's maybe because of the locale encoding. > > files: > Lib/test/test_curses.py | 6 ++++-- > 1 files changed, 4 insertions(+), 2 deletions(-) > > > diff --git a/Lib/test/test_curses.py b/Lib/test/test_curses.py > --- a/Lib/test/test_curses.py > +++ b/Lib/test/test_curses.py > @@ -265,14 +265,16 @@ > stdscr.getkey() > > def test_unget_wch(stdscr): > - ch = '\xe9' > + if not hasattr(curses, 'unget_wch'): > + return > This should be a skip, not a bare return. > + ch = 'a' > curses.unget_wch(ch) > read = stdscr.get_wch() > read = chr(read) > if read != ch: > raise AssertionError("%r != %r" % (read, ch)) > Why not just assertEqual? 
> > - ch = ord('\xe9') > + ch = ord('a') > curses.unget_wch(ch) > read = stdscr.get_wch() > if read != ch: > > > Best Regards, Ezio Melotti -------------- next part -------------- An HTML attachment was scrubbed... URL: From fwierzbicki at gmail.com Fri Sep 9 00:09:09 2011 From: fwierzbicki at gmail.com (fwierzbicki at gmail.com) Date: Thu, 8 Sep 2011 15:09:09 -0700 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: References: <6C7ABA8B4E309440B857D74348836F2E28F378E2@TK5EX14MBXC292.redmond.corp.microsoft.com> <201108262337.42349.victor.stinner@haypocalc.com> Message-ID: On Fri, Aug 26, 2011 at 3:00 PM, Guido van Rossum wrote: > I have a different question about IronPython and Jython now. Do their > regular expression libraries support Unicode better than CPython's? > E.g. does "." match a surrogate pair? Tom C suggests that Java's regex > libraries get this and many other details right despite Java's use of > UTF-16 to represent strings. So hopefully Jython's re library is built > on top of Java's? > > PS. Is there a better contact for Jython? The best contact for Unicode and Jython is Jim Baker (I added him to the cc) - I'll do my best to answer though: Java 5 added a bunch of methods for dealing with Unicode that doesn't fit into 2 bytes - and looking at our code for our Unicode object, I see that we are using methods like the codePointCount method off of java.lang.String to compute length[1] and using similar methods all through that code to make sure we deal in code points when dealing with unicode. So it looks pretty good for us as far as I can tell. 
[1] http://download.oracle.com/javase/6/docs/api/java/lang/String.html#codePointCount(int, int) -Frank Wierzbicki From fwierzbicki at gmail.com Fri Sep 9 00:15:46 2011 From: fwierzbicki at gmail.com (fwierzbicki at gmail.com) Date: Thu, 8 Sep 2011 15:15:46 -0700 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: References: <6C7ABA8B4E309440B857D74348836F2E28F378E2@TK5EX14MBXC292.redmond.corp.microsoft.com> <201108262337.42349.victor.stinner@haypocalc.com> Message-ID: Oops, forgot to add the link for the gory details for Java and > 2 byte unicode: http://java.sun.com/developer/technicalArticles/Intl/Supplementary/ From fwierzbicki at gmail.com Fri Sep 9 00:50:45 2011 From: fwierzbicki at gmail.com (fwierzbicki at gmail.com) Date: Thu, 8 Sep 2011 15:50:45 -0700 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: References: <6C7ABA8B4E309440B857D74348836F2E28F378E2@TK5EX14MBXC292.redmond.corp.microsoft.com> <201108262337.42349.victor.stinner@haypocalc.com> Message-ID: On Fri, Aug 26, 2011 at 3:00 PM, Guido van Rossum wrote: > I have a different question about IronPython and Jython now. Do their > regular expression libraries support Unicode better than CPython's? > E.g. does "." match a surrogate pair? Tom C suggests that Java's regex > libraries get this and many other details right despite Java's use of > UTF-16 to represent strings. So hopefully Jython's re library is built > on top of Java's? Even bigger oops - I answered the thread questions and not this specific one. Currently Jython's re is a Jython specific implementation and so is not likely to benefit from the improvements in Java's re implementation. I think in terms of PEP 393 this should probably be considered a bug that we need to fix... 
-Frank Wierzbicki From tjreedy at udel.edu Fri Sep 9 07:39:21 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 09 Sep 2011 01:39:21 -0400 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: References: <6C7ABA8B4E309440B857D74348836F2E28F378E2@TK5EX14MBXC292.redmond.corp.microsoft.com> <201108262337.42349.victor.stinner@haypocalc.com> Message-ID: On 9/8/2011 6:15 PM, fwierzbicki at gmail.com wrote: > Oops, forgot to add the link for the gory details for Java and> 2 byte unicode: > > http://java.sun.com/developer/technicalArticles/Intl/Supplementary/ This is dated 2004. Basically, they considered several options, tried out 4, and ended up sticking with char[] (sequences) as UTF-16 with char = 16 bit code unit and added 32-bit Character(int) class for low-level manipulation of code points. I did not see the indexing problem mentioned. I get the impression that they encourage sequence forward-backward iteration (cursor-based access) rather than random-access indexing. -- Terry Jan Reedy From jcea at jcea.es Fri Sep 9 17:14:07 2011 From: jcea at jcea.es (Jesus Cea) Date: Fri, 09 Sep 2011 17:14:07 +0200 Subject: [Python-Dev] Multigigabyte memory usage in the OpenIndiana Buildbot In-Reply-To: <20110908091805.3f1e9141@pitrou.net> References: <4E60FCD6.3090005@jcea.es> <4E611821.9050108@jcea.es> <20110902201415.773da7d6@pitrou.net> <4E659CDD.8090900@jcea.es> <4E65A13D.9010805@jcea.es> <4E65A8AE.10900@jcea.es> <4E65A955.7000507@jcea.es> <4E65AD58.6050106@jcea.es> <4E6757AF.4050007@jcea.es> <20110907143259.2bcff454@pitrou.net> <4E681681.6060405@jcea.es> <20110908091805.3f1e9141@pitrou.net> Message-ID: <4E6A2D3F.3070906@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 08/09/11 09:18, Antoine Pitrou wrote: > Ok, I've added "-j4", let's how that works. It is not helping. it is taking tons of memory yet. >> Another option would be to have a single Python process and >> "fork" for each test. 
That would launch each test in a separate >> process without requiring a full python interpreter launching >> each time. Is this the way "-j" is implemented > > It uses subprocess actually, so fork() + exec() is used. Yes, does it but fork for each test or simply launch 4 processes, each doing 1/4 of the tests?. >> BTW, the (nice and helpful) OpenIndiana folks have told me a few >> hours ago that they would increase my swap limit to 16GB. I am >> now waiting for this change to be done. > > Good news :) 16GB of swap activated a few minutes ago. Thanks, Jon and Alastair :-) (OpenIndiana guys). Launching buildbots now and crossing fingers... - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ . _/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQCVAwUBTmotPplgi5GaxT1NAQKUzQP/Qm+lyCQeldL1XEkkq1EHY5C/hKvMDz9i qOV29iai/hkeqRWY2Fiu4vSfNTDAEil9eEIJQMGmUyYOMCrfOEoDCYzr+xTWfnNu EWzI6mEe8XWIUicGDAf/dbUEk11wtSrtXA09G0Q5oQWg0b6auQHYv5vhZITwDWSO h9rLBnZ0ZHI= =8Mpw -----END PGP SIGNATURE----- From status at bugs.python.org Fri Sep 9 18:07:27 2011 From: status at bugs.python.org (Python tracker) Date: Fri, 9 Sep 2011 18:07:27 +0200 (CEST) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20110909160727.9E4081CA91@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2011-09-02 - 2011-09-09) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message. Issues counts and deltas: open 3000 (+33) closed 21727 (+26) total 24727 (+59) Open issues with patches: 1287 Issues opened (49) ================== #12887: Documenting all SO_* constants in socket module http://bugs.python.org/issue12887 opened by sandro.tosi #12890: cgitb displays
<pre>
tags when executed in text mode http://bugs.python.org/issue12890 opened by mcjeff #12891: Clean up traces of manifest template in packaging http://bugs.python.org/issue12891 opened by eric.araujo #12892: UTF-16 and UTF-32 codecs should reject (lone) surrogates http://bugs.python.org/issue12892 opened by ezio.melotti #12895: In MSI/EXE installer, allow installing Python modules in free http://bugs.python.org/issue12895 opened by cool-RR #12896: Recommended location of the interpreter for Python 3 http://bugs.python.org/issue12896 opened by lregebro #12897: Support for iterators in multiprocessing map http://bugs.python.org/issue12897 opened by acooke #12900: Use universal newlines mode for setup.cfg http://bugs.python.org/issue12900 opened by eric.araujo #12901: Nest class/methods directives in documentation http://bugs.python.org/issue12901 opened by eric.araujo #12902: help("modules") executes module code http://bugs.python.org/issue12902 opened by dronus #12903: test_io.test_interrupte[r]d* blocks on OpenBSD http://bugs.python.org/issue12903 opened by rpointel #12904: Change os.utime &c functions to use nanosecond precision where http://bugs.python.org/issue12904 opened by larry #12905: multiple errors in test_socket on OpenBSD http://bugs.python.org/issue12905 opened by rpointel #12907: Update test coverage devguide page http://bugs.python.org/issue12907 opened by brett.cannon #12908: Update dev-in-a-box for new coverage steps http://bugs.python.org/issue12908 opened by brett.cannon #12910: urrlib.quote quotes too many chars, e.g., '()' http://bugs.python.org/issue12910 opened by joern #12911: Expose a private accumulator C API http://bugs.python.org/issue12911 opened by pitrou #12912: xmlrpclib.__version__ not bumped with updates http://bugs.python.org/issue12912 opened by rcritten #12913: Add a debugging howto http://bugs.python.org/issue12913 opened by eric.araujo #12914: Add cram function to textwrap http://bugs.python.org/issue12914 opened by eric.araujo 
#12915: Add inspect.locate and inspect.resolve http://bugs.python.org/issue12915 opened by eric.araujo #12916: Add inspect.splitdoc http://bugs.python.org/issue12916 opened by eric.araujo #12917: Make visiblename and allmethods functions public http://bugs.python.org/issue12917 opened by eric.araujo #12918: New module for terminal utilities http://bugs.python.org/issue12918 opened by eric.araujo #12919: Control what module is imported first http://bugs.python.org/issue12919 opened by brett.cannon #12920: Inspect.getsource fails to get source of local classes http://bugs.python.org/issue12920 opened by Popa.Claudiu #12921: http.server.BaseHTTPRequestHandler.send_error and trailing new http://bugs.python.org/issue12921 opened by Paul.Upchurch #12922: StringIO and seek() http://bugs.python.org/issue12922 opened by terry.reedy #12923: test_urllib fails in refleak mode http://bugs.python.org/issue12923 opened by skrah #12924: Missing call to quote_plus() in test_urllib.test_default_quoti http://bugs.python.org/issue12924 opened by jon #12925: python setup.py upload_docs doesn't ask for login and password http://bugs.python.org/issue12925 opened by cancel #12926: tarfile tarinfo.extract*() broken with symlinks http://bugs.python.org/issue12926 opened by Fabio.Erculiani #12927: test_ctypes: segfault with suncc http://bugs.python.org/issue12927 opened by skrah #12930: reindent.py inserts spaces in multiline literals http://bugs.python.org/issue12930 opened by Dima.Tisnek #12931: xmlrpclib confuses unicode and string http://bugs.python.org/issue12931 opened by wosc #12932: dircmp does not allow non-shallow comparisons http://bugs.python.org/issue12932 opened by kesmit #12933: Update or remove claims that distutils requires external progr http://bugs.python.org/issue12933 opened by eric.araujo #12934: pysetup doesn???t work for the docutils project http://bugs.python.org/issue12934 opened by eric.araujo #12935: Typo in findertools.py http://bugs.python.org/issue12935 opened 
by karstenw #12936: armv5tejl: random segfaults in getaddrinfo() http://bugs.python.org/issue12936 opened by skrah #12937: Support install options as found in distutils http://bugs.python.org/issue12937 opened by brett.cannon #12938: html.escape docstring does not mention single quotes (') http://bugs.python.org/issue12938 opened by zvin #12939: Add new io.FileIO using the native Windows API http://bugs.python.org/issue12939 opened by haypo #12940: Cmd example using turtle left vs. right doc-bug http://bugs.python.org/issue12940 opened by Gumnos #12941: add random.pop() http://bugs.python.org/issue12941 opened by jfeuerstein #12942: Shebang line fixer for 2to3 http://bugs.python.org/issue12942 opened by Aaron.Meurer #12943: tokenize: add python -m tokenize support back http://bugs.python.org/issue12943 opened by meadori #12944: setup.py upload to pypi needs to work with specified files http://bugs.python.org/issue12944 opened by illume #12945: ctypes works incorrectly with _swappedbytes_ = 1 http://bugs.python.org/issue12945 opened by Pavel.Boldin Most recent 15 issues with no replies (15) ========================================== #12945: ctypes works incorrectly with _swappedbytes_ = 1 http://bugs.python.org/issue12945 #12944: setup.py upload to pypi needs to work with specified files http://bugs.python.org/issue12944 #12943: tokenize: add python -m tokenize support back http://bugs.python.org/issue12943 #12942: Shebang line fixer for 2to3 http://bugs.python.org/issue12942 #12937: Support install options as found in distutils http://bugs.python.org/issue12937 #12936: armv5tejl: random segfaults in getaddrinfo() http://bugs.python.org/issue12936 #12935: Typo in findertools.py http://bugs.python.org/issue12935 #12934: pysetup doesn???t work for the docutils project http://bugs.python.org/issue12934 #12933: Update or remove claims that distutils requires external progr http://bugs.python.org/issue12933 #12932: dircmp does not allow non-shallow comparisons 
http://bugs.python.org/issue12932 #12926: tarfile tarinfo.extract*() broken with symlinks http://bugs.python.org/issue12926 #12924: Missing call to quote_plus() in test_urllib.test_default_quoti http://bugs.python.org/issue12924 #12923: test_urllib fails in refleak mode http://bugs.python.org/issue12923 #12922: StringIO and seek() http://bugs.python.org/issue12922 #12921: http.server.BaseHTTPRequestHandler.send_error and trailing new http://bugs.python.org/issue12921 Most recent 15 issues waiting for review (15) ============================================= #12941: add random.pop() http://bugs.python.org/issue12941 #12931: xmlrpclib confuses unicode and string http://bugs.python.org/issue12931 #12930: reindent.py inserts spaces in multiline literals http://bugs.python.org/issue12930 #12924: Missing call to quote_plus() in test_urllib.test_default_quoti http://bugs.python.org/issue12924 #12919: Control what module is imported first http://bugs.python.org/issue12919 #12911: Expose a private accumulator C API http://bugs.python.org/issue12911 #12903: test_io.test_interrupte[r]d* blocks on OpenBSD http://bugs.python.org/issue12903 #12901: Nest class/methods directives in documentation http://bugs.python.org/issue12901 #12890: cgitb displays
<pre>
tags when executed in text mode http://bugs.python.org/issue12890 #12881: ctypes: segfault with large structure field names http://bugs.python.org/issue12881 #12872: --with-tsc crashes on ppc64 http://bugs.python.org/issue12872 #12857: Expose called function on frame object http://bugs.python.org/issue12857 #12856: tempfile PRNG reuse between parent and child process http://bugs.python.org/issue12856 #12855: linebreak sequences should be better documented http://bugs.python.org/issue12855 #12850: [PATCH] stm.atomic http://bugs.python.org/issue12850 Top 10 most discussed issues (10) ================================= #12905: multiple errors in test_socket on OpenBSD http://bugs.python.org/issue12905 14 msgs #2636: Adding a new regex module (compatible with re) http://bugs.python.org/issue2636 8 msgs #12105: open() does not able to set flags, such as O_CLOEXEC http://bugs.python.org/issue12105 8 msgs #5845: rlcompleter should be enabled automatically http://bugs.python.org/issue5845 6 msgs #12729: Python lib re cannot handle Unicode properly due to narrow/wid http://bugs.python.org/issue12729 6 msgs #5876: __repr__ returning unicode doesn't work when called implicitly http://bugs.python.org/issue5876 5 msgs #7219: Unhelpful error message when a distutils package install fails http://bugs.python.org/issue7219 5 msgs #12870: Regex object should have introspection methods http://bugs.python.org/issue12870 5 msgs #12904: Change os.utime &c functions to use nanosecond precision where http://bugs.python.org/issue12904 5 msgs #12911: Expose a private accumulator C API http://bugs.python.org/issue12911 5 msgs Issues closed (25) ================== #7798: Make generally useful pydoc functions public http://bugs.python.org/issue7798 closed by eric.araujo #8286: distutils: path '[...]' cannot end with '/' -- need better err http://bugs.python.org/issue8286 closed by eric.araujo #10191: scripts files are not RECORDed. 
http://bugs.python.org/issue10191 closed by eric.araujo #11155: multiprocessing.Queue's put() signature differs from docs http://bugs.python.org/issue11155 closed by python-dev #11561: "coverage" of Python regrtest cannot see initial import of lib http://bugs.python.org/issue11561 closed by brett.cannon #12117: Failures with PYTHONDONTWRITEBYTECODE: test_importlib, test_im http://bugs.python.org/issue12117 closed by eric.araujo #12764: segfault in ctypes.Struct with bad _fields_ http://bugs.python.org/issue12764 closed by meadori #12781: Mention SO_REUSEADDR near socket doc examples http://bugs.python.org/issue12781 closed by sandro.tosi #12840: "maintainer" value clear the "author" value when register http://bugs.python.org/issue12840 closed by eric.araujo #12841: Incorrect tarfile.py extraction http://bugs.python.org/issue12841 closed by lars.gustaebel #12852: POSIX level issues in posixmodule.c on OpenBSD 5.0 http://bugs.python.org/issue12852 closed by haypo #12862: ConfigParser does not implement "comments need to be preceded http://bugs.python.org/issue12862 closed by lukasz.langa #12863: py32 > Lib > xml.minidom > usage feedback > overrides http://bugs.python.org/issue12863 closed by eric.araujo #12871: Disable sched_get_priority_min/max if Python is compiled witho http://bugs.python.org/issue12871 closed by neologix #12878: io.StringIO doesn't provide a __dict__ field http://bugs.python.org/issue12878 closed by python-dev #12888: html.parser.HTMLParser.unescape works only with the first 128 http://bugs.python.org/issue12888 closed by ezio.melotti #12889: struct.pack('d'... 
problem http://bugs.python.org/issue12889 closed by mark.dickinson #12893: Invitation to connect on LinkedIn http://bugs.python.org/issue12893 closed by nadeem.vawda #12894: pydoc help("modules keyword") is failing when a module throws http://bugs.python.org/issue12894 closed by ned.deily #12898: add opendir() for POSIX platforms http://bugs.python.org/issue12898 closed by haypo #12899: Change os.utimensat() and os.futimens() to use float for atime http://bugs.python.org/issue12899 closed by larry #12906: Slight error in logging module's yaml config http://bugs.python.org/issue12906 closed by python-dev #12909: Inconsistent exception usage in PyLong_As* C functions http://bugs.python.org/issue12909 closed by nadeem.vawda #12928: exec not woking in unittest http://bugs.python.org/issue12928 closed by benjamin.peterson #12929: faulthandler: void pointer used in arithmetic http://bugs.python.org/issue12929 closed by haypo From fwierzbicki at gmail.com Fri Sep 9 18:12:38 2011 From: fwierzbicki at gmail.com (fwierzbicki at gmail.com) Date: Fri, 9 Sep 2011 09:12:38 -0700 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: References: <6C7ABA8B4E309440B857D74348836F2E28F378E2@TK5EX14MBXC292.redmond.corp.microsoft.com> <201108262337.42349.victor.stinner@haypocalc.com> Message-ID: On Thu, Sep 8, 2011 at 10:39 PM, Terry Reedy wrote: > On 9/8/2011 6:15 PM, fwierzbicki at gmail.com wrote: >> >> Oops, forgot to add the link for the gory details for Java and> ?2 byte >> unicode: >> >> http://java.sun.com/developer/technicalArticles/Intl/Supplementary/ > > This is dated 2004. Basically, they considered several options, tried out 4, > and ended up sticking with char[] (sequences) as UTF-16 with char = 16 bit > code unit and added 32-bit Character(int) class for low-level manipulation > of code points. > > I did not see the indexing problem mentioned. 
I get the impression that they > encourage sequence forward-backward iteration (cursor-based access) rather > than random-access indexing. Hmmm, sorry for the irrelevant link - my lack of expertise here is showing. What I do know is that we (meaning Jim Baker) are taking great pains to always use codepoints even for random access in our unicode code. I can't speak to the performance implications without some deeper study into what Jim has done. -Frank From solipsis at pitrou.net Fri Sep 9 19:04:32 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 9 Sep 2011 19:04:32 +0200 Subject: [Python-Dev] Multigigabyte memory usage in the OpenIndiana Buildbot References: <4E60FCD6.3090005@jcea.es> <4E611821.9050108@jcea.es> <20110902201415.773da7d6@pitrou.net> <4E659CDD.8090900@jcea.es> <4E65A13D.9010805@jcea.es> <4E65A8AE.10900@jcea.es> <4E65A955.7000507@jcea.es> <4E65AD58.6050106@jcea.es> <4E6757AF.4050007@jcea.es> <20110907143259.2bcff454@pitrou.net> <4E681681.6060405@jcea.es> <20110908091805.3f1e9141@pitrou.net> <4E6A2D4A.30503@jcea.es> Message-ID: <20110909190432.11206d07@msiwind> Le Fri, 09 Sep 2011 17:14:18 +0200, Jesus Cea a ?crit : > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On 08/09/11 09:18, Antoine Pitrou wrote: > > Ok, I've added "-j4", let's how that works. > > It is not helping. it is taking tons of memory yet. That's rather strange. Is it for every test or a few select ones? > >> Another option would be to have a single Python process and > >> "fork" for each test. That would launch each test in a separate > >> process without requiring a full python interpreter launching > >> each time. Is this the way "-j" is implemented > > > > It uses subprocess actually, so fork() + exec() is used. > > Yes, does it but fork for each test or simply launch 4 processes, each > doing 1/4 of the tests?. It forks for each test. Regards Antoine. 
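To make the distinction concrete for onlookers: Antoine describes one fresh child per test, not N long-lived workers splitting the test list. The sketch below is illustrative only (the test names and the dummy child command are stand-ins, not regrtest's actual code):

```python
# Each "test" gets its own child interpreter via subprocess, i.e.
# fork() + exec() per test, so full interpreter startup cost (and
# memory) is paid for every test.
import subprocess
import sys

def run_test_in_child(test_name):
    # Stand-in child: a real runner would pass the test module name.
    rc = subprocess.call([sys.executable, '-c', 'import sys; sys.exit(0)'])
    return test_name, rc

def run_suite(tests):
    return [run_test_in_child(t) for t in tests]
```

The alternative Jesus asks about (4 persistent processes, each taking 1/4 of the tests) would amortize that startup cost, at the price of tests sharing interpreter state within a worker.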
From tjreedy at udel.edu Fri Sep 9 19:16:17 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 09 Sep 2011 13:16:17 -0400 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: References: <6C7ABA8B4E309440B857D74348836F2E28F378E2@TK5EX14MBXC292.redmond.corp.microsoft.com> <201108262337.42349.victor.stinner@haypocalc.com> Message-ID: On 9/9/2011 12:12 PM, fwierzbicki at gmail.com wrote: > On Thu, Sep 8, 2011 at 10:39 PM, Terry Reedy wrote: >> On 9/8/2011 6:15 PM, fwierzbicki at gmail.com wrote: >>> >>> Oops, forgot to add the link for the gory details for Java and> 2 byte >>> unicode: >>> >>> http://java.sun.com/developer/technicalArticles/Intl/Supplementary/ >> >> This is dated 2004. Basically, they considered several options, tried out 4, >> and ended up sticking with char[] (sequences) as UTF-16 with char = 16 bit >> code unit and added 32-bit Character(int) class for low-level manipulation >> of code points. >> >> I did not see the indexing problem mentioned. I get the impression that they >> encourage sequence forward-backward iteration (cursor-based access) rather >> than random-access indexing. > Hmmm, sorry for the irrelevant link - my lack of expertise here is > showing. What I do know is that we (meaning Jim Baker) are taking > great pains to always use codepoints even for random access in our > unicode code. I can't speak to the performance implications without > some deeper study into what Jim has done. I am curious how you index by code point rather than code unit with 16-bit code units and how it compares with the method I posted. Is there anything I can read? Reply off list if you want. 
-- Terry Jan Reedy From g.brandl at gmx.net Fri Sep 9 21:27:11 2011 From: g.brandl at gmx.net (Georg Brandl) Date: Fri, 09 Sep 2011 21:27:11 +0200 Subject: [Python-Dev] [Python-checkins] cpython (3.2): Fix PyUnicode_AsWideCharString() doc: size doesn't contain the null character In-Reply-To: <4E65D429.3040908@haypocalc.com> References: <4E65D429.3040908@haypocalc.com> Message-ID: Am 06.09.2011 10:04, schrieb Victor Stinner: > Le 06/09/2011 02:25, Nick Coghlan a ?crit : >> On Tue, Sep 6, 2011 at 10:01 AM, victor.stinner >> wrote: >>> Fix also spelling of the null character. >> >> While these cases are legitimately changed to 'null' (since they're >> lowercase descriptions of the character), I figure it's worth >> mentioning again that the ASCII name for '\0' actually *is* NUL (i.e. >> only one 'L'). Strange, but true [1]. >> >> Cheers, >> Nick. >> >> [1] https://secure.wikimedia.org/wikipedia/en/wiki/ASCII > > "NUL" is an abbreviation used in tables when you don't have enough space > to write the full name: "null character". > > Where do you want to mention this abbreviation? I vote to paint the bikeshed BLU. Georg (Seriously, how many more messages will this triviality spawn?) From fwierzbicki at gmail.com Fri Sep 9 21:58:41 2011 From: fwierzbicki at gmail.com (fwierzbicki at gmail.com) Date: Fri, 9 Sep 2011 12:58:41 -0700 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: References: <6C7ABA8B4E309440B857D74348836F2E28F378E2@TK5EX14MBXC292.redmond.corp.microsoft.com> <201108262337.42349.victor.stinner@haypocalc.com> Message-ID: On Fri, Sep 9, 2011 at 10:16 AM, Terry Reedy wrote: > I am curious how you index by code point rather than code unit with 16-bit > code units and how it compares with the method I posted. Is there anything I > can read? Reply off list if you want. 
I'll post on-list until someone complains, just in case there are interested onlookers :) There aren't docs, but the code is here: https://bitbucket.org/jython/jython/src/8a8642e45433/src/org/python/core/PyUnicode.java Here are (I think) the most relevant bits for random access -- note that getString() returns the internal representation of the PyUnicode which is a java.lang.String @Override protected PyObject pyget(int i) { if (isBasicPlane()) { return Py.makeCharacter(getString().charAt(i), true); } int k = 0; while (i > 0) { int W1 = getString().charAt(k); if (W1 >= 0xD800 && W1 < 0xDC00) { k += 2; } else { k += 1; } i--; } int codepoint = getString().codePointAt(k); return Py.makeCharacter(codepoint, true); } public boolean isBasicPlane() { if (plane == Plane.BASIC) { return true; } else if (plane == Plane.UNKNOWN) { plane = (getString().length() == getCodePointCount()) ? Plane.BASIC : Plane.ASTRAL; } return plane == Plane.BASIC; } public int getCodePointCount() { if (codePointCount >= 0) { return codePointCount; } codePointCount = getString().codePointCount(0, getString().length()); return codePointCount; } -Frank From guido at python.org Fri Sep 9 23:21:33 2011 From: guido at python.org (Guido van Rossum) Date: Fri, 9 Sep 2011 14:21:33 -0700 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: References: <6C7ABA8B4E309440B857D74348836F2E28F378E2@TK5EX14MBXC292.redmond.corp.microsoft.com> <201108262337.42349.victor.stinner@haypocalc.com> Message-ID: I, for one, am very interested. It sounds like the 'unicode' datatype in Jython does not in fact have O(1) indexing characteristics if the string contains any characters in the astral plane. Interesting. I wonder if you have heard from anyone about this affecting their app's performance? 
--Guido On Fri, Sep 9, 2011 at 12:58 PM, fwierzbicki at gmail.com wrote: > On Fri, Sep 9, 2011 at 10:16 AM, Terry Reedy wrote: > >> I am curious how you index by code point rather than code unit with 16-bit >> code units and how it compares with the method I posted. Is there anything I >> can read? Reply off list if you want. > I'll post on-list until someone complains, just in case there are > interested onlookers :) > > There aren't docs, but the code is here: > https://bitbucket.org/jython/jython/src/8a8642e45433/src/org/python/core/PyUnicode.java > > Here are (I think) the most relevant bits for random access -- note > that getString() returns the internal representation of the PyUnicode > which is a java.lang.String > > @Override > protected PyObject pyget(int i) { > if (isBasicPlane()) { > return Py.makeCharacter(getString().charAt(i), true); > } > > int k = 0; > while (i > 0) { > int W1 = getString().charAt(k); > if (W1 >= 0xD800 && W1 < 0xDC00) { > k += 2; > } else { > k += 1; > } > i--; > } > int codepoint = getString().codePointAt(k); > return Py.makeCharacter(codepoint, true); > } > > public boolean isBasicPlane() { > if (plane == Plane.BASIC) { > return true; > } else if (plane == Plane.UNKNOWN) { > plane = (getString().length() == getCodePointCount()) ? > Plane.BASIC : Plane.ASTRAL; > } > return plane == Plane.BASIC; > } > > public int getCodePointCount() { > if (codePointCount >= 0) { > return codePointCount; > } > codePointCount = getString().codePointCount(0, getString().length()); > return codePointCount; >
} > > -Frank > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (python.org/~guido) From fwierzbicki at gmail.com Sat Sep 10 00:38:03 2011 From: fwierzbicki at gmail.com (fwierzbicki at gmail.com) Date: Fri, 9 Sep 2011 15:38:03 -0700 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: References: <6C7ABA8B4E309440B857D74348836F2E28F378E2@TK5EX14MBXC292.redmond.corp.microsoft.com> <201108262337.42349.victor.stinner@haypocalc.com> Message-ID: On Fri, Sep 9, 2011 at 2:21 PM, Guido van Rossum wrote: > I, for one, am very interested. It sounds like the 'unicode' datatype > in Jython does not in fact have O(1) indexing characteristics if the > string contains any characters in the astral plane. Interesting. I > wonder if you have heard from anyone about this affecting their app's > performance? So far we haven't had any complaints - I'm not really sure how often Jython gets used with astral plane characters at this point, but I expect it will happen more in the future, especially once we put together a Jython 3 and Unicode support becomes a stronger expectation. Personally I'm hoping that in that time frame Java will come under pressure to provide a better answer (or we may need to think in the same direction as Dino was thinking in an earlier part of this thread and make a more Python specific String type for Jython....)
-Frank From guido at python.org Sat Sep 10 00:43:31 2011 From: guido at python.org (Guido van Rossum) Date: Fri, 9 Sep 2011 15:43:31 -0700 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: References: <6C7ABA8B4E309440B857D74348836F2E28F378E2@TK5EX14MBXC292.redmond.corp.microsoft.com> <201108262337.42349.victor.stinner@haypocalc.com> Message-ID: Well, I'd be interested in how it goes, since if Jython users find this acceptable then maybe we shouldn't be quite so concerned about it for CPython... On the third hand we don't have working code for this approach in CPython, while we do have working code for the PEP 393 solution... --Guido On Fri, Sep 9, 2011 at 3:38 PM, fwierzbicki at gmail.com wrote: > On Fri, Sep 9, 2011 at 2:21 PM, Guido van Rossum wrote: >> I, for one, am very interested. It sounds like the 'unicode' datatype >> in Jython does not in fact have O(1) indexing characteristics if the >> string contains any characters in the astral plane. Interesting. I >> wonder if you have heard from anyone about this affecting their app's >> performance? > So far we haven't had any complaints - I'm not really sure how often > Jython gets used with astral plane characters at this point, but I > expect it will happen more in the future, especially once we put > together a Jython 3 and Unicode support becomes a stronger > expectation. Personally I'm hoping that in that time frame Java will > come under pressure to provide a better answer (or we may need to > think in the same direction as Dino was thinking in an earlier part of > this thread and make a more Python specific String type for > Jython....)
> > -Frank > -- --Guido van Rossum (python.org/~guido) From tjreedy at udel.edu Sat Sep 10 03:11:18 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 09 Sep 2011 21:11:18 -0400 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: References: <6C7ABA8B4E309440B857D74348836F2E28F378E2@TK5EX14MBXC292.redmond.corp.microsoft.com> <201108262337.42349.victor.stinner@haypocalc.com> Message-ID: On 9/9/2011 5:21 PM, Guido van Rossum wrote: > I, for one, am very interested. It sounds like the 'unicode' datatype > in Jython does not in fact have O(1) indexing characteristics if the > string contains any characters in the astral plane. Interesting. I > wonder if you have heard from anyone about this affecting their app's > performance? > > --Guido The question is whether or how often any Jython users are yet indexing/slicing long strings with astral chars. If a utf-8 xml file is directly parsed into a DOM, then the longest decoded strings will be 'paragraphs' that are seldom more than 1000 chars. > On Fri, Sep 9, 2011 at 12:58 PM, fwierzbicki at gmail.com > wrote: >> On Fri, Sep 9, 2011 at 10:16 AM, Terry Reedy wrote: >> >>> I am curious how you index by code point rather than code unit with 16-bit >>> code units and how it compares with the method I posted. Is there anything I >>> can read? Reply off list if you want. 
>> I'll post on-list until someone complains, just in case there are >> interested onlookers :) >> >> There aren't docs, but the code is here: >> https://bitbucket.org/jython/jython/src/8a8642e45433/src/org/python/core/PyUnicode.java >> >> Here are (I think) the most relevant bits for random access -- note >> that getString() returns the internal representation of the PyUnicode >> which is a java.lang.String >> >> @Override >> protected PyObject pyget(int i) { >> if (isBasicPlane()) { >> return Py.makeCharacter(getString().charAt(i), true); >> } This is O(1) >> int k = 0; >> while (i > 0) { >> int W1 = getString().charAt(k); >> if (W1 >= 0xD800 && W1 < 0xDC00) { >> k += 2; >> } else { >> k += 1; >> } >> i--; This is an O(n) linear scan. >> } >> int codepoint = getString().codePointAt(k); >> return Py.makeCharacter(codepoint, true); >> } Near the beginning of this thread, I described and gave a link to my O(log k) algorithm, where k is the number of supplementary ('astral') chars. It uses bisect.bisect_left on an int array of length k constructed with a linear scan much like the one above, with one added line. The basic idea is to do the linear scan just once and save the locations (code point indexes) of the astral chars instead of repeating the scan on every access. That could be done as the string is constructed. The same array search works for slicing too. Jython is welcome to use it if you ever decide you need it. I have in mind to someday do some timing tests with the Python version. I just do not know how close the results would be to those for compiled C or Java.
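The scheme described above (one linear pass to record astral positions, then a binary search per access) can be sketched in Python roughly as follows; the helper names here are hypothetical, not taken from any posted code:

```python
import bisect

def build_astral_table(code_points):
    """One linear scan at construction time: record the code-point
    indexes of supplementary ('astral', > U+FFFF) characters."""
    return [i for i, cp in enumerate(code_points) if cp > 0xFFFF]

def unit_index(cp_index, astral_table):
    """Map a code-point index to a UTF-16 code-unit index in O(log k),
    where k = len(astral_table): each astral character before cp_index
    contributes one extra code unit (its trail surrogate)."""
    return cp_index + bisect.bisect_left(astral_table, cp_index)

# 'a', U+10000, 'b', U+10001, 'c' -> 7 UTF-16 code units
cps = [0x61, 0x10000, 0x62, 0x10001, 0x63]
table = build_astral_table(cps)   # [1, 3]
assert [unit_index(i, table) for i in range(5)] == [0, 1, 3, 4, 6]
```

The point of the table is that the surrogate scan happens once; every subsequent index or slice pays only the bisect, regardless of how many astral characters precede it.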
-- Terry Jan Reedy From jcea at jcea.es Sat Sep 10 05:02:09 2011 From: jcea at jcea.es (Jesus Cea) Date: Sat, 10 Sep 2011 05:02:09 +0200 Subject: [Python-Dev] Multigigabyte memory usage in the OpenIndiana Buildbot In-Reply-To: <20110909190432.11206d07@msiwind> References: <4E60FCD6.3090005@jcea.es> <4E611821.9050108@jcea.es> <20110902201415.773da7d6@pitrou.net> <4E659CDD.8090900@jcea.es> <4E65A13D.9010805@jcea.es> <4E65A8AE.10900@jcea.es> <4E65A955.7000507@jcea.es> <4E65AD58.6050106@jcea.es> <4E6757AF.4050007@jcea.es> <20110907143259.2bcff454@pitrou.net> <4E681681.6060405@jcea.es> <20110908091805.3f1e9141@pitrou.net> <4E6A2D4A.30503@jcea.es> <20110909190432.11206d07@msiwind> Message-ID: <4E6AD331.3080302@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 09/09/11 19:04, Antoine Pitrou wrote: >> On 08/09/11 09:18, Antoine Pitrou wrote: >>> Ok, I've added "-j4", let's see how that works. >> >> It is not helping. It is still taking tons of memory. > > That's rather strange. Is it for every test or a few select ones? I can't reproduce it after stopping the buildbots, deleting all their data and restarting them. Now I see quite a few python processes running, but memory usage is reasonable. >> Yes, but does it fork for each test or simply launch 4 processes, >> each doing 1/4 of the tests? > > It forks for each test. So, the memory used should be quite low, then :-). I have committed a few patches in the last hours to get my buildbots "green", back again. The memory used was <500MB, compared with >4GB before the "-j". Could you reconfigure my buildbots to be able to run all the six (2.7, 3.2, 3.x, in 32 and 64 bits) instances at the same time, again? I have enough resources now. I'm really sorry to waste your time... Thanks!!!!!. - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ .
_/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQCVAwUBTmrTMZlgi5GaxT1NAQIIKgP+LE1NCfcCVIX+jau4QSJRAVvZan4rqqYn /tMLaz92/toP2S8FdHKbEPs6hBf6QGgnVxnHWcwTxxTWzfDL8xxGjFgJYh/hcqBi B2zfrp83PjW6hFMeL6E7707DI6YwZRCB+dJIiVejAIEMHVOVG6x12KRLFCWL+AOZ ElpXewoATXI= =fHkz -----END PGP SIGNATURE----- From jcea at jcea.es Sat Sep 10 05:26:34 2011 From: jcea at jcea.es (Jesus Cea) Date: Sat, 10 Sep 2011 05:26:34 +0200 Subject: [Python-Dev] Multigigabyte memory usage in the OpenIndiana Buildbot In-Reply-To: <4E6AD331.3080302@jcea.es> References: <4E60FCD6.3090005@jcea.es> <4E611821.9050108@jcea.es> <20110902201415.773da7d6@pitrou.net> <4E659CDD.8090900@jcea.es> <4E65A13D.9010805@jcea.es> <4E65A8AE.10900@jcea.es> <4E65A955.7000507@jcea.es> <4E65AD58.6050106@jcea.es> <4E6757AF.4050007@jcea.es> <20110907143259.2bcff454@pitrou.net> <4E681681.6060405@jcea.es> <20110908091805.3f1e9141@pitrou.net> <4E6A2D4A.30503@jcea.es> <20110909190432.11206d07@msiwind> <4E6AD331.3080302@jcea.es> Message-ID: <4E6AD8EA.9010901@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 10/09/11 05:02, Jesus Cea wrote: > I have committed a few patches in the last hours to get my > buildbots "green", back again. The memory used was <500MB, compared > with >4GB before the "-j". One of my patches solves a "process leak" in multiprocessing, when some tests failed. Doing "make test" leaked quite a few processes, but only in OpenIndiana, where those tests actually failed. That is solved now, both the leak and the test failure. Details: http://bugs.python.org/issue12948 http://bugs.python.org/issue12950 I think the buildbots took care of these rogue processes after the timeout expires, anyway, but...
> Could you reconfigure my buildbots to be able to run all the six > (2.7, 3.2, 3.x, in 32 and 64 bits) instances at the same time, > again? I have enough resources now. I'm really sorry to waste your > time... Now, a buildbot run of 3.x compiled in 64bits takes around 500MB. I have seen a peak of around 4GB and a few of around 800MB, for a fraction of a second. - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ . _/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQCVAwUBTmrY6plgi5GaxT1NAQKvUgP/YlS7wneU5dsWoAmtqauC02gZUi1D4OpQ 7waM8G1q8OHXLbpV1jKmBb/32G+rDp1Tm/XCjlHpK1wJcmwWmdPGAbbQp1o5TduJ z+lbPnzWvMCRLJwZDtZAitn4/7VchoAcdTfIYCyBoK/JEUI1Oq0Mt5XeIgtD+FX9 IjwuWzXISqM= =ojrq -----END PGP SIGNATURE----- From solipsis at pitrou.net Sat Sep 10 18:46:16 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 10 Sep 2011 18:46:16 +0200 Subject: [Python-Dev] Multigigabyte memory usage in the OpenIndiana Buildbot References: <4E60FCD6.3090005@jcea.es> <4E611821.9050108@jcea.es> <20110902201415.773da7d6@pitrou.net> <4E659CDD.8090900@jcea.es> <4E65A13D.9010805@jcea.es> <4E65A8AE.10900@jcea.es> <4E65A955.7000507@jcea.es> <4E65AD58.6050106@jcea.es> <4E6757AF.4050007@jcea.es> <20110907143259.2bcff454@pitrou.net> <4E681681.6060405@jcea.es> <20110908091805.3f1e9141@pitrou.net> <4E6A2D4A.30503@jcea.es> <20110909190432.11206d07@msiwind> <4E6AD331.3080302@jcea.es> Message-ID: <20110910184616.3efc654e@msiwind> Le Sat, 10 Sep 2011 05:02:09 +0200, Jesus Cea a écrit : > > I have committed a few patches in the last hours to get my buildbots > "green", back again.
The memory used was <500MB, compared with >4GB > before the "-j". > > Could you reconfigure my buildbots to be able to run all the six (2.7, > 3.2, 3.x, in 32 and 64 bits) instances at the same time, again?. I > have enough resources now. I really sorry to waste your time... I don't think I can do it right now, since I'm away on holiday. However, perhaps David or Martin can do it. Or you'll have to wait a couple of weeks :) Regards Antoine. From howard_b_golden at yahoo.com Wed Sep 7 21:33:11 2011 From: howard_b_golden at yahoo.com (Howard B. Golden) Date: Wed, 07 Sep 2011 12:33:11 -0700 Subject: [Python-Dev] Handling linker scripts reached when dynamically loading a module Message-ID: <1315423991.7843.19.camel@www.hbg-srv3.hgolden.socal.rr.com> Hi, In Haskell I experienced a situation where dynamically loaded modules were experiencing "invalid ELF header" errors. This was caused by the module names actually referring to linker scripts rather than ELF binaries. I patched the GHC runtime system to deal with these scripts. I noticed that this same patch has been ported to Ruby and Node.js, so I suggested to the libc developers that they might wish to incorporate the patch into their library, making it available to all languages. They rejected this suggestion, so I am making the suggestion to the Python devs in case it is of interest to you. Basically, when a linker script is loaded by dlopen, an "invalid ELF header" error occurs. The patch checks to see if the file is a linker script. If so, it finds the name of the real ELF binary with a regular expression and tries to dlopen it. If successful, processing proceeds. Otherwise, the original "invalid ELF error" message is returned. If you want to add this code to Python, you can look at my original patch (http://hackage.haskell.org/trac/ghc/ticket/2615) or the Ruby version (https://github.com/ffi/ffi/pull/117) or the Node.js version (https://github.com/rbranson/node-ffi/pull/5) to help port it. 
Note that the GHC version in GHC 7.2.1 has been enhanced to also handle another possible error when the linker script is too short, so you might also want to add this enhancement also (see https://github.com/ghc/blob/master/rts/Linker.c line 1191 for the revised regular expression): "(([^ \t()])+\\.so([^ \t:()])*):([ \t])*(invalid ELF header|file too short)" At this point, I don't have the free time to write the Python patch myself, so I apologize in advance for not providing it to you. HTH, Howard B. Golden Northridge, California, USA From guido at python.org Sat Sep 10 23:39:15 2011 From: guido at python.org (Guido van Rossum) Date: Sat, 10 Sep 2011 14:39:15 -0700 Subject: [Python-Dev] Handling linker scripts reached when dynamically loading a module In-Reply-To: <1315423991.7843.19.camel@www.hbg-srv3.hgolden.socal.rr.com> References: <1315423991.7843.19.camel@www.hbg-srv3.hgolden.socal.rr.com> Message-ID: Excuse me for asking a newbie question, but what are linker scripts and why are they important? I don't recall anyone ever having requested this feature before. --Guido On Wed, Sep 7, 2011 at 12:33 PM, Howard B. Golden wrote: > Hi, > > In Haskell I experienced a situation where dynamically loaded modules > were experiencing "invalid ELF header" errors. This was caused by the > module names actually referring to linker scripts rather than ELF > binaries. I patched the GHC runtime system to deal with these scripts. > > I noticed that this same patch has been ported to Ruby and Node.js, so I > suggested to the libc developers that they might wish to incorporate the > patch into their library, making it available to all languages. They > rejected this suggestion, so I am making the suggestion to the Python > devs in case it is of interest to you. > > Basically, when a linker script is loaded by dlopen, an "invalid ELF > header" error occurs. The patch checks to see if the file is a linker > script. 
If so, it finds the name of the real ELF binary with a regular > expression and tries to dlopen it. If successful, processing proceeds. > Otherwise, the original "invalid ELF error" message is returned. > > If you want to add this code to Python, you can look at my original > patch (http://hackage.haskell.org/trac/ghc/ticket/2615) or the Ruby > version (https://github.com/ffi/ffi/pull/117) or the Node.js version > (https://github.com/rbranson/node-ffi/pull/5) to help port it. > > Note that the GHC version in GHC 7.2.1 has been enhanced to also handle > another possible error when the linker script is too short, so you might > also want to add this enhancement also (see > https://github.com/ghc/blob/master/rts/Linker.c line 1191 for the > revised regular expression): > > "(([^ \t()])+\\.so([^ \t:()])*):([ \t])*(invalid ELF header|file too > short)" > > At this point, I don't have the free time to write the Python patch > myself, so I apologize in advance for not providing it to you. > > HTH, > > Howard B. Golden > Northridge, California, USA > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (python.org/~guido) From howard_b_golden at yahoo.com Sat Sep 10 23:50:42 2011 From: howard_b_golden at yahoo.com (Howard B. Golden) Date: Sat, 10 Sep 2011 14:50:42 -0700 Subject: [Python-Dev] Handling linker scripts reached when dynamically loading a module In-Reply-To: References: <1315423991.7843.19.camel@www.hbg-srv3.hgolden.socal.rr.com> Message-ID: <1315691442.16652.8.camel@www.hbg-srv3.hgolden.socal.rr.com> I don't know why, but some Linux distributions place scripts into .so files instead of the actual binaries. This takes advantage of a feature of GNU ld that it will process the script (which points to the actual binary) when it links the .so file. 
This feature works fine when you are linking a binary, but it doesn't take into account that binaries can be loaded dynamically by interpreters (e.g., Python or GHCi). If dlopen finds a linker script, it doesn't know what to do with it. It simply diagnoses the file as either an invalid ELF header or too short. On Gentoo Linux, some common libraries that are represented as linker scripts include libm.so, libpthread.so and libpcre.so. I know this also affects Ubuntu. Howard On Sat, 2011-09-10 at 14:39 -0700, Guido van Rossum wrote: > Excuse me for asking a newbie question, but what are linker scripts > and why are they important? I don't recall anyone ever having > requested this feature before. > > --Guido > > On Wed, Sep 7, 2011 at 12:33 PM, Howard B. Golden > wrote: > > Hi, > > > > In Haskell I experienced a situation where dynamically loaded modules > > were experiencing "invalid ELF header" errors. This was caused by the > > module names actually referring to linker scripts rather than ELF > > binaries. I patched the GHC runtime system to deal with these scripts. > > > > I noticed that this same patch has been ported to Ruby and Node.js, so I > > suggested to the libc developers that they might wish to incorporate the > > patch into their library, making it available to all languages. They > > rejected this suggestion, so I am making the suggestion to the Python > > devs in case it is of interest to you. > > > > Basically, when a linker script is loaded by dlopen, an "invalid ELF > > header" error occurs. The patch checks to see if the file is a linker > > script. If so, it finds the name of the real ELF binary with a regular > > expression and tries to dlopen it. If successful, processing proceeds. > > Otherwise, the original "invalid ELF error" message is returned. 
> > > > If you want to add this code to Python, you can look at my original > > patch (http://hackage.haskell.org/trac/ghc/ticket/2615) or the Ruby > > version (https://github.com/ffi/ffi/pull/117) or the Node.js version > > (https://github.com/rbranson/node-ffi/pull/5) to help port it. > > > > Note that the GHC version in GHC 7.2.1 has been enhanced to also handle > > another possible error when the linker script is too short, so you might > > also want to add this enhancement also (see > > https://github.com/ghc/blob/master/rts/Linker.c line 1191 for the > > revised regular expression): > > > > "(([^ \t()])+\\.so([^ \t:()])*):([ \t])*(invalid ELF header|file too > > short)" > > > > At this point, I don't have the free time to write the Python patch > > myself, so I apologize in advance for not providing it to you. > > > > HTH, > > > > Howard B. Golden > > Northridge, California, USA > > > > _______________________________________________ > > Python-Dev mailing list > > Python-Dev at python.org > > http://mail.python.org/mailman/listinfo/python-dev > > Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org > > > > > From guido at python.org Sun Sep 11 00:24:04 2011 From: guido at python.org (Guido van Rossum) Date: Sat, 10 Sep 2011 15:24:04 -0700 Subject: [Python-Dev] Handling linker scripts reached when dynamically loading a module In-Reply-To: <1315691442.16652.8.camel@www.hbg-srv3.hgolden.socal.rr.com> References: <1315423991.7843.19.camel@www.hbg-srv3.hgolden.socal.rr.com> <1315691442.16652.8.camel@www.hbg-srv3.hgolden.socal.rr.com> Message-ID: Odd. Let's see what other core devs say. On Sat, Sep 10, 2011 at 2:50 PM, Howard B. Golden wrote: > I don't know why, but some Linux distributions place scripts into .so > files instead of the actual binaries. This takes advantage of a feature > of GNU ld that it will process the script (which points to the actual > binary) when it links the .so file. 
> > This feature works fine when you are linking a binary, but it doesn't > take into account that binaries can be loaded dynamically by > interpreters (e.g., Python or GHCi). If dlopen finds a linker script, it > doesn't know what to do with it. It simply diagnoses the file as either > an invalid ELF header or too short. > > On Gentoo Linux, some common libraries that are represented as linker > scripts include libm.so, libpthread.so and libpcre.so. I know this also > affects Ubuntu. > > Howard > > On Sat, 2011-09-10 at 14:39 -0700, Guido van Rossum wrote: >> Excuse me for asking a newbie question, but what are linker scripts >> and why are they important? I don't recall anyone ever having >> requested this feature before. >> >> --Guido >> >> On Wed, Sep 7, 2011 at 12:33 PM, Howard B. Golden >> wrote: >> > Hi, >> > >> > In Haskell I experienced a situation where dynamically loaded modules >> > were experiencing "invalid ELF header" errors. This was caused by the >> > module names actually referring to linker scripts rather than ELF >> > binaries. I patched the GHC runtime system to deal with these scripts. >> > >> > I noticed that this same patch has been ported to Ruby and Node.js, so I >> > suggested to the libc developers that they might wish to incorporate the >> > patch into their library, making it available to all languages. They >> > rejected this suggestion, so I am making the suggestion to the Python >> > devs in case it is of interest to you. >> > >> > Basically, when a linker script is loaded by dlopen, an "invalid ELF >> > header" error occurs. The patch checks to see if the file is a linker >> > script. If so, it finds the name of the real ELF binary with a regular >> > expression and tries to dlopen it. If successful, processing proceeds. >> > Otherwise, the original "invalid ELF error" message is returned. 
>> > >> > If you want to add this code to Python, you can look at my original >> > patch (http://hackage.haskell.org/trac/ghc/ticket/2615) or the Ruby >> > version (https://github.com/ffi/ffi/pull/117) or the Node.js version >> > (https://github.com/rbranson/node-ffi/pull/5) to help port it. >> > >> > Note that the GHC version in GHC 7.2.1 has been enhanced to also handle >> > another possible error when the linker script is too short, so you might >> > also want to add this enhancement also (see >> > https://github.com/ghc/blob/master/rts/Linker.c line 1191 for the >> > revised regular expression): >> > >> > "(([^ \t()])+\\.so([^ \t:()])*):([ \t])*(invalid ELF header|file too >> > short)" >> > >> > At this point, I don't have the free time to write the Python patch >> > myself, so I apologize in advance for not providing it to you. >> > >> > HTH, >> > >> > Howard B. Golden >> > Northridge, California, USA >> > >> > _______________________________________________ >> > Python-Dev mailing list >> > Python-Dev at python.org >> > http://mail.python.org/mailman/listinfo/python-dev >> > Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org >> > >> >> >> > > > -- --Guido van Rossum (python.org/~guido) From nadeem.vawda at gmail.com Sun Sep 11 00:39:02 2011 From: nadeem.vawda at gmail.com (Nadeem Vawda) Date: Sun, 11 Sep 2011 00:39:02 +0200 Subject: [Python-Dev] Handling linker scripts reached when dynamically loading a module In-Reply-To: References: <1315423991.7843.19.camel@www.hbg-srv3.hgolden.socal.rr.com> <1315691442.16652.8.camel@www.hbg-srv3.hgolden.socal.rr.com> Message-ID: I can confirm that libpthread.so (/usr/lib/x86_64-linux-gnu/libpthread.so) is a linker script on my Ubuntu 11.04 install. This hasn't ever caused me any problems, though. As for why distributions do this, here are the contents of the script: /* GNU ld script Use the shared library, but some functions are only in the static library, so try that secondarily. 
*/ OUTPUT_FORMAT(elf64-x86-64) GROUP ( /lib/x86_64-linux-gnu/libpthread.so.0 /usr/lib/x86_64-linux-gnu/libpthread_nonshared.a ) Cheers, Nadeem On Sun, Sep 11, 2011 at 12:24 AM, Guido van Rossum wrote: > Odd. Let's see what other core devs say. > > On Sat, Sep 10, 2011 at 2:50 PM, Howard B. Golden > wrote: >> I don't know why, but some Linux distributions place scripts into .so >> files instead of the actual binaries. This takes advantage of a feature >> of GNU ld that it will process the script (which points to the actual >> binary) when it links the .so file. >> >> This feature works fine when you are linking a binary, but it doesn't >> take into account that binaries can be loaded dynamically by >> interpreters (e.g., Python or GHCi). If dlopen finds a linker script, it >> doesn't know what to do with it. It simply diagnoses the file as either >> an invalid ELF header or too short. >> >> On Gentoo Linux, some common libraries that are represented as linker >> scripts include libm.so, libpthread.so and libpcre.so. I know this also >> affects Ubuntu. >> >> Howard >> >> On Sat, 2011-09-10 at 14:39 -0700, Guido van Rossum wrote: >>> Excuse me for asking a newbie question, but what are linker scripts >>> and why are they important? I don't recall anyone ever having >>> requested this feature before. >>> >>> --Guido >>> >>> On Wed, Sep 7, 2011 at 12:33 PM, Howard B. Golden >>> wrote: >>> > Hi, >>> > >>> > In Haskell I experienced a situation where dynamically loaded modules >>> > were experiencing "invalid ELF header" errors. This was caused by the >>> > module names actually referring to linker scripts rather than ELF >>> > binaries. I patched the GHC runtime system to deal with these scripts. >>> > >>> > I noticed that this same patch has been ported to Ruby and Node.js, so I >>> > suggested to the libc developers that they might wish to incorporate the >>> > patch into their library, making it available to all languages. 
They >>> > rejected this suggestion, so I am making the suggestion to the Python >>> > devs in case it is of interest to you. >>> > >>> > Basically, when a linker script is loaded by dlopen, an "invalid ELF >>> > header" error occurs. The patch checks to see if the file is a linker >>> > script. If so, it finds the name of the real ELF binary with a regular >>> > expression and tries to dlopen it. If successful, processing proceeds. >>> > Otherwise, the original "invalid ELF error" message is returned. >>> > >>> > If you want to add this code to Python, you can look at my original >>> > patch (http://hackage.haskell.org/trac/ghc/ticket/2615) or the Ruby >>> > version (https://github.com/ffi/ffi/pull/117) or the Node.js version >>> > (https://github.com/rbranson/node-ffi/pull/5) to help port it. >>> > >>> > Note that the GHC version in GHC 7.2.1 has been enhanced to also handle >>> > another possible error when the linker script is too short, so you might >>> > also want to add this enhancement also (see >>> > https://github.com/ghc/blob/master/rts/Linker.c line 1191 for the >>> > revised regular expression): >>> > >>> > "(([^ \t()])+\\.so([^ \t:()])*):([ \t])*(invalid ELF header|file too >>> > short)" >>> > >>> > At this point, I don't have the free time to write the Python patch >>> > myself, so I apologize in advance for not providing it to you. >>> > >>> > HTH, >>> > >>> > Howard B. 
Golden >>> > Northridge, California, USA >>> > >>> > _______________________________________________ >>> > Python-Dev mailing list >>> > Python-Dev at python.org >>> > http://mail.python.org/mailman/listinfo/python-dev >>> > Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org >>> > >>> >>> >>> >> >> >> > > > > -- > --Guido van Rossum (python.org/~guido) > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/nadeem.vawda%40gmail.com > From howard_b_golden at yahoo.com Sun Sep 11 01:35:19 2011 From: howard_b_golden at yahoo.com (Howard B. Golden) Date: Sat, 10 Sep 2011 16:35:19 -0700 Subject: [Python-Dev] Handling linker scripts reached when dynamically loading a module In-Reply-To: References: <1315423991.7843.19.camel@www.hbg-srv3.hgolden.socal.rr.com> <1315691442.16652.8.camel@www.hbg-srv3.hgolden.socal.rr.com> Message-ID: <1315697719.16652.15.camel@www.hbg-srv3.hgolden.socal.rr.com> On Sun, 2011-09-11 at 00:39 +0200, Nadeem Vawda wrote: > I can confirm that libpthread.so (/usr/lib/x86_64-linux-gnu/libpthread.so) > is a linker script on my Ubuntu 11.04 install. This hasn't ever caused me > any problems, though. > > As for why distributions do this, here are the contents of the script: > > /* GNU ld script > Use the shared library, but some functions are only in > the static library, so try that secondarily. */ > OUTPUT_FORMAT(elf64-x86-64) > GROUP ( /lib/x86_64-linux-gnu/libpthread.so.0 > /usr/lib/x86_64-linux-gnu/libpthread_nonshared.a ) > > Cheers, > Nadeem Let me clarify: This will only be a problem when using a foreign function interface to call a non-versioned module dynamically. 
In the more common situation, when one links to a package specified at link time, the linker figures out the specific, versioned name of the .so file and then the dlopen will refer to the actual binary. So, in Python, this is likely to only affect users calling packages using ctypes. (This corresponds to GHCi loading an unversioned library, e.g., "ghci -lm" which would load the current version of the math library into the GHC interpreter.) Howard From wolfson at gmail.com Sun Sep 11 01:52:07 2011 From: wolfson at gmail.com (Ben Wolfson) Date: Sat, 10 Sep 2011 16:52:07 -0700 Subject: [Python-Dev] Handling linker scripts reached when dynamically loading a module In-Reply-To: <1315697719.16652.15.camel@www.hbg-srv3.hgolden.socal.rr.com> References: <1315423991.7843.19.camel@www.hbg-srv3.hgolden.socal.rr.com> <1315691442.16652.8.camel@www.hbg-srv3.hgolden.socal.rr.com> <1315697719.16652.15.camel@www.hbg-srv3.hgolden.socal.rr.com> Message-ID: On Sat, Sep 10, 2011 at 4:35 PM, Howard B. Golden wrote: > > So, in Python, this is likely to only affect users calling packages > using ctypes. (This corresponds to GHCi loading an unversioned library, > e.g., "ghci -lm" which would load the current version of the math > library into the GHC interpreter.) And it does do so on Gentoo: $ python Python 2.6.6 (r266:84292, Dec 26 2010, 17:43:52) [GCC 4.4.4] on linux2 Type "help", "copyright", "credits" or "license" for more information. 
>>> from ctypes import cdll >>> cdll.LoadLibrary('libpthread.so') Traceback (most recent call last): File "", line 1, in File "/usr/lib/python2.6/ctypes/__init__.py", line 431, in LoadLibrary return self._dlltype(name) File "/usr/lib/python2.6/ctypes/__init__.py", line 353, in __init__ self._handle = _dlopen(self._name, mode) OSError: /usr/lib/libpthread.so: invalid ELF header >>> cdll.LoadLibrary('libpthread.so.0') >>> $ cat /usr/lib/libpthread.so /* GNU ld script Use the shared library, but some functions are only in the static library, so try that secondarily. */ OUTPUT_FORMAT(elf32-i386) GROUP ( /lib/libpthread.so.0 /usr/lib/libpthread_nonshared.a ) -- Ben Wolfson "Human kind has used its intelligence to vary the flavour of drinks, which may be sweet, aromatic, fermented or spirit-based. ... Family and social life also offer numerous other occasions to consume drinks for pleasure." [Larousse, "Drink" entry] From martin at v.loewis.de Sun Sep 11 09:08:29 2011 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 11 Sep 2011 09:08:29 +0200 Subject: [Python-Dev] Handling linker scripts reached when dynamically loading a module In-Reply-To: <1315697719.16652.15.camel@www.hbg-srv3.hgolden.socal.rr.com> References: <1315423991.7843.19.camel@www.hbg-srv3.hgolden.socal.rr.com> <1315691442.16652.8.camel@www.hbg-srv3.hgolden.socal.rr.com> <1315697719.16652.15.camel@www.hbg-srv3.hgolden.socal.rr.com> Message-ID: <4E6C5E6D.5000502@v.loewis.de> > Let me clarify: This will only be a problem when using a foreign > function interface to call a non-versioned module dynamically. As such, it won't be much of a problem for Python. In Python, we don't normally dlopen .so files, except when we know they are Python extension modules, in which case we also know that they won't be linker scripts - it just doesn't make sense to write a linker script for what should be a Python module, since you won't ever link against Python modules. 
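The Gentoo session above shows the exact failure Howard's GHC/Ruby/Node patches work around. A rough Python sketch of that fallback, layered over ctypes, might look like the following — the helper names and the GROUP-parsing regex are illustrative only, not any stdlib API:

```python
import re
from ctypes import CDLL

# Matches the first member of a GNU ld script's GROUP ( ... ) clause, e.g.
# "GROUP ( /lib/x86_64-linux-gnu/libpthread.so.0 /usr/lib/libpthread_nonshared.a )".
_GROUP_RE = re.compile(rb'GROUP\s*\(\s*(\S+)')

def real_library(script_bytes):
    """Return the first shared object named in an ld script, or None."""
    m = _GROUP_RE.search(script_bytes)
    return m.group(1).decode() if m else None

def load_library(name):
    """dlopen *name*; if it turns out to be a linker script, retry with the real .so."""
    try:
        return CDLL(name)
    except OSError as exc:
        if 'invalid ELF header' not in str(exc):
            raise
        # dlerror() output looks like "/usr/lib/libpthread.so: invalid ELF header"
        path = str(exc).split(':', 1)[0]
        with open(path, 'rb') as f:
            target = real_library(f.read())
        if target is None:
            raise
        return CDLL(target)
```

A production version would also need the "file too short" case the revised GHC regex handles.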
The only case where it might matter is ctypes, which is Python's "dynamic" FFI (as opposed to the C API, which is the "static" FFI). However, those libraries which are often wrapped with linker scripts don't typically get used in ctypes - e.g. libpthread won't be used in ctypes, but along with the thread module. The only common case where a library that is often a linker script gets also often used in ctypes (i.e. libc) is already special-cased - ctypes knows how to find the "real" C library. IOW, I would defer this until it becomes a real problem, at what point whoever has that problem ought to provide a patch. Regards, Martin From fuzzyman at voidspace.org.uk Sun Sep 11 20:49:06 2011 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Sun, 11 Sep 2011 19:49:06 +0100 Subject: [Python-Dev] Maintenance burden of str.swapcase In-Reply-To: <87pqjbhht9.fsf@uwakimon.sk.tsukuba.ac.jp> References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> <87ehztip2r.fsf@uwakimon.sk.tsukuba.ac.jp> <4E66763B.7080707@pearwood.info> <4E668029.6080106@v.loewis.de> <7EA8F302-1DC4-43D9-B124-09A51172E9BF@voidspace.org.uk> <4E66845F.3060708@v.loewis.de> <4271E434-0E35-4D7E-98D0-097402B8C3FD@gmail.com> <20110907030758.58caa4ed@pitrou.net> <8762l5hzdj.fsf@uwakimon.sk.tsukuba.ac.jp> <20110907144749.7c1a9d50@pitrou.net> <87zkiggt7b.fsf@uwakimon.sk.tsukuba.ac.jp> <4D7C92E4-1F39-4D39-9E99-8E323A6E1282@twistedmatrix.com> <87pqjbhht9.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4E6D02A2.7040907@voidspace.org.uk> On 08/09/2011 03:46, Stephen J. Turnbull wrote: > Glyph Lefkowitz writes: > > On Sep 7, 2011, at 10:26 AM, Stephen J. Turnbull wrote: > > > > > How about "title"? > > > > >>> 'content-length'.title() > > 'Content-Length' > > Does anyone *actually* use .title() for this? (And why not just use the correct casing in the string literal...) Michael > > You might say that the protocol "has" to be case-insensitive so > > this is a silly frill: > > Not me, sir. 
My whole point about the "bytes should be more like str" > controversy is the dual of that: you don't know what will be coming at > you, so the regularities and (normally allowable) fuzziness of text > processing are inadmissible. > > > there are definitely enough case-sensitive crappy bits of network > > middleware out there that this function is critically important for > > an HTTP server. > > "Critically important" is surely an overstatement. You could always > title-case the literal strings containing field names in the source. > > The problem with having lots of str-like features on bytes is that you > lose TOOWDTI, or worse, to many performance-happy coders, use of bytes > becomes TOOWDTI "because none of the characters[sic] I'm planning to > process myself are non-ASCII". This is the road to Babel; it's > workable for one-off scripts but it's asking for long-term trouble in > multi-module applications. The choice of decoding to str and > processing in that form should be made as attractive as possible. > > On the other hand, it is undeniably useful for protocol tokens to have > mnemonic representations even in binary protocols. Textual > manipulations on those tokens should be convenient. > > It seems to me that what might be an improvement over the current > situation (maybe for Py4k only, though) is for bytes and > (PEP-393-style) str to share representation, and have a "cast" method > which would convert from one to the other, validating that the range > contraints on the representation are satisfied. The problem I see is > that this either sanctions the practice of using latin-1 as "ASCII > plus anything", which is an unpleasant hack, or you'd need to check in > text methods that nothing is done with non-ASCII values other than > checks for set membership (including equality comparison, of course). 
> > OTOH, AFAICS, Antoine's claim that inserting a non-latin-1 character > in a str that happens to contain only ASCII values would convert the > representation to multioctets (true), and therefore this doesn't give > the desired efficiency properties, is beside the point. Just don't do > that! You *can't* do that in a bytes object, anyway; use of str in > this way is a "consenting adults" issue. You trade off the > convenience of the full suite of text tools vs. the possibility that > somebody might insert such a character -- but for the algorithms > they're going to be using, they shouldn't be doing that anyway. > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk > -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html From nadeem.vawda at gmail.com Sun Sep 11 23:30:44 2011 From: nadeem.vawda at gmail.com (Nadeem Vawda) Date: Sun, 11 Sep 2011 23:30:44 +0200 Subject: [Python-Dev] LZMA compression support in 3.3 In-Reply-To: References: <4E59041A.7040100@v.loewis.de> <4E5909FD.7060809@v.loewis.de> <20110827174057.6c4b619e@pitrou.net> <20110829083029.68faa57b@resist.wooz.org> Message-ID: I've posted an updated patch to the bug tracker, with a complete implementation of the lzma module, including 100% test coverage for the LZMAFile class (which is implemented entirely in Python). It doesn't include ReST documentation (yet), but the docstrings are quite detailed. Please take a look and let me know what you think. 
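As the lzma module eventually shipped in Python 3.3, its basic surface can be exercised like this — a minimal sketch using the one-shot helpers plus LZMAFile over an in-memory buffer:

```python
import io
import lzma

data = b'repetitive payload ' * 200

# One-shot helpers round-trip and actually shrink repetitive input.
blob = lzma.compress(data)
assert lzma.decompress(blob) == data
assert len(blob) < len(data)

# LZMAFile (the pure-Python class) wraps any file-like object.
buf = io.BytesIO()
with lzma.LZMAFile(buf, 'wb') as f:
    f.write(data)
with lzma.LZMAFile(io.BytesIO(buf.getvalue()), 'rb') as f:
    assert f.read() == data
```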
Cheers, Nadeem From glyph at twistedmatrix.com Mon Sep 12 02:22:15 2011 From: glyph at twistedmatrix.com (Glyph Lefkowitz) Date: Sun, 11 Sep 2011 17:22:15 -0700 Subject: [Python-Dev] Maintenance burden of str.swapcase In-Reply-To: <4E6D02A2.7040907@voidspace.org.uk> References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> <87ehztip2r.fsf@uwakimon.sk.tsukuba.ac.jp> <4E66763B.7080707@pearwood.info> <4E668029.6080106@v.loewis.de> <7EA8F302-1DC4-43D9-B124-09A51172E9BF@voidspace.org.uk> <4E66845F.3060708@v.loewis.de> <4271E434-0E35-4D7E-98D0-097402B8C3FD@gmail.com> <20110907030758.58caa4ed@pitrou.net> <8762l5hzdj.fsf@uwakimon.sk.tsukuba.ac.jp> <20110907144749.7c1a9d50@pitrou.net> <87zkiggt7b.fsf@uwakimon.sk.tsukuba.ac.jp> <4D7C92E4-1F39-4D39-9E99-8E323A6E1282@twistedmatrix.com> <87pqjbhht9.fsf@uwakimon.sk.tsukuba.ac.jp> <4E6D02A2.7040907@voidspace.org.uk> Message-ID: <87FDD2A2-4D24-408C-AF0A-9A359D9B775E@twistedmatrix.com> On Sep 11, 2011, at 11:49 AM, Michael Foord wrote: > Does anyone *actually* use .title() for this? (And why not just use the correct casing in the string literal...) Yes. Twisted does, in various MIME-ish places (IMAP, SIP), although not in HTTP from what I can see. I imagine other similar software would as well. One issue is that you don't always have a string literal to work with. If you're proxying traffic, you start from a mis-cased header and you possibly need to correct it to a canonically-cased one. (On at least one occasion I've had to use such a proxy to make certain buggy client software work.) Of course you could have something like {b"CONNECTION-LOST": b"Connection-Lost", ...} somewhere at module scope, but that feels a bit sillier than just having a nice '.title()' method. -glyph -------------- next part -------------- An HTML attachment was scrubbed... 
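The trade-off being debated is easy to see in a few lines: bytes.title() (str behaves the same) uppercases the first letter of each letter run and lowercases the rest, which canonicalizes a mis-cased header as a proxy would, but mangles headers whose canonical form contains an acronym:

```python
# Canonicalizing mis-cased headers, as a proxy might:
assert b'CONNECTION-LOST'.title() == b'Connection-Lost'
assert b'content-length'.title() == b'Content-Length'

# But title() is purely mechanical, so acronym headers come out wrong:
assert b'www-authenticate'.title() == b'Www-Authenticate'  # wanted WWW-Authenticate
assert b'te'.title() == b'Te'                              # wanted TE
```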
URL: From fumanchu at aminus.org Mon Sep 12 17:59:58 2011 From: fumanchu at aminus.org (Robert Brewer) Date: Mon, 12 Sep 2011 08:59:58 -0700 Subject: [Python-Dev] Maintenance burden of str.swapcase In-Reply-To: <87FDD2A2-4D24-408C-AF0A-9A359D9B775E@twistedmatrix.com> References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk><87ehztip2r.fsf@uwakimon.sk.tsukuba.ac.jp> <4E66763B.7080707@pearwood.info><4E668029.6080106@v.loewis.de><7EA8F302-1DC4-43D9-B124-09A51172E9BF@voidspace.org.uk><4E66845F.3060708@v.loewis.de><4271E434-0E35-4D7E-98D0-097402B8C3FD@gmail.com><20110907030758.58caa4ed@pitrou.net><8762l5hzdj.fsf@uwakimon.sk.tsukuba.ac.jp><20110907144749.7c1a9d50@pitrou.net><87zkiggt7b.fsf@uwakimon.sk.tsukuba.ac.jp><4D7C92E4-1F39-4D39-9E99-8E323A6E1282@twistedmatrix.com><87pqjbhht9.fsf@uwakimon.sk.tsukuba.ac.jp><4E6D02A2.7040907@voidspace.org.uk> <87FDD2A2-4D24-408C-AF0A-9A359D9B775E@twistedmatrix.com> Message-ID: Glyph Lefkowitz wrote: > On Sep 11, 2011, at 11:49 AM, Michael Foord wrote: > Does anyone *actually* use .title() for this? > > Yes. Twisted does, in various MIME-ish places (IMAP, SIP), > although not in HTTP from what I can see. I imagine other > similar software would as well. Not to mention it doesn't work for WWW-Authenticate or TE, to give just a couple of examples. Robert Brewer fumanchu at aminus.org From merwok at netwok.org Tue Sep 13 17:57:31 2011 From: merwok at netwok.org (=?UTF-8?B?w4lyaWMgQXJhdWpv?=) Date: Tue, 13 Sep 2011 17:57:31 +0200 Subject: [Python-Dev] Packaging in Python 2 anyone ? In-Reply-To: References: <4E4D5992.7070603@netwok.org> Message-ID: <4E6F7D6B.9040709@netwok.org> Hi, Here's a status update on distutils2. Vinay did the bulk of the work in his initial commit; we just had to re-add some mistakenly deleted helpers in d2.tests and d2.tests.support, change sysconfig imports and remove duplicate files (sysconfig.*). A contributor did a huge commit to restore 2.4 compatibility.
I pulled it, because it was a useful contribution, and am now in the middle of cleaning it: some conversions were not idiomatic or even buggy, just like when we converted from 2.x to 3.x. Alexis and I have been working in parallel, with some unfortunate duplication. We've resolved to use the tracker or email to coordinate. When I am finished cleaning up the 2.4 compat changes, I'll backport all outstanding changesets that were done in packaging, and then I'll try to fix the few (on linux3^Wlinux) test failures. When the d2 codebase matches packaging's again, it will be easy to keep both codebases in sync. I will edit the wiki page about contributing to state that I will accept patches made against d2 instead of packaging, to lower the contribution bar. It would be very useful to have buildbots. A question: What about distutils2 for Python 3.x?
I think we could keep the stdlib codebase compatible with 3.1 and use a semi-automated process to extract cpython3.3/Lib/packaging to distutils2-py3/distutils2 and rename imports. (IIRC PyPI will require us to play games to have both 2.x and 3.x versions of distutils2.) Another question: What about the docs? Can we just point people to docs.python.org and tell them to mentally replace packaging with distutils2? If that is judged unacceptable, then I'll synchronize the docs in the d2 repo, but that's hours I won't spend on bugs or features. Cheers From fuzzyman at voidspace.org.uk Tue Sep 13 18:34:39 2011 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Tue, 13 Sep 2011 17:34:39 +0100 Subject: [Python-Dev] Packaging in Python 2 anyone ? In-Reply-To: <4E6F7D6B.9040709@netwok.org> References: <4E4D5992.7070603@netwok.org> <4E6F7D6B.9040709@netwok.org> Message-ID: <4E6F861F.5020904@voidspace.org.uk> On 13/09/2011 16:57, Éric Araujo wrote: > [snip...] > A question: What about distutils2 for Python 3.x?
Regards, David Moss -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Wed Sep 14 01:18:12 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 14 Sep 2011 09:18:12 +1000 Subject: [Python-Dev] PyPI trove classifiers for alternate language implementations In-Reply-To: References: Message-ID: On Wed, Sep 14, 2011 at 9:09 AM, DrKJam wrote: > Would it be possible to have trove classifiers added to PyPI specifically > for PyPy and possibly also Jython and IronPython? Possibly, but the place to ask would be catalog-sig at python.org Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From ausfdes at gmail.com Wed Sep 14 13:13:15 2011 From: ausfdes at gmail.com (Austin Fernandes) Date: Wed, 14 Sep 2011 16:43:15 +0530 Subject: [Python-Dev] Windows 8 support Message-ID: Hi, Which versions of python will be compatible with windows8. I am using currently 2.7.2 version. Thanks, Austin -------------- next part -------------- An HTML attachment was scrubbed... URL: From nyamatongwe at gmail.com Wed Sep 14 13:38:36 2011 From: nyamatongwe at gmail.com (Neil Hodgson) Date: Wed, 14 Sep 2011 21:38:36 +1000 Subject: [Python-Dev] Windows 8 support In-Reply-To: References: Message-ID: Austin Fernandes: > Which versions of python will be compatible with windows8. I am using > currently 2.7.2 version. Current releases of both Python 2.7 and Python 3.2 appear to run fine on the Windows 8 Developer Preview. You should download and install the preview to ensure that your own code is compatible. Neil From jdhardy at gmail.com Wed Sep 14 18:23:27 2011 From: jdhardy at gmail.com (Jeff Hardy) Date: Wed, 14 Sep 2011 09:23:27 -0700 Subject: [Python-Dev] Windows 8 support In-Reply-To: References: Message-ID: On Wed, Sep 14, 2011 at 4:38 AM, Neil Hodgson wrote: > Austin Fernandes: > >> Which versions of python will be compatible with windows8. I am using >> currently 2.7.2 version. > > ? 
Current releases of both Python 2.7 and Python 3.2 appear to run > fine on the Windows 8 Developer Preview. You should download and > install the preview to ensure that your own code is compatible. Another question is whether Python can take advantage of WinRT (the new UI framework). It should be possible, as the new APIs were designed to be used? from dynamic languages, but I haven't decided if I'm crazy enough to try it. - Jeff From martin at v.loewis.de Wed Sep 14 22:41:49 2011 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Wed, 14 Sep 2011 22:41:49 +0200 Subject: [Python-Dev] Windows 8 support In-Reply-To: References: Message-ID: <4E71118D.9090305@v.loewis.de> > Another question is whether Python can take advantage of WinRT (the > new UI framework). It should be possible, as the new APIs were > designed to be used from dynamic languages, but I haven't decided if > I'm crazy enough to try it. Python doesn't do GUI on its own, so the direct answer to this question is "no, it can't take advantage of WinRT". Of course, people might start writing Python wrappers for WinRT, possibly leading to a PyRT package. Alternatively, wxWindows might start using WinRT, which would automatically expose it to wxPython applications. Likewise, Tk might integrate support for WinRT, in which case IDLE might make use of it out of the box. Regards, Martin From martin at v.loewis.de Wed Sep 14 22:53:19 2011 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 14 Sep 2011 22:53:19 +0200 Subject: [Python-Dev] Windows 8 support In-Reply-To: References: Message-ID: <4E71143F.4000107@v.loewis.de> > Which versions of python will be compatible with windows8. I am using > currently 2.7.2 version. Most likely, all versions back to Python 1.1 or so will be compatible with Windows 8 (when 32-bit Windows support was first added to Python). Python uses very little of the Windows API (compared to, say, a game). 
Microsoft isn't going to break any of this for the next decade. Support for 16-bit applications is being dropped, but Python didn't really support 16-bit Windows all that well (although there was a DOS port). Regards, Martin From greg.ewing at canterbury.ac.nz Thu Sep 15 00:52:51 2011 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 15 Sep 2011 10:52:51 +1200 Subject: [Python-Dev] Windows 8 support In-Reply-To: References: Message-ID: <4E713043.100@canterbury.ac.nz> Jeff Hardy wrote: > Another question is whether Python can take advantage of WinRT (the > new UI framework). It should be possible, as the new APIs were > designed to be used? from dynamic languages, but I haven't decided if > I'm crazy enough to try it. WinRT certainly sounds like the way to go in the future. I'm glad to hear that .NET isn't going to take over the world after all! -- Greg From eliben at gmail.com Thu Sep 15 08:53:19 2011 From: eliben at gmail.com (Eli Bendersky) Date: Thu, 15 Sep 2011 09:53:19 +0300 Subject: [Python-Dev] Windows 8 support In-Reply-To: <4E713043.100@canterbury.ac.nz> References: <4E713043.100@canterbury.ac.nz> Message-ID: > Another question is whether Python can take advantage of WinRT (the >> new UI framework). It should be possible, as the new APIs were >> designed to be used? from dynamic languages, but I haven't decided if >> I'm crazy enough to try it. >> > > WinRT certainly sounds like the way to go in the future. > I'm glad to hear that .NET isn't going to take over the > world after all! > I'm not sure whether I prefer Javascript doing that, though :) Eli -------------- next part -------------- An HTML attachment was scrubbed... URL: From jai.unix at gmail.com Thu Sep 15 08:56:54 2011 From: jai.unix at gmail.com (Jai Sharma) Date: Thu, 15 Sep 2011 12:26:54 +0530 Subject: [Python-Dev] Not able to do unregister a code Message-ID: Hi, I am facing a memory leaking issue with codecs. I make my own ABC class and register it with codes. 
import codecs codecs.register(ABC) but I am not able to remove ABC from memory. Is there any alternative to do that. Thanks -------------- next part -------------- An HTML attachment was scrubbed... URL: From jai.unix at gmail.com Thu Sep 15 09:03:47 2011 From: jai.unix at gmail.com (Jai Sharma) Date: Thu, 15 Sep 2011 12:33:47 +0530 Subject: [Python-Dev] Not able to do unregister a code In-Reply-To: References: Message-ID: Below is reference pattern: 0: _ --- [-] 4 : 0xa70ca44, 0xa70e79c, 0xe5c602c, 0xe6219bc 1: a [-] 4 tuple: 0xab11c5c*3, 0xe72a43c*3, 0xe73c16c*3, 0xe73c1bc*3 2: aa ---- [-] 4 function: ABC.l_codecs.decode... 3: a3 [S] 2 dict of class: ..Codec, ..Codec 4: aab ---- [-] 4 types.MethodType: wrote: > Hi, > > I am facing a memory leaking issue with codecs. I make my own ABC class and > register it with codes. > > import codecs > codecs.register(ABC) > > but I am not able to remove ABC from memory. Is there any alternative to do > that. > > Thanks > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mal at egenix.com Thu Sep 15 09:34:20 2011 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 15 Sep 2011 09:34:20 +0200 Subject: [Python-Dev] Not able to do unregister a code In-Reply-To: References: Message-ID: <4E71AA7C.9090403@egenix.com> Jai Sharma wrote: > Hi, > > I am facing a memory leaking issue with codecs. I make my own ABC class and > register it with codes. > > import codecs > codecs.register(ABC) > > but I am not able to remove ABC from memory. Is there any alternative to do > that. The ABC codec search function gets added to the codec registry search path list which currently cannot be accessed directly. There is no API to unregister a codec search function, since deregistration would break the codec cache used by the registry to speedup codec lookup. Why would you want to unregister a codec search function ? 
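Pending a real unregister API (one eventually landed as codecs.unregister() in Python 3.10), a common pattern is to register a single dispatcher for the life of the process and mutate a dict behind it — bearing in mind Marc-Andre's caveat that successful lookups are cached, so a name that has already been used keeps resolving. The helper names below are made up for illustration:

```python
import codecs

_registry = {}  # name -> CodecInfo; the only thing we ever mutate

def _search(name):
    # Registered exactly once, for the life of the process.
    return _registry.get(name)

codecs.register(_search)

def add_codec(name, encode, decode):
    _registry[name] = codecs.CodecInfo(encode, decode, name=name)

def drop_codec(name):
    # "Unregister": future lookups of *unused* names now fail,
    # but names already looked up stay in the interpreter's cache.
    _registry.pop(name, None)

# A toy codec: upper-cases on encode, lower-cases on decode.
def _enc(s, errors='strict'):
    return s.upper().encode('ascii'), len(s)

def _dec(b, errors='strict'):
    return bytes(b).decode('ascii').lower(), len(b)

add_codec('shouty', _enc, _dec)
assert 'hello'.encode('shouty') == b'HELLO'
assert b'HELLO'.decode('shouty') == 'hello'
```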
-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Sep 15 2011) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2011-10-04: PyCon DE 2011, Leipzig, Germany 19 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From nadeem.vawda at gmail.com Thu Sep 15 11:37:15 2011 From: nadeem.vawda at gmail.com (Nadeem Vawda) Date: Thu, 15 Sep 2011 11:37:15 +0200 Subject: [Python-Dev] LZMA compression support in 3.3 In-Reply-To: References: <4E59041A.7040100@v.loewis.de> <4E5909FD.7060809@v.loewis.de> <20110827174057.6c4b619e@pitrou.net> <20110829083029.68faa57b@resist.wooz.org> Message-ID: Another update - I've added proper documentation. Now the code should be pretty much complete - all that's missing is the necessary bits and pieces to build it on Windows. Cheers, Nadeem From jcea at jcea.es Thu Sep 15 13:29:27 2011 From: jcea at jcea.es (Jesus Cea) Date: Thu, 15 Sep 2011 13:29:27 +0200 Subject: [Python-Dev] Do we have interest in a clang buildbot? Message-ID: <4E71E197.7000006@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi, pals. I am seeing a few commits related to clang (a C compiler, alternative to GCC), but we "only" have a buildbot using clang as the compiler. If there is interest, I would deploy 32 and 64 bits buildbots under my current OpenIndiana buildbot. What do you think? - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ .
_/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQCVAwUBTnHhl5lgi5GaxT1NAQKlJgQApAwlZODoeG3G+HODkoSh6G5myqEXkS/0 YZM6wo+/uWb6ul50Kb9mWhucGhY1tc8wAxCDNsRcm8Vv/6sDLZOV0G++DIK0JXIw BA8TyF/5CI8c5K3wnrVkazTo/Io1kVYMGc1FekIoQFI3oRKdXs/A6h63XWwxDMNu PsGwVD4bizs= =lJ/r -----END PGP SIGNATURE----- From martin at v.loewis.de Thu Sep 15 17:31:34 2011 From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 15 Sep 2011 17:31:34 +0200 Subject: [Python-Dev] PEP 393: Porting Guidelines Message-ID: <4E721A56.1000900@v.loewis.de> I added a section on porting guidelines to the PEP, resulting from my own porting experience. Please review. http://www.python.org/dev/peps/pep-0393/#porting-guidelines Regards, Martin From martin at v.loewis.de Thu Sep 15 17:50:41 2011 From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 15 Sep 2011 17:50:41 +0200 Subject: [Python-Dev] PEP 393: Special-casing ASCII-only strings Message-ID: <4E721ED1.1000001@v.loewis.de> In reviewing memory usage, I found potential for saving more memory for ASCII-only strings. Both Victor and Guido commented that something like this be done; Antoine had asked whether there was anything that could be done. Here is the idea: In an ASCII-only string, the UTF-8 representation is shared with the canonical one-byte representation. This would allow to drop the UTF-8 pointer and the UTF-8 length field; instead, a flag in the state would indicate that these fields are not there. Likewise, the wchar_t/Py_UNICODE length can be shared (even though the data cannot), since the ASCII-only string won't contain any surrogate pairs. 
To comply with the C aliasing rules, the structures would look like this: typedef struct { PyObject_HEAD Py_ssize_t length; union { void *any; Py_UCS1 *latin1; Py_UCS2 *ucs2; Py_UCS4 *ucs4; } data; Py_hash_t hash; int state; /* may include SSTATE_SHORT_ASCII flag */ wchar_t *wstr; } PyASCIIObject; typedef struct { PyASCIIObject _base; Py_ssize_t utf8_length; char *utf8; Py_ssize_t wstr_length; } PyUnicodeObject; Code that directly accesses the structures would become more complex; code that use the accessor macros wouldn't notice. As a result, ASCII-only strings would lose three pointers, and shrink to their 3.2 structure size. Since they also save in the individual characters, strings with more than 3 characters (16-bit Py_UNICODE) or more than one character (32-bit Py_UNICODE) would see a total size reduction compared to 3.2. Objects created throught the legacy API (PyUnicode_FromUnicode) that are only later found to be ASCII-only (in PyUnicode_Ready) would still have the UTF-8 pointer shared with the data pointer, but keep including separate fields for pointer & size. What do you think? Regards, Martin P.S. There are similar reductions that could be applied to the wstr_length in general: on 32-bit wchar_t systems, it could be always dropped, on a 16-bit wchar_t system, it could be dropped for UCS-2 strings. However, I'm not proposing these, as I think the increase in complexity is not worth the savings. From merwok at netwok.org Thu Sep 15 18:23:11 2011 From: merwok at netwok.org (=?UTF-8?B?w4lyaWMgQXJhdWpv?=) Date: Thu, 15 Sep 2011 18:23:11 +0200 Subject: [Python-Dev] Packaging in Python 2 anyone ? 
In-Reply-To: <4E6F861F.5020904@voidspace.org.uk> References: <4E4D5992.7070603@netwok.org> <4E6F7D6B.9040709@netwok.org> <4E6F861F.5020904@voidspace.org.uk> Message-ID: <4E72266F.106@netwok.org> Le 13/09/2011 18:34, Michael Foord a écrit : > On 13/09/2011 16:57, Éric Araujo wrote: >> (IIRC PyPI will require us to play games to have both >> 2.x and 3.x versions of distutils2.) > > What I'm doing for unittest2. > [...] > 2) I have a pypi project called unittestpy3k that holds the Python 3 > version of unittest2 > > Projects using unittest2 for Python 3 then have a dependency on > unittest2py3k - but the actual Python package name is unittest2. That's what I call playing games. I think it would make more sense to push 2.x-compatible and 3.x-compatible sdists to PyPI (with an appropriate 'Programming Language :: Python :: 2' or '3' classifier) and have the download tools be smart. Regards From fdrake at acm.org Thu Sep 15 19:08:34 2011 From: fdrake at acm.org (Fred Drake) Date: Thu, 15 Sep 2011 13:08:34 -0400 Subject: [Python-Dev] Packaging in Python 2 anyone ? In-Reply-To: <4E72266F.106@netwok.org> References: <4E4D5992.7070603@netwok.org> <4E6F7D6B.9040709@netwok.org> <4E6F861F.5020904@voidspace.org.uk> <4E72266F.106@netwok.org> Message-ID: On Thu, Sep 15, 2011 at 12:23 PM, Éric Araujo wrote: > I think it would make more sense to > push 2.x-compatible and 3.x-compatible sdists to PyPI (with an > appropriate 'Programming Language :: Python :: 2' or '3' classifier) and > have the download tools be smart. FWIW, I prefer this as well. I'd certainly appreciate the option to do it this way. -Fred -- Fred L. Drake, Jr. "A person who won't read has no advantage over one who can't read." --Samuel Langhorne Clemens From ubershmekel at gmail.com Thu Sep 15 19:50:06 2011 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Thu, 15 Sep 2011 20:50:06 +0300 Subject: [Python-Dev] Packaging in Python 2 anyone ?
In-Reply-To: <4E72266F.106@netwok.org> References: <4E4D5992.7070603@netwok.org> <4E6F7D6B.9040709@netwok.org> <4E6F861F.5020904@voidspace.org.uk> <4E72266F.106@netwok.org> Message-ID: +2 for promoting naming consistency and putting metadata where it's supposed to be. --Yuval On Sep 15, 2011 9:23 AM, "?ric Araujo" wrote: > Le 13/09/2011 18:34, Michael Foord a ?crit : >> On 13/09/2011 16:57, ?ric Araujo wrote: >>> (IIRC PyPI will require us to play games to have both >>> 2.x and 3.x versions of distutils2.) >> >> What I'm doing for unittest2. >> [...] >> 2) I have a pypi project called unittestpy3k that holds the Python 3 >> version of unittest2 >> >> Projects using unittest2 for Python 3 then have a dependency on >> unittest2py3k - but the actual Python package name is unittest2. > > That?s what I call playing games. I think it would make more sense to > push 2.x-compatible and 3.x-compatible sdists to PyPI (with an > appropriate 'Programming Language :: Python :: 2' or '3' classifier) and > have the download tools be smart. > > Regards > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/ubershmekel%40gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at bytereef.org Thu Sep 15 20:27:22 2011 From: stefan at bytereef.org (Stefan Krah) Date: Thu, 15 Sep 2011 20:27:22 +0200 Subject: [Python-Dev] Do we have interest in a clang buildbot? In-Reply-To: <4E71E197.7000006@jcea.es> References: <4E71E197.7000006@jcea.es> Message-ID: <20110915182722.GA12130@sleipnir.bytereef.org> Jesus Cea wrote: > I am seeing a few commits related to clang (a C compiler, alternative > to GCC), but we ?only? have a buildbot using clang as the compiler. > > If there is interest, I would deploy 32 and 64 bits buildbots under my > current OpenIndiana buildbot. 
I think it makes sense. clang has different warnings and the versions >= 2.9 apparently optimize extremely aggressively. Probably it would be most useful to run these bots with -O2 (and not --with-pydebug). Stefan Krah From fuzzyman at voidspace.org.uk Thu Sep 15 20:31:52 2011 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Thu, 15 Sep 2011 19:31:52 +0100 Subject: [Python-Dev] Packaging in Python 2 anyone ? In-Reply-To: <4E72266F.106@netwok.org> References: <4E4D5992.7070603@netwok.org> <4E6F7D6B.9040709@netwok.org> <4E6F861F.5020904@voidspace.org.uk> <4E72266F.106@netwok.org> Message-ID: <4E724498.2040603@voidspace.org.uk> On 15/09/2011 17:23, Éric Araujo wrote: > Le 13/09/2011 18:34, Michael Foord a écrit : >> On 13/09/2011 16:57, Éric Araujo wrote: >>> (IIRC PyPI will require us to play games to have both >>> 2.x and 3.x versions of distutils2.) >> What I'm doing for unittest2. >> [...] >> 2) I have a pypi project called unittestpy3k that holds the Python 3 >> version of unittest2 >> >> Projects using unittest2 for Python 3 then have a dependency on >> unittest2py3k - but the actual Python package name is unittest2. > That's what I call playing games. I think it would make more sense to > push 2.x-compatible and 3.x-compatible sdists to PyPI (with an > appropriate 'Programming Language :: Python :: 2' or '3' classifier) and > have the download tools be smart. Hah, sure. In the meantime my way works *now* and with the existing tools. :-) (But only actually true for the way I make it available from pypi - the rest of the technique is not "playing games", right?) Yes, I would prefer to have a single project name with different distributions for Python 2 and 3 (and I looked into it) - but with the current tools the only way to achieve that is to put both versions into a single distribution. This prevents you from versioning them separately and is a pain to do anyway if the different versions are in different repos. 
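Éric's classifier-based scheme is easy to picture: each sdist advertises which major Python version it supports, and an installer filters on that. A rough sketch — the file names and the selection function are hypothetical, not anything today's pip or easy_install actually does:

```python
# Hypothetical sketch of classifier-driven sdist selection: PyPI metadata
# already carries 'Programming Language :: Python :: X' classifiers, so a
# "smart" download tool could filter releases on them.

def pick_sdist(sdists, major):
    """Return the first sdist whose classifiers advertise the given
    major Python version ('2' or '3'), or None if there is none."""
    wanted = "Programming Language :: Python :: %s" % major
    for filename, classifiers in sdists:
        if wanted in classifiers:
            return filename
    return None

# Two sdists under one project name, distinguished only by classifiers
# (file names are made up for illustration).
releases = [
    ("unittest2-0.5.1.tar.gz", ["Programming Language :: Python :: 2"]),
    ("unittest2-0.5.1-py3.tar.gz", ["Programming Language :: Python :: 3"]),
]

print(pick_sdist(releases, "3"))  # -> unittest2-0.5.1-py3.tar.gz
```

This is only the download half of the idea, of course; the upload tools and the index would also have to honour the classifier.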
The current tools are a real pain for versioning anyway. If your pypi page even *links* to a page that offers an alpha or beta (in development version) for download then both pip and easy_install will fetch that, in preference to the most recent version on pypi. So yes, I agree there is room for improvement in the current tools. Hopefully distutils2 will fix that. ;-) All the best, Michael Foord > > Regards > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk > -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html From tjreedy at udel.edu Thu Sep 15 20:46:11 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 15 Sep 2011 14:46:11 -0400 Subject: [Python-Dev] PEP 393: Special-casing ASCII-only strings In-Reply-To: <4E721ED1.1000001@v.loewis.de> References: <4E721ED1.1000001@v.loewis.de> Message-ID: On 9/15/2011 11:50 AM, "Martin v. Löwis" wrote: > To comply with the C aliasing rules, the structures would look like this: > > typedef struct { > PyObject_HEAD > Py_ssize_t length; > union { > void *any; > Py_UCS1 *latin1; > Py_UCS2 *ucs2; > Py_UCS4 *ucs4; > } data; > Py_hash_t hash; > int state; /* may include SSTATE_SHORT_ASCII flag */ > wchar_t *wstr; > } PyASCIIObject; > > > typedef struct { > PyASCIIObject _base; > Py_ssize_t utf8_length; > char *utf8; > Py_ssize_t wstr_length; > } PyUnicodeObject; > > Code that directly accesses the structures would become more > complex; code that uses the accessor macros wouldn't notice. ... > What do you think? 
That nearly all code outside CPython itself should treat the unicode types, especially, as opaque types and only access instances through functions and macros -- the 'public' interfaces. We need to be free to fiddle with internal implementation details as experience suggests changes. > P.S. There are similar reductions that could be applied > to the wstr_length in general: on 32-bit wchar_t systems, > it could be always dropped, on a 16-bit wchar_t system, > it could be dropped for UCS-2 strings. However, I'm not > proposing these, as I think the increase in complexity > is not worth the savings. I would certainly do just the one change now and see how it goes. I think you should be free to do more like the above if you change your mind with experience. -- Terry Jan Reedy From guido at python.org Thu Sep 15 21:48:01 2011 From: guido at python.org (Guido van Rossum) Date: Thu, 15 Sep 2011 12:48:01 -0700 Subject: [Python-Dev] PEP 393: Special-casing ASCII-only strings In-Reply-To: <4E721ED1.1000001@v.loewis.de> References: <4E721ED1.1000001@v.loewis.de> Message-ID: On Thu, Sep 15, 2011 at 8:50 AM, "Martin v. Löwis" wrote: > In reviewing memory usage, I found potential for saving more memory for > ASCII-only strings. Both Victor and Guido commented that something like > this be done; Antoine had asked whether there was anything that could > be done. Here is the idea: > > In an ASCII-only string, the UTF-8 representation is shared with the > canonical one-byte representation. This would allow to drop the > UTF-8 pointer and the UTF-8 length field; instead, a flag in the state > would indicate that these fields are not there. > > Likewise, the wchar_t/Py_UNICODE length can be shared (even though the > data cannot), since the ASCII-only string won't contain any surrogate > pairs. > > To comply with the C aliasing rules, the structures would look like this: > > typedef struct { > PyObject_HEAD > Py_ssize_t length; > union { > void *any; > 
Py_UCS1 *latin1; > Py_UCS2 *ucs2; > Py_UCS4 *ucs4; > } data; > Py_hash_t hash; > int state; /* may include SSTATE_SHORT_ASCII flag */ > wchar_t *wstr; > } PyASCIIObject; > > > typedef struct { > PyASCIIObject _base; > Py_ssize_t utf8_length; > char *utf8; > Py_ssize_t wstr_length; > } PyUnicodeObject; > > Code that directly accesses the structures would become more > complex; code that uses the accessor macros wouldn't notice. > > As a result, ASCII-only strings would lose three pointers, > and shrink to their 3.2 structure size. Since they also save > in the individual characters, strings with more than > 3 characters (16-bit Py_UNICODE) or more than one character > (32-bit Py_UNICODE) would see a total size reduction compared > to 3.2. > > Objects created through the legacy API (PyUnicode_FromUnicode) > that are only later found to be ASCII-only (in PyUnicode_Ready) > would still have the UTF-8 pointer shared with the data pointer, > but keep including separate fields for pointer & size. > > What do you think? > > Regards, > Martin > > P.S. There are similar reductions that could be applied > to the wstr_length in general: on 32-bit wchar_t systems, > it could be always dropped, on a 16-bit wchar_t system, > it could be dropped for UCS-2 strings. However, I'm not > proposing these, as I think the increase in complexity > is not worth the savings. This sounds like a good plan. -- --Guido van Rossum (python.org/~guido) From victor.stinner at haypocalc.com Thu Sep 15 23:04:16 2011 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Thu, 15 Sep 2011 23:04:16 +0200 Subject: [Python-Dev] PEP 393: Special-casing ASCII-only strings In-Reply-To: <4E721ED1.1000001@v.loewis.de> References: <4E721ED1.1000001@v.loewis.de> Message-ID: <201109152304.16957.victor.stinner@haypocalc.com> Le jeudi 15 septembre 2011 17:50:41, Martin v. 
Löwis a écrit : > In reviewing memory usage, I found potential for saving more memory for > ASCII-only strings. (...) > > typedef struct { > PyObject_HEAD > Py_ssize_t length; > union { > void *any; > Py_UCS1 *latin1; > Py_UCS2 *ucs2; > Py_UCS4 *ucs4; > } data; > Py_hash_t hash; > int state; /* may include SSTATE_SHORT_ASCII flag */ > wchar_t *wstr; > } PyASCIIObject; I like it. If we start with such optimization, we can also remove data from strings allocated by the new API (it can be computed: object pointer + size of the structure). See my email for my proposition of structures: Re: [Python-Dev] PEP 393 review Thu Aug 25 00:29:19 2011 You may reorganize fields to be able to cast PyUnicodeObject to PyASCIIObject. Victor From martin at v.loewis.de Thu Sep 15 23:39:13 2011 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 15 Sep 2011 23:39:13 +0200 Subject: [Python-Dev] PEP 393: Special-casing ASCII-only strings In-Reply-To: <201109152304.16957.victor.stinner@haypocalc.com> References: <4E721ED1.1000001@v.loewis.de> <201109152304.16957.victor.stinner@haypocalc.com> Message-ID: <4E727081.9010307@v.loewis.de> > I like it. If we start with such optimization, we can also remove data > from strings allocated by the new API (it can be computed: object pointer + > size of the structure). See my email for my proposition of structures: > Re: [Python-Dev] PEP 393 review > Thu Aug 25 00:29:19 2011 I agree it is tempting to drop the data pointer. However, I'm not sure how many different structures we would end up with, and how the aliasing rules would defeat this (you cannot interpret a struct X* as a struct Y*, unless either X is the first field of Y or vice versa). 
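The first-field rule Martin mentions is what makes the nested-struct trick work: if the smaller struct is the first member of the bigger one, a pointer to the bigger struct can legally be read through the smaller type. A toy model of this in ctypes — the field sets here are invented for illustration, not the real CPython layout:

```python
import ctypes

class ASCIIObject(ctypes.Structure):
    # invented subset of fields, just to model the shared prefix
    _fields_ = [("length", ctypes.c_ssize_t),
                ("hash", ctypes.c_ssize_t),
                ("state", ctypes.c_int)]

class UnicodeObject(ctypes.Structure):
    # the smaller struct is the *first* field, so both layouts share
    # a common prefix and the cast below is well-defined
    _fields_ = [("base", ASCIIObject),
                ("utf8_length", ctypes.c_ssize_t)]

u = UnicodeObject()
u.base.length = 42

# Reinterpret a UnicodeObject* as an ASCIIObject*: the leading fields
# line up, which is exactly what the aliasing rules permit here.
p = ctypes.cast(ctypes.pointer(u), ctypes.POINTER(ASCIIObject))
print(p.contents.length)  # -> 42
```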
Thinking about this, the following may work: - ASCIIObject: state, length, hash, wstr*, data follow - SingleBlockUnicode: ASCIIObject, wstr_len, utf8*, utf8_len, data follow - UnicodeObject: SingleBlockUnicode, data pointer, no data follow This is essentially your proposal, except that the wstr_len is dropped for ASCII strings, and that it uses nested structs. The single-block variants would always be "ready", the full unicode object is ready only if the data pointer is set. I'll try it out, unless somebody can punch a hole into this proposal :-) Regards, Martin From ncoghlan at gmail.com Fri Sep 16 00:42:25 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 16 Sep 2011 08:42:25 +1000 Subject: [Python-Dev] PEP 393: Special-casing ASCII-only strings In-Reply-To: <4E727081.9010307@v.loewis.de> References: <4E721ED1.1000001@v.loewis.de> <201109152304.16957.victor.stinner@haypocalc.com> <4E727081.9010307@v.loewis.de> Message-ID: On Fri, Sep 16, 2011 at 7:39 AM, "Martin v. Löwis" wrote: > Thinking about this, the following may work: > - ASCIIObject: state, length, hash, wstr*, data follow > - SingleBlockUnicode: ASCIIObject, wstr_len, > utf8*, utf8_len, data follow > - UnicodeObject: SingleBlockUnicode, data pointer, no data follow > > This is essentially your proposal, except that the wstr_len is dropped for > ASCII strings, and that it uses nested structs. > > The single-block variants would always be "ready", the full unicode object > is ready only if the data pointer is set. In your "UnicodeObject" here, is the 'data pointer' the any/latin1/ucs2/ucs4 union from the original structure definition? Also, what are the constraints on the "SingleBlockUnicode"? Does it only hold strings that can be represented in latin1? Or can the size of the individual elements be more than 1 byte? Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | 
Brisbane, Australia From albzey at googlemail.com Fri Sep 16 00:44:35 2011 From: albzey at googlemail.com (Albert Zeyer) Date: Fri, 16 Sep 2011 00:44:35 +0200 Subject: [Python-Dev] Meta coding in Python Message-ID: Hi list, I thought it would be nice in Python to allow some sort of meta coding (which goes far ahead of simple function descriptors). The most straightforward way would be to allow operations on the AST. I wrote a small patch for CPython 2.7.1 which, for each code object, adds the related AST of the statement to a new attribute `co_ast`. https://github.com/albertz/CPython/commit/2670e621458fd80311fc02897b698ea2a36d494b Some simple demonstration of what you can do with this: https://github.com/albertz/CPython/blob/astcompile_patch/test_co_ast.py I'm not sure whether the Python AST in this form is optimal for doing such things, though. Maybe another representation would be more efficient and result in simpler code for doing transformations. Discussion about this is very welcome. Regards, Albert From benjamin at python.org Fri Sep 16 00:57:12 2011 From: benjamin at python.org (Benjamin Peterson) Date: Thu, 15 Sep 2011 18:57:12 -0400 Subject: [Python-Dev] Meta coding in Python In-Reply-To: References: Message-ID: 2011/9/15 Albert Zeyer : > Hi list, > > I thought it would be nice in Python to allow some sort of meta coding > (which goes far ahead of simple function descriptors). > > The most straightforward way would be to allow operations on the AST. > > I wrote a small patch for CPython 2.7.1 which, for each code object, > adds the related AST of the statement to a new attribute `co_ast`. > > https://github.com/albertz/CPython/commit/2670e621458fd80311fc02897b698ea2a36d494b > > Some simple demonstration of what you can do with this: > > https://github.com/albertz/CPython/blob/astcompile_patch/test_co_ast.py > > I'm not sure whether the Python AST in this form is optimal for doing > such things, though. 
Maybe another representation would be more > efficient and result in simpler code for doing transformations. It would be useful, but is a waste of memory in 99.99% of programs. -- Regards, Benjamin From ncoghlan at gmail.com Fri Sep 16 01:12:20 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 16 Sep 2011 09:12:20 +1000 Subject: [Python-Dev] Meta coding in Python In-Reply-To: References: Message-ID: On Fri, Sep 16, 2011 at 8:44 AM, Albert Zeyer wrote: > Hi list, > > I thought it would be nice in Python to allow some sort of meta coding > (which goes far ahead of simple function descriptors). > > The most straightforward way would be to allow operations on the AST. 1. This kind of suggestion is more appropriately directed to python-ideas 2. We already support this, look at the ast module and in particular the ast.PyCF_ONLY_AST flag to the compile() builtin function. For an example of advanced usage, look at the py.test module and its meta-importer that rewrites assert statements Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From martin at v.loewis.de Fri Sep 16 07:41:21 2011 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 16 Sep 2011 07:41:21 +0200 Subject: [Python-Dev] PEP 393: Special-casing ASCII-only strings In-Reply-To: References: <4E721ED1.1000001@v.loewis.de> <201109152304.16957.victor.stinner@haypocalc.com> <4E727081.9010307@v.loewis.de> Message-ID: <4E72E181.7060805@v.loewis.de> Am 16.09.11 00:42, schrieb Nick Coghlan: > On Fri, Sep 16, 2011 at 7:39 AM, "Martin v. Löwis > wrote: >> Thinking about this, the following may work: >> >> - ASCIIObject: state, length, hash, wstr*, data follow >> >> - SingleBlockUnicode: ASCIIObject, wstr_len, utf8*, utf8_len, data >> follow >> >> - UnicodeObject: SingleBlockUnicode, data pointer, no data follow >> >> This is essentially your proposal, except that the wstr_len is >> dropped for ASCII strings, and that it uses nested structs. 
>> >> The single-block variants would always be "ready", the full unicode >> object is ready only if the data pointer is set. > > In your "UnicodeObject" here, is the 'data pointer' the > any/latin1/ucs2/ucs4 union from the original structure definition? Yes, it is. I'm considering dropping the union again, since you'll have to cast the data pointer anyway in the compact cases. > Also, what are the constraints on the "SingleBlockUnicode"? Does it > only hold strings that can be represented in latin1? Or can the size > of the individual elements be more than 1 byte? Any size - what matters is whether the maximum character is known at creation time (i.e. whether you've used PyUnicode_New(size, maxchar) or PyUnicode_FromUnicode(NULL, size)). In the latter case, a Py_UNICODE block will be allocated in wstr, and the data pointer left NULL. Then, when PyUnicode_Ready is called, the maximum character is determined in the Py_UNICODE block, and a new data block allocated - but that will have to be a second memory block (the Py_UNICODE block is then dropped in _Ready). Regards, Martin From status at bugs.python.org Fri Sep 16 18:07:28 2011 From: status at bugs.python.org (Python tracker) Date: Fri, 16 Sep 2011 18:07:28 +0200 (CEST) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20110916160728.38D731CC86@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2011-09-09 - 2011-09-16) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message. 
Issues counts and deltas: open 3019 (+19) closed 21757 (+30) total 24776 (+49) Open issues with patches: 1295 Issues opened (36) ================== #12946: PyModule_GetDict() claims it can never fail, but it can http://bugs.python.org/issue12946 opened by scoder #12949: Documentation of PyCode_New() lacks kwonlyargcount argument http://bugs.python.org/issue12949 opened by scoder #12953: Function calls missing from profiler output http://bugs.python.org/issue12953 opened by hagen #12954: Multiprocessing logging under Windows http://bugs.python.org/issue12954 opened by paul.j3 #12955: urllib2.build_opener().open() is not friendly to "with ... as: http://bugs.python.org/issue12955 opened by Valery.Khamenya #12956: builds fail when installing to --prefix with space in path nam http://bugs.python.org/issue12956 opened by rzn8tr #12957: mmap.resize changes memory address of mmap'd region http://bugs.python.org/issue12957 opened by schmichael #12958: test_socket failures on Mac OS X http://bugs.python.org/issue12958 opened by ncoghlan #12960: threading.Condition is not a class http://bugs.python.org/issue12960 opened by Nikratio #12961: unlabelled balls in boxes http://bugs.python.org/issue12961 opened by Phillip.M.Feldman #12962: TitledHelpFormatter and IndentedHelpFormatter are not document http://bugs.python.org/issue12962 opened by techtonik #12964: Two improvements for the locale aliasing engine http://bugs.python.org/issue12964 opened by ssegvic #12965: longobject: documentation improvements http://bugs.python.org/issue12965 opened by skrah #12966: cookielib.LWPCookieJar breaks on cookie values with a newline http://bugs.python.org/issue12966 opened by paulie4 #12967: AttributeError distutils\log.py http://bugs.python.org/issue12967 opened by Ben.thelen #12970: os.walk() consider some symlinks as dirs instead of non-dirs http://bugs.python.org/issue12970 opened by mmarkk #12971: os.isdir() should contain skiplinks=False in arguments http://bugs.python.org/issue12971 
opened by mmarkk #12972: Color prompt + readline http://bugs.python.org/issue12972 opened by atagar1 #12973: int_pow() implementation is incorrect http://bugs.python.org/issue12973 opened by adam at NetBSD.org #12974: array module: deprecate '__int__' conversion support for array http://bugs.python.org/issue12974 opened by meadori #12976: select module: only use EVFILT_TIMER if available (kqueue back http://bugs.python.org/issue12976 opened by bsiegert #12977: socket.socket.setblocking does not raise exception if no data http://bugs.python.org/issue12977 opened by Florian.Ludwig #12978: Figure out extended attributes on BSDs http://bugs.python.org/issue12978 opened by benjamin.peterson #12979: tkinter.font.Font object not usable as font option http://bugs.python.org/issue12979 opened by ilikepython #12981: rewrite multiprocessing (senfd|recvfd) in Python http://bugs.python.org/issue12981 opened by neologix #12982: .pyo file cannot be imported http://bugs.python.org/issue12982 opened by lebigot #12983: byte string literals with invalid hex escape codes raise Value http://bugs.python.org/issue12983 opened by ned.deily #12984: XML NamedNodeMap ( attribName in NamedNodeMap fails ) http://bugs.python.org/issue12984 opened by spolematt #12985: Check signed arithmetic overflow in ./configure http://bugs.python.org/issue12985 opened by skrah #12986: Using getrandbits() in uuid.uuid4() is faster and more readabl http://bugs.python.org/issue12986 opened by mattchaput #12987: Demo/scripts/newslist.py has non-commercial license clause http://bugs.python.org/issue12987 opened by matejcik #12988: IDLE on Win7 crashes when saving to Documents Library http://bugs.python.org/issue12988 opened by Brian.Gernhardt #12989: Consistently handle path separator in Py_GetPath on Windows http://bugs.python.org/issue12989 opened by Nam.Nguyen #12990: launcher can't work on path including tradition chinese char http://bugs.python.org/issue12990 opened by Ricky.Teng #12993: prepared statements 
in sqlite3 module http://bugs.python.org/issue12993 opened by Mayur.&.Angela.Patel-Lam #12994: cx_Oracle failed to load in newly build python 2.7.1 http://bugs.python.org/issue12994 opened by wah meng Most recent 15 issues with no replies (15) ========================================== #12994: cx_Oracle failed to load in newly build python 2.7.1 http://bugs.python.org/issue12994 #12993: prepared statements in sqlite3 module http://bugs.python.org/issue12993 #12990: launcher can't work on path including tradition chinese char http://bugs.python.org/issue12990 #12989: Consistently handle path separator in Py_GetPath on Windows http://bugs.python.org/issue12989 #12988: IDLE on Win7 crashes when saving to Documents Library http://bugs.python.org/issue12988 #12987: Demo/scripts/newslist.py has non-commercial license clause http://bugs.python.org/issue12987 #12986: Using getrandbits() in uuid.uuid4() is faster and more readabl http://bugs.python.org/issue12986 #12984: XML NamedNodeMap ( attribName in NamedNodeMap fails ) http://bugs.python.org/issue12984 #12983: byte string literals with invalid hex escape codes raise Value http://bugs.python.org/issue12983 #12979: tkinter.font.Font object not usable as font option http://bugs.python.org/issue12979 #12977: socket.socket.setblocking does not raise exception if no data http://bugs.python.org/issue12977 #12972: Color prompt + readline http://bugs.python.org/issue12972 #12971: os.isdir() should contain skiplinks=False in arguments http://bugs.python.org/issue12971 #12966: cookielib.LWPCookieJar breaks on cookie values with a newline http://bugs.python.org/issue12966 #12965: longobject: documentation improvements http://bugs.python.org/issue12965 Most recent 15 issues waiting for review (15) ============================================= #12989: Consistently handle path separator in Py_GetPath on Windows http://bugs.python.org/issue12989 #12986: Using getrandbits() in uuid.uuid4() is faster and more readabl 
http://bugs.python.org/issue12986 #12985: Check signed arithmetic overflow in ./configure http://bugs.python.org/issue12985 #12981: rewrite multiprocessing (senfd|recvfd) in Python http://bugs.python.org/issue12981 #12973: int_pow() implementation is incorrect http://bugs.python.org/issue12973 #12970: os.walk() consider some symlinks as dirs instead of non-dirs http://bugs.python.org/issue12970 #12965: longobject: documentation improvements http://bugs.python.org/issue12965 #12943: tokenize: add python -m tokenize support back http://bugs.python.org/issue12943 #12936: armv5tejl segfaults: sched_setaffinity() vs. pthread_setaffini http://bugs.python.org/issue12936 #12931: xmlrpclib confuses unicode and string http://bugs.python.org/issue12931 #12930: reindent.py inserts spaces in multiline literals http://bugs.python.org/issue12930 #12919: Control what module is imported first http://bugs.python.org/issue12919 #12911: Expose a private accumulator C API http://bugs.python.org/issue12911 #12903: test_io.test_interrupte[r]d* blocks on OpenBSD http://bugs.python.org/issue12903 #12901: Nest class/methods directives in documentation http://bugs.python.org/issue12901 Top 10 most discussed issues (10) ================================= #12936: armv5tejl segfaults: sched_setaffinity() vs. 
pthread_setaffini http://bugs.python.org/issue12936 26 msgs #11457: Expose nanosecond precision from system calls http://bugs.python.org/issue11457 17 msgs #12973: int_pow() implementation is incorrect http://bugs.python.org/issue12973 16 msgs #1172711: long long support for array module http://bugs.python.org/issue1172711 11 msgs #8822: datetime naive and aware types should have a well-defined defi http://bugs.python.org/issue8822 10 msgs #12301: Use :role:`sys.thing` instead of ``sys.thing`` throughout http://bugs.python.org/issue12301 7 msgs #12945: ctypes works incorrectly with _swappedbytes_ = 1 http://bugs.python.org/issue12945 7 msgs #6715: xz compressor support http://bugs.python.org/issue6715 6 msgs #12913: Add a debugging howto http://bugs.python.org/issue12913 6 msgs #12981: rewrite multiprocessing (senfd|recvfd) in Python http://bugs.python.org/issue12981 6 msgs Issues closed (27) ================== #7201: double Endian problem and more on arm http://bugs.python.org/issue7201 closed by mark.dickinson #9871: IDLE 3 crashes processing byte strings with invalid hex escape http://bugs.python.org/issue9871 closed by ned.deily #11149: [PATCH] Configure should enable -fwrapv for clang http://bugs.python.org/issue11149 closed by skrah #12299: Stop documenting functions added by site as builtins http://bugs.python.org/issue12299 closed by eric.araujo #12306: zlib: Expose zlibVersion to query runtime version of zlib http://bugs.python.org/issue12306 closed by nadeem.vawda #12483: CThunkObject_dealloc should call PyObject_GC_UnTrack? 
http://bugs.python.org/issue12483 closed by meadori #12896: Recommended location of the interpreter for Python 3 http://bugs.python.org/issue12896 closed by eric.araujo #12914: Add cram function to textwrap http://bugs.python.org/issue12914 closed by rhettinger #12917: Make visiblename and allmethods functions public http://bugs.python.org/issue12917 closed by rhettinger #12918: New module for terminal utilities http://bugs.python.org/issue12918 closed by eric.araujo #12924: Missing call to quote_plus() in test_urllib.test_default_quoti http://bugs.python.org/issue12924 closed by python-dev #12935: Typo in findertools.py http://bugs.python.org/issue12935 closed by ned.deily #12940: Cmd example using turtle left vs. right doc-bug http://bugs.python.org/issue12940 closed by ezio.melotti #12941: add random.pop() http://bugs.python.org/issue12941 closed by terry.reedy #12947: Examples in library/doctest.html lack the flags http://bugs.python.org/issue12947 closed by eric.araujo #12948: multiprocessing test failures can hang the buildbots http://bugs.python.org/issue12948 closed by jcea #12950: multiprocessing "test_fd_transfer" fails under OpenIndiana http://bugs.python.org/issue12950 closed by python-dev #12951: List behavior is different http://bugs.python.org/issue12951 closed by ezio.melotti #12952: Solaris/Illumos (OpenIndiana) Scheduling policies http://bugs.python.org/issue12952 closed by python-dev #12959: Add 'ChainMap' to collections.__all__ http://bugs.python.org/issue12959 closed by python-dev #12963: PyLong_AsSize_t returns (unsigned long)-1 http://bugs.python.org/issue12963 closed by skrah #12968: vvccc??????????????? 
http://bugs.python.org/issue12968 closed by benjamin.peterson #12969: Command 'open(0,"wb").close()' cause crash of Python interpret http://bugs.python.org/issue12969 closed by jcea #12975: spam http://bugs.python.org/issue12975 closed by neologix #12980: segfault in test_json on AMD64 FreeBSD 8.2 2.7 http://bugs.python.org/issue12980 closed by skrah #12991: Python 64-bit build on HP Itanium - Executable built successfu http://bugs.python.org/issue12991 closed by skrah #12992: Python build finished, but the necessary bits to build these m http://bugs.python.org/issue12992 closed by ezio.melotti From chris at simplistix.co.uk Fri Sep 16 19:58:46 2011 From: chris at simplistix.co.uk (Chris Withers) Date: Fri, 16 Sep 2011 18:58:46 +0100 Subject: [Python-Dev] Packaging in Python 2 anyone ? In-Reply-To: <4E724498.2040603@voidspace.org.uk> References: <4E4D5992.7070603@netwok.org> <4E6F7D6B.9040709@netwok.org> <4E6F861F.5020904@voidspace.org.uk> <4E72266F.106@netwok.org> <4E724498.2040603@voidspace.org.uk> Message-ID: <4E738E56.20709@simplistix.co.uk> On 15/09/2011 19:31, Michael Foord wrote: > The current tools are a real pain for versioning anyway. If your pypi > page even *links* to a page that offers an alpha or beta (in development > version) for download then both pip and easy_install will fetch that, in > preference to the most recent version on pypi. So yes, I agree there is > room for improvement in the current tools. Hopefully distutils2 will fix > that. ;-) I'm pretty sure recent releases of zc.buildout prefer "final" releases by default ;-) Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk From albzey at googlemail.com Sat Sep 17 17:05:48 2011 From: albzey at googlemail.com (Albert Zeyer) Date: Sat, 17 Sep 2011 17:05:48 +0200 Subject: [Python-Dev] Persistent Python - a la Smalltalk Message-ID: Hi, I was thinking about a persistent Python interpreter system. I.e. 
you start a Python interpreter instance and you load and create all your objects, classes and code in there (or load it in there from other files). The basic idea is that you won't restart your Python script, you would always modify it on-the-fly. Or a bit less extreme: You would at least have the possibility with this to do this (like just doing minor changes). Also, if your PC halts for whatever reason, you can continue your Python script after a restart. This goes along with my other recent proposal to store the AST of statements in the related code objects (http://thread.gmane.org/gmane.comp.python.devel/126754). An internal editor could then edit this AST and recompile the code object. For the persistence, there would be an image file containing all the Python objects. All in all, much like most Smalltalk systems. --- Has anyone done something like this already? --- There are a few implementation details which are not trivial and there don't seem to be straightforward solutions, e.g. most generally: * How to implement the persistence? * How to handle image compatibility between CPython updates? Even possible? Regards, Albert From guido at python.org Sat Sep 17 17:17:38 2011 From: guido at python.org (Guido van Rossum) Date: Sat, 17 Sep 2011 08:17:38 -0700 Subject: [Python-Dev] Persistent Python - a la Smalltalk In-Reply-To: References: Message-ID: [BCC python-dev, +python-ideas] Funny you should mention this. ABC, Python's predecessor, worked like this. However, it didn't work out very well. So, I'd say you're about 30 years too late with your idea... :-( --Guido On Sat, Sep 17, 2011 at 8:05 AM, Albert Zeyer wrote: > Hi, > > I was thinking about a persistent Python interpreter system. I.e. you > start a Python interpreter instance and you load and create all your > objects, classes and code in there (or load it in there from other > files). > > The basic idea is that you won't restart your Python script, you would > always modify it on-the-fly. 
Or a bit less extreme: You would at least > have the possibility with this to do this (like just doing minor > changes). Also, if your PC halts for whatever reason, you can continue > your Python script after a restart. > > This goes along my other recent proposal to store the AST of > statements in the related code objects > (http://thread.gmane.org/gmane.comp.python.devel/126754). An internal > editor could then edit this AST and recompile the code object. > > For the persistance, there would be an image file containing all the > Python objects. > > All in all, much like most Smalltalk systems. > > --- > > Has anyone done something like this already? > > --- > > There are a few implementation details which are not trivial and there > doesn't seem to be straight forward solutions, e.g. most generally: > > * How to implement the persistance? > * How to handle image compatibility between CPython updates? Even possible? > > Regards, > Albert > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (python.org/~guido) From ethan at stoneleaf.us Sat Sep 17 19:01:30 2011 From: ethan at stoneleaf.us (Ethan Furman) Date: Sat, 17 Sep 2011 10:01:30 -0700 Subject: [Python-Dev] Persistent Python - a la Smalltalk In-Reply-To: References: Message-ID: <4E74D26A.4030309@stoneleaf.us> Albert Zeyer wrote: > I was thinking about a persistent Python interpreter system. python-dev is for developing the next version of Python (3.3 at this point). Questions like this should go to python-list or python-ideas. 
~Ethan~ From godson.g at gmail.com Sun Sep 18 10:55:25 2011 From: godson.g at gmail.com (Godson Gera) Date: Sun, 18 Sep 2011 14:25:25 +0530 Subject: [Python-Dev] Persistent Python - a la Smalltalk In-Reply-To: References: Message-ID: Twisted has a feature like that, implemented using pickles or something. It's meant to save the state of the program across a restart. I am not sure if that's what you are after. http://twistedmatrix.com On 17 Sep 2011 20:44, "Albert Zeyer" wrote: > Hi, > > I was thinking about a persistent Python interpreter system. I.e. you > start a Python interpreter instance and you load and create all your > objects, classes and code in there (or load it in there from other > files). > > The basic idea is that you won't restart your Python script; you would > always modify it on-the-fly. Or a bit less extreme: you would at least > have the possibility to do this (e.g. just for minor > changes). Also, if your PC halts for whatever reason, you can continue > your Python script after a restart. > > This goes along with my other recent proposal to store the AST of > statements in the related code objects > (http://thread.gmane.org/gmane.comp.python.devel/126754). An internal > editor could then edit this AST and recompile the code object. > > For the persistence, there would be an image file containing all the > Python objects. > > All in all, much like most Smalltalk systems. > > --- > > Has anyone done something like this already? > > --- > > There are a few implementation details which are not trivial and for which > there don't seem to be straightforward solutions, e.g. most generally: > > * How to implement the persistence? > * How to handle image compatibility between CPython updates? Even possible?
> > Regards, > Albert > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/godson.g%40gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From zbyszek at in.waw.pl Sun Sep 18 10:23:25 2011 From: zbyszek at in.waw.pl (Zbigniew =?UTF-8?B?SsSZZHJ6ZWpld3NraS1Tem1law==?=) Date: Sun, 18 Sep 2011 10:23:25 +0200 Subject: [Python-Dev] Persistent Python - a la Smalltalk References: Message-ID: Guido van Rossum wrote: > [BCC python-dev, +python-ideas] > > Funny you should mention this. ABC, Python's predecessor, worked like > this. However, it didn't work out very well. So, I'd say you're about > 30 years too late with your idea... :-( Well, the newly developed IPython notebook [1] is something along those lines. So he's not late, he's a little bit early :) -- Zbyszek [1] http://ipython.org/ipython-doc/dev/interactive/htmlnotebook.html From smiwa.egon at googlemail.com Tue Sep 20 15:58:35 2011 From: smiwa.egon at googlemail.com (Egon Smiwa) Date: Tue, 20 Sep 2011 15:58:35 +0200 Subject: [Python-Dev] Unicode identifiers Message-ID: <4E789C0B.7060709@googlemail.com> Hi all, I wanted to implement quantity objects in a piece of software, which can be used with user-friendly expressions like: money = 3 * €, where Euro is a special quantity object. But now I realized that Python does not allow currency characters in names, although they can be very useful. Is there a really convincing argument against the inclusion? Thank you!
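Egon's question can be checked empirically from Python 3 itself: PEP 3131 adopts the identifier rules of Unicode UAX #31, under which an identifier must start with an XID_Start character and continue with XID_Continue characters, and currency symbols such as € (Unicode general category Sc) belong to neither class. A short demonstration:

```python
import unicodedata

# '€' is a currency symbol (general category Sc); it is in neither
# XID_Start nor XID_Continue, so it cannot appear in an identifier.
print(unicodedata.category('€'))   # Sc
print('€'.isidentifier())          # False

# Letters are fine: 'π' (category Ll, a lowercase letter) is a valid
# identifier by itself, as is the spelled-out name.
print('π'.isidentifier())          # True
print('Euro'.isidentifier())       # True
```

So the restriction is not Python-specific policy but the Unicode identifier recommendation itself.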
From benjamin at python.org Tue Sep 20 16:01:11 2011 From: benjamin at python.org (Benjamin Peterson) Date: Tue, 20 Sep 2011 10:01:11 -0400 Subject: [Python-Dev] Unicode identifiers In-Reply-To: <4E789C0B.7060709@googlemail.com> References: <4E789C0B.7060709@googlemail.com> Message-ID: 2011/9/20 Egon Smiwa : > Hi all, > I wanted to implement quantity objects in a software, > which can be used with user-friendly expressions like: > money = 3 * €, where Euro is a special quantity object > But now I realized, Python does not allow currency > characters in names, although they can be very useful. > Is there a really convincing argument against the inclusion? It's a violation of http://unicode.org/reports/tr31/ -- Regards, Benjamin From stefan at bytereef.org Wed Sep 21 18:02:27 2011 From: stefan at bytereef.org (Stefan Krah) Date: Wed, 21 Sep 2011 18:02:27 +0200 Subject: [Python-Dev] [Python-checkins] cpython: Issue #1172711: Add 'long long' support to the array module. In-Reply-To: <4E79E5C9.9080204@gmail.com> References: <4E79E5C9.9080204@gmail.com> Message-ID: <20110921160227.GA19702@sleipnir.bytereef.org> Ezio Melotti wrote: >> +@unittest.skipIf(not have_long_long, 'need long long support') > > I think this would read better with skipUnless and s/have/has/: > > @unittest.skipUnless(HAS_LONG_LONG, 'need long long support') skipUnless() is perhaps a bit cleaner, but have_long_long is pretty established elsewhere (for example in pyport.h). Stefan Krah From g.brandl at gmx.net Wed Sep 21 18:16:58 2011 From: g.brandl at gmx.net (Georg Brandl) Date: Wed, 21 Sep 2011 18:16:58 +0200 Subject: [Python-Dev] cpython: Issue #1172711: Add 'long long' support to the array module.
In-Reply-To: <4E79E5C9.9080204@gmail.com> References: <4E79E5C9.9080204@gmail.com> Message-ID: Am 21.09.2011 15:25, schrieb Ezio Melotti: >> @@ -1205,6 +1214,18 @@ >> minitemsize = 4 >> tests.append(UnsignedLongTest) >> >> +@unittest.skipIf(not have_long_long, 'need long long support') > > I think this would read better with skipUnless and s/have/has/: > > @unittest.skipUnless(HAS_LONG_LONG, 'need long long support') I don't think so. "skip if not" reads pretty well for me, while I always have to think twice about "unless" -- may be a non-native-speaker thing. Georg From benjamin at python.org Wed Sep 21 18:21:53 2011 From: benjamin at python.org (Benjamin Peterson) Date: Wed, 21 Sep 2011 12:21:53 -0400 Subject: [Python-Dev] cpython: Issue #1172711: Add 'long long' support to the array module. In-Reply-To: References: <4E79E5C9.9080204@gmail.com> Message-ID: 2011/9/21 Georg Brandl : > Am 21.09.2011 15:25, schrieb Ezio Melotti: > >>> @@ -1205,6 +1214,18 @@ >>> minitemsize = 4 >>> tests.append(UnsignedLongTest) >>> >>> +@unittest.skipIf(not have_long_long, 'need long long support') >> >> I think this would read better with skipUnless and s/have/has/: >> >> @unittest.skipUnless(HAS_LONG_LONG, 'need long long support') > > I don't think so. "skip if not" reads pretty well for me, while I > always have to think twice about "unless" -- may be a non-native- > speaker thing. You might also not program in Ruby enough. :) -- Regards, Benjamin From meadori at gmail.com Wed Sep 21 18:40:55 2011 From: meadori at gmail.com (Meador Inge) Date: Wed, 21 Sep 2011 11:40:55 -0500 Subject: [Python-Dev] [Python-checkins] cpython: Issue #1172711: Add 'long long' support to the array module.
In-Reply-To: <20110921160227.GA19702@sleipnir.bytereef.org> References: <4E79E5C9.9080204@gmail.com> <20110921160227.GA19702@sleipnir.bytereef.org> Message-ID: On Wed, Sep 21, 2011 at 11:02 AM, Stefan Krah wrote: > Ezio Melotti wrote: >>> +@unittest.skipIf(not have_long_long, 'need long long support') >> >> I think this would read better with skipUnless and s/have/has/: >> >> @unittest.skipUnless(HAS_LONG_LONG, 'need long long support') > > skipUnless() is perhaps a bit cleaner, but have_long_long is pretty > established elsewhere (for example in pyport.h). I agree with Stefan on the have_long_long part. This is what is used in the array module code, struct, ctypes, etc ... (via pyport.h as Stefan mentioned). As for the unless/if, I am OK with the 'if'. 'unless' always causes a double-take for me. Personal preference I guess. -- Meador From merwok at netwok.org Wed Sep 21 18:50:25 2011 From: merwok at netwok.org (=?UTF-8?B?w4lyaWMgQXJhdWpv?=) Date: Wed, 21 Sep 2011 18:50:25 +0200 Subject: [Python-Dev] Packaging in Python 2 anyone ? In-Reply-To: <4E6F7D6B.9040709@netwok.org> References: <4E4D5992.7070603@netwok.org> <4E6F7D6B.9040709@netwok.org> Message-ID: <4E7A15D1.2090402@netwok.org> Hi, I caught Tarek on IRC and forced him to answer my questions. Here are the latest news: - I have cleaned up and synchronized the distutils2 codebase with packaging in 3.3. All features and bugs are now identical. The test suite runs with Python 2.4 to 2.7; there are three or four test failures (linux, with threads, UCS4, not shared). Please clone, build (we backported hashlib for 2.4), test and file bugs! We'll make an alpha4 as soon as all tests pass. - I have started work in a named branch to provide distutils2 for Python 3.1 and 3.2. Patches will flow between packaging, distutils2 and distutils2-py3. I'll start a discussion on catalog-sig to improve support of parallel releases of 2.x and 3.x-compatible projects.
- The docs in the d2 repo will be removed; people will go to docs.python.org and mentally convert packaging to distutils2. I'll update the PyPI page. Cheers From stephen at xemacs.org Wed Sep 21 19:02:11 2011 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 22 Sep 2011 02:02:11 +0900 Subject: [Python-Dev] cpython: Issue #1172711: Add 'long long' support to the array module. In-Reply-To: References: <4E79E5C9.9080204@gmail.com> Message-ID: <87pqitbzho.fsf@uwakimon.sk.tsukuba.ac.jp> Georg Brandl writes: > I don't think so. "skip if not" reads pretty well for me, while I > always have to think twice about "unless" -- may be a non-native- > speaker thing. FWIW, speaking as one native speaker, I'm not sure about that. "do ... if not condition" doesn't bother me, whether I think of the condition as an exception or as the normal state of affairs. I find "do ... unless condition" to be quite awkward if the condition is a normal state. From fuzzyman at voidspace.org.uk Wed Sep 21 20:08:22 2011 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Wed, 21 Sep 2011 19:08:22 +0100 Subject: [Python-Dev] cpython: Issue #1172711: Add 'long long' support to the array module. In-Reply-To: <87pqitbzho.fsf@uwakimon.sk.tsukuba.ac.jp> References: <4E79E5C9.9080204@gmail.com> <87pqitbzho.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4E7A2816.8080204@voidspace.org.uk> On 21/09/2011 18:02, Stephen J. Turnbull wrote: > Georg Brandl writes: > > > I don't think so. "skip if not" reads pretty well for me, while I > > always have to think twice about "unless" -- may be a non-native- > > speaker thing. > > FWIW, speaking as one native speaker, I'm not sure about that. "do ... > if not condition" doesn't bother me, whether I think of the condition > as an exception or as the normal state of affairs. I find "do ... > unless condition" to be quite awkward if the condition is a normal state. I'm not a big fan of skipUnless, but there you go.
I find "skip if not" readable too and always have to "work out" what skipUnless means. It's probably just that "if" and "if not" are such Python idioms and "unless" isn't. Michael > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk > -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html From ezio.melotti at gmail.com Wed Sep 21 22:43:58 2011 From: ezio.melotti at gmail.com (Ezio Melotti) Date: Wed, 21 Sep 2011 23:43:58 +0300 Subject: [Python-Dev] cpython: Issue #1172711: Add 'long long' support to the array module. In-Reply-To: <4E7A2816.8080204@voidspace.org.uk> References: <4E79E5C9.9080204@gmail.com> <87pqitbzho.fsf@uwakimon.sk.tsukuba.ac.jp> <4E7A2816.8080204@voidspace.org.uk> Message-ID: <4E7A4C8E.3090802@gmail.com> On 21/09/2011 21.08, Michael Foord wrote: > On 21/09/2011 18:02, Stephen J. Turnbull wrote: >> Georg Brandl writes: >> >> > I don't think so. "skip if not" reads pretty well for me, while I >> > always have to think twice about "unless" -- may be a non-native- >> > speaker thing. >> >> FWIW, speaking as one native speaker, I'm not sure about that. "do ... >> if not condition" doesn't bother me, whether I think of the condition >> as an exception or as the normal state of affairs. I find "do ... >> unless condition" to be quite awkward if the condition is a normal >> state. > > I'm not a big fan of skipUnless, but there you go. I find "skip if > not" readable too and always have to "work out" what skipUnless means. > It's probably just that "if" and "if not" are such Python idioms and > "unless" isn't. I don't find it too readable in other contexts (e.g. 
failUnless), but I probably got used to skipUnless with the idiom:

try:
    import foo
except ImportError:
    foo = None

@skipUnless(foo, 'requires foo')
...

FWIW in Lib/test/support.py we have a "skip_unless_symlink", but the other two skipUnless have more readable names: "requires_zlib" and "requires_IEEE_754". In Lib/test/ "skipUnless" is used about 250 times, "skipIf" about 100. Best Regards, Ezio Melotti > > Michael > > From merwok at netwok.org Fri Sep 23 16:35:27 2011 From: merwok at netwok.org (=?UTF-8?B?w4lyaWMgQXJhdWpv?=) Date: Fri, 23 Sep 2011 16:35:27 +0200 Subject: [Python-Dev] [Python-checkins] cpython (2.7): Issue #12931: xmlrpclib now encodes Unicode URI to ISO-8859-1, instead of In-Reply-To: References: Message-ID: <4E7C992F.5030601@netwok.org> Hi Victor, > summary: > Issue #12931: xmlrpclib now encodes Unicode URI to ISO-8859-1, instead of > failing with a UnicodeDecodeError. > > diff --git a/Lib/test/test_xmlrpc.py b/Lib/test/test_xmlrpc.py > --- a/Lib/test/test_xmlrpc.py > +++ b/Lib/test/test_xmlrpc.py > @@ -472,6 +472,9 @@ > # protocol error; provide additional information in test output > self.fail("%s\n%s" % (e, getattr(e, "headers", ""))) > > + def test_unicode_host(self): > + server = xmlrpclib.ServerProxy(u"http://%s:%d/RPC2"%(ADDR, PORT)) Spaces around the modulo operator would have been nice here. Readability counts :) From le.mognon at gmail.com Fri Sep 23 17:12:53 2011 From: le.mognon at gmail.com (Martin Goudreau) Date: Fri, 23 Sep 2011 11:12:53 -0400 Subject: [Python-Dev] genious hack in python Message-ID: Hello Dev Team, Guido told me to send you this idea... Improving productivity is one of my strengths. Please check a very small module I've made for improving the debugger traceback. See the pybettererror.py on SourceForge: http://pybettererror.sourceforge.net/projet.html It's hard to find something to complain about in Python. This one was too good an idea to keep for myself.
Thanks Martin Goudreau from Montreal -------------- next part -------------- An HTML attachment was scrubbed... URL: From phd at phdru.name Fri Sep 23 17:54:51 2011 From: phd at phdru.name (Oleg Broytman) Date: Fri, 23 Sep 2011 19:54:51 +0400 Subject: [Python-Dev] genious hack in python In-Reply-To: References: Message-ID: <20110923155451.GC21909@iskra.aviel.ru> Hi! On Fri, Sep 23, 2011 at 11:12:53AM -0400, Martin Goudreau wrote: > Please check a very small > module i'v made for improving the debugger traceback. See the > pybettererror.py on sourceforge: > http://pybettererror.sourceforge.net/projet.html Why do this in sys.stderr and not by monkey-patching traceback.py, probably format_list and format_exception_only? Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From status at bugs.python.org Fri Sep 23 18:07:28 2011 From: status at bugs.python.org (Python tracker) Date: Fri, 23 Sep 2011 18:07:28 +0200 (CEST) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20110923160728.E68E11CA8F@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2011-09-16 - 2011-09-23) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message. 
Issues counts and deltas: open 3030 (+11) closed 21788 (+31) total 24818 (+42) Open issues with patches: 1299 Issues opened (34) ================== #11686: Update of some email/ __all__ lists http://bugs.python.org/issue11686 reopened by r.david.murray #11780: email.encoders are broken http://bugs.python.org/issue11780 reopened by r.david.murray #12991: Python 64-bit build on HP Itanium - Executable built successfu http://bugs.python.org/issue12991 reopened by wah meng #12997: sqlite3: PRAGMA foreign_keys = ON doesn't work http://bugs.python.org/issue12997 opened by Mark.Bucciarelli #12998: Memory leak with CTypes Structure http://bugs.python.org/issue12998 opened by a01 #12999: _XOPEN_SOURCE and _XOPEN_SOURCE_EXTENDED usage on Solaris http://bugs.python.org/issue12999 opened by neologix #13000: unhandled exception at install http://bugs.python.org/issue13000 opened by jorge.seifert #13001: test_socket.testRecvmsgTrunc failure on FreeBSD 7.2 buildbot http://bugs.python.org/issue13001 opened by neologix #13004: pprint: add option to truncate sequences http://bugs.python.org/issue13004 opened by terry.reedy #13008: syntax error when pasting valid snippet into console without e http://bugs.python.org/issue13008 opened by techtonik #13009: Remove documentation in distutils2 repo http://bugs.python.org/issue13009 opened by eric.araujo #13011: Frozen programs require the original build directory in order http://bugs.python.org/issue13011 opened by malcolmp #13012: Allow keyword argument in str.splitlines() http://bugs.python.org/issue13012 opened by mark.dickinson #13013: _ctypes.c: refleak http://bugs.python.org/issue13013 opened by Suman.Saha #13014: _ssl.c: refleak http://bugs.python.org/issue13014 opened by Suman.Saha #13015: _collectionsmodule.c: refleak http://bugs.python.org/issue13015 opened by Suman.Saha #13016: selectmodule.c: refleak http://bugs.python.org/issue13016 opened by Suman.Saha #13017: pyexpat.c: refleak http://bugs.python.org/issue13017 opened by 
Suman.Saha #13018: dictobject.c: refleak http://bugs.python.org/issue13018 opened by Suman.Saha #13019: bytearrayobject.c: refleak http://bugs.python.org/issue13019 opened by Suman.Saha #13020: structseq.c: refleak http://bugs.python.org/issue13020 opened by Suman.Saha #13023: argparse should allow displaying argument default values in ad http://bugs.python.org/issue13023 opened by denilsonsa #13024: cgitb uses stdout encoding http://bugs.python.org/issue13024 opened by haypo #13025: mimetypes should read the rule file using UTF-8, not the local http://bugs.python.org/issue13025 opened by haypo #13026: Dis module - documentation of MAKE_FUNCTION http://bugs.python.org/issue13026 opened by arno #13027: python 2.6.6 interpreter core dumps on modules command from he http://bugs.python.org/issue13027 opened by Balachandran.Sivakumar #13028: python wastes linux users time by checking for dylib on each d http://bugs.python.org/issue13028 opened by fzvqedi #13029: test_strptime fails on Windows 7 french http://bugs.python.org/issue13029 opened by haypo #13030: Be more generic when identifying the Windows main dir in insta http://bugs.python.org/issue13030 opened by sandro.tosi #13031: [PATCH] small speed-up for tarfile.py when unzipping tarballs http://bugs.python.org/issue13031 opened by jpeel #13032: h2py.py can fail with UnicodeDecodeError http://bugs.python.org/issue13032 opened by Arfrever #13033: recursive chown for shutils http://bugs.python.org/issue13033 opened by Low.Kian.Seong #13034: Python does not read Alternative Subject Names from SSL certif http://bugs.python.org/issue13034 opened by atrasatti #13035: "maintainer" value clear the "author" value when registering http://bugs.python.org/issue13035 opened by jab Most recent 15 issues with no replies (15) ========================================== #13035: "maintainer" value clear the "author" value when registering http://bugs.python.org/issue13035 #13034: Python does not read Alternative Subject Names from 
SSL certif http://bugs.python.org/issue13034 #13032: h2py.py can fail with UnicodeDecodeError http://bugs.python.org/issue13032 #13030: Be more generic when identifying the Windows main dir in insta http://bugs.python.org/issue13030 #13025: mimetypes should read the rule file using UTF-8, not the local http://bugs.python.org/issue13025 #13024: cgitb uses stdout encoding http://bugs.python.org/issue13024 #13023: argparse should allow displaying argument default values in ad http://bugs.python.org/issue13023 #13019: bytearrayobject.c: refleak http://bugs.python.org/issue13019 #13018: dictobject.c: refleak http://bugs.python.org/issue13018 #13017: pyexpat.c: refleak http://bugs.python.org/issue13017 #13016: selectmodule.c: refleak http://bugs.python.org/issue13016 #13015: _collectionsmodule.c: refleak http://bugs.python.org/issue13015 #13011: Frozen programs require the original build directory in order http://bugs.python.org/issue13011 #12984: XML NamedNodeMap ( attribName in NamedNodeMap fails ) http://bugs.python.org/issue12984 #12983: byte string literals with invalid hex escape codes raise Value http://bugs.python.org/issue12983 Most recent 15 issues waiting for review (15) ============================================= #13032: h2py.py can fail with UnicodeDecodeError http://bugs.python.org/issue13032 #13031: [PATCH] small speed-up for tarfile.py when unzipping tarballs http://bugs.python.org/issue13031 #13025: mimetypes should read the rule file using UTF-8, not the local http://bugs.python.org/issue13025 #13024: cgitb uses stdout encoding http://bugs.python.org/issue13024 #13018: dictobject.c: refleak http://bugs.python.org/issue13018 #13017: pyexpat.c: refleak http://bugs.python.org/issue13017 #13016: selectmodule.c: refleak http://bugs.python.org/issue13016 #13015: _collectionsmodule.c: refleak http://bugs.python.org/issue13015 #13012: Allow keyword argument in str.splitlines() http://bugs.python.org/issue13012 #13001: test_socket.testRecvmsgTrunc failure on 
FreeBSD 7.2 buildbot http://bugs.python.org/issue13001 #12991: Python 64-bit build on HP Itanium - Executable built successfu http://bugs.python.org/issue12991 #12989: Consistently handle path separator in Py_GetPath on Windows http://bugs.python.org/issue12989 #12986: Using getrandbits() in uuid.uuid4() is faster and more readabl http://bugs.python.org/issue12986 #12985: Check signed arithmetic overflow in ./configure http://bugs.python.org/issue12985 #12981: rewrite multiprocessing (senfd|recvfd) in Python http://bugs.python.org/issue12981 Top 10 most discussed issues (10) ================================= #12981: rewrite multiprocessing (senfd|recvfd) in Python http://bugs.python.org/issue12981 11 msgs #12943: tokenize: add python -m tokenize support back http://bugs.python.org/issue12943 8 msgs #12991: Python 64-bit build on HP Itanium - Executable built successfu http://bugs.python.org/issue12991 8 msgs #12729: Python lib re cannot handle Unicode properly due to narrow/wid http://bugs.python.org/issue12729 7 msgs #12955: urllib.request example should use "with ... as:" http://bugs.python.org/issue12955 7 msgs #13000: unhandled exception at install http://bugs.python.org/issue13000 7 msgs #13012: Allow keyword argument in str.splitlines() http://bugs.python.org/issue13012 6 msgs #12998: Memory leak with CTypes Structure http://bugs.python.org/issue12998 5 msgs #13026: Dis module - documentation of MAKE_FUNCTION http://bugs.python.org/issue13026 5 msgs #11816: Refactor the dis module to provide better building blocks for http://bugs.python.org/issue11816 4 msgs Issues closed (31) ================== #11037: State of PEP 382 or How does distutils2 handle namespaces? 
http://bugs.python.org/issue11037 closed by eric.araujo #11701: email.parser.BytesParser().parse() closes file argument http://bugs.python.org/issue11701 closed by sdaoden #11913: sdist refuses README.rst http://bugs.python.org/issue11913 closed by eric.araujo #11935: MMDF/MBOX mailbox need utime http://bugs.python.org/issue11935 closed by sdaoden #12145: distutils2 should support README.rst http://bugs.python.org/issue12145 closed by eric.araujo #12395: packaging remove fails under Windows http://bugs.python.org/issue12395 closed by eric.araujo #12678: test_packaging and test_distutils failures under Windows http://bugs.python.org/issue12678 closed by eric.araujo #12785: list_distinfo_file is wrong http://bugs.python.org/issue12785 closed by eric.araujo #12931: xmlrpclib confuses unicode and string http://bugs.python.org/issue12931 closed by haypo #12936: armv5tejl segfaults: sched_setaffinity() vs. pthread_setaffini http://bugs.python.org/issue12936 closed by skrah #12938: html.escape docstring does not mention single quotes (') http://bugs.python.org/issue12938 closed by orsenthil #12958: test_socket failures on Mac OS X http://bugs.python.org/issue12958 closed by python-dev #12960: threading.Condition is not a class http://bugs.python.org/issue12960 closed by haypo #12961: itertools: unlabelled balls in boxes http://bugs.python.org/issue12961 closed by rhettinger #12972: Color prompt + readline http://bugs.python.org/issue12972 closed by terry.reedy #12976: add support for MirBSD platform http://bugs.python.org/issue12976 closed by loewis #12977: socket.socket.setblocking does not raise exception if no data http://bugs.python.org/issue12977 closed by georg.brandl #12994: cx_Oracle failed to load in newly build python 2.7.1 http://bugs.python.org/issue12994 closed by terry.reedy #12995: Different behaviours with between v3.1.2 and v3.2. 
http://bugs.python.org/issue12995 closed by benjamin.peterson #12996: multiprocessing.Connection endianness issue http://bugs.python.org/issue12996 closed by neologix #13002: peephole.c: unused parameter http://bugs.python.org/issue13002 closed by skrah #13003: Bug in equivalent code for itertools.izip_longest http://bugs.python.org/issue13003 closed by georg.brandl #13005: operator module docs include repeat http://bugs.python.org/issue13005 closed by python-dev #13006: bug in core python variable binding http://bugs.python.org/issue13006 closed by amaury.forgeotdarc #13007: gdbm 1.9 has new magic that whichdb does not recognize http://bugs.python.org/issue13007 closed by python-dev #13010: devguide doc: ./python.exe on OS X http://bugs.python.org/issue13010 closed by ezio.melotti #13021: Resource is not released before returning from the function http://bugs.python.org/issue13021 closed by barry #13022: _multiprocessing.recvfd() doesn't check that file descriptor w http://bugs.python.org/issue13022 closed by python-dev #13036: time format in logging is wrong http://bugs.python.org/issue13036 closed by vinay.sajip #793069: Add --remove-source option http://bugs.python.org/issue793069 closed by eric.araujo #1172711: long long support for array module http://bugs.python.org/issue1172711 closed by meador.inge From merwok at netwok.org Fri Sep 23 19:11:39 2011 From: merwok at netwok.org (=?UTF-8?B?w4lyaWMgQXJhdWpv?=) Date: Fri, 23 Sep 2011 19:11:39 +0200 Subject: [Python-Dev] [Python-checkins] cpython (3.2): Issue #7732: Don't open a directory as a file anymore while importing a In-Reply-To: References: Message-ID: <4E7CBDCB.9000506@netwok.org> Hi Victor, > diff --git a/Misc/NEWS b/Misc/NEWS > --- a/Misc/NEWS > +++ b/Misc/NEWS > @@ -10,6 +10,10 @@ > Core and Builtins > ----------------- > > +- Issue #7732: Don't open a directory as a file anymore while importing a > + module. Ignore the direcotry if its name matchs the module name (e.g. 
Typo: direcotry From ezio.melotti at gmail.com Fri Sep 23 19:22:55 2011 From: ezio.melotti at gmail.com (Ezio Melotti) Date: Fri, 23 Sep 2011 20:22:55 +0300 Subject: [Python-Dev] [Python-checkins] cpython (3.2): Issue #7732: Don't open a directory as a file anymore while importing a In-Reply-To: <4E7CBDCB.9000506@netwok.org> References: <4E7CBDCB.9000506@netwok.org> Message-ID: <4E7CC06F.9090301@gmail.com> On 23/09/2011 20.11, Éric Araujo wrote: > Hi Victor, > >> diff --git a/Misc/NEWS b/Misc/NEWS >> --- a/Misc/NEWS >> +++ b/Misc/NEWS >> @@ -10,6 +10,10 @@ >> Core and Builtins >> ----------------- >> >> +- Issue #7732: Don't open a directory as a file anymore while importing a >> + module. Ignore the direcotry if its name matchs the module name (e.g. > Typo: direcotry Typo: matchs From ethan at stoneleaf.us Fri Sep 23 20:04:50 2011 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 23 Sep 2011 11:04:50 -0700 Subject: [Python-Dev] range objects in 3.x Message-ID: <4E7CCA42.2060100@stoneleaf.us> A question came up on StackOverflow about range objects and floating point numbers. I thought about writing an frange that did for floats what range does for ints, so started examining the range class. I noticed it has __le__, __lt__, __eq__, __ne__, __ge__, and __gt__ methods. Some experiments show that xrange in 2.x does indeed implement those operations, but in 3.x range does not (TypeError: unorderable types: range() > range()). Was this intentional, or should I file a bug report? (I was unable to find anything in the What's New documents; also, I did not test in 3.0, just in 2.7, 3.1, 3.2.)
~Ethan~ From benjamin at python.org Fri Sep 23 20:14:36 2011 From: benjamin at python.org (Benjamin Peterson) Date: Fri, 23 Sep 2011 14:14:36 -0400 Subject: [Python-Dev] range objects in 3.x In-Reply-To: <4E7CCA42.2060100@stoneleaf.us> References: <4E7CCA42.2060100@stoneleaf.us> Message-ID: 2011/9/23 Ethan Furman : > A question came up on StackOverflow about range objects and floating point > numbers. I thought about writing an frange that did for floats what range > does for ints, so started examining the range class. I noticed it has > __le__, __lt__, __eq__, __ne__, __ge__, and __gt__ methods. Some > experiments show that xrange in 2.x does indeed implement those operations, > but in 3.x range does not (TypeError: unorderable types: range() > range()). > > Was this intentional, or should I file a bug report? (I was unable to find > anything in the What's New documents; also, I did not test in 3.0, just in > 2.7, 3.1, 3.2.) That's simply a consequence of everything having comparisons defined in 2.x. The comparison is essentially meaningless. -- Regards, Benjamin From guido at python.org Fri Sep 23 20:23:07 2011 From: guido at python.org (Guido van Rossum) Date: Fri, 23 Sep 2011 11:23:07 -0700 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> Message-ID: Also, Ethan, I hope you're familiar with the reason why there is no range() support for floats currently? (Briefly, things like range(0.0, 0.8, step=0.1) could include or exclude the end point depending on rounding, which makes for troublesome semantics.) On Fri, Sep 23, 2011 at 11:14 AM, Benjamin Peterson wrote: > 2011/9/23 Ethan Furman : >> A question came up on StackOverflow about range objects and floating point >> numbers. I thought about writing an frange that did for floats what range >> does for ints, so started examining the range class. I noticed it has >> __le__, __lt__, __eq__, __ne__, __ge__, and __gt__ methods.
Some >> experiments show that xrange in 2.x does indeed implement those operations, >> but in 3.x range does not (TypeError: unorderable types: range() > range()). >> >> Was this intentional, or should I file a bug report? (I was unable to find >> anything in the What's New documents; also, I did not test in 3.0, just in >> 2.7, 3.1, 3.2.) > > That's simply a consequence of everything having comparisons defined > in 2.x. The comparison is essentially meaningless. > > > -- > Regards, > Benjamin > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (python.org/~guido) From ethan at stoneleaf.us Fri Sep 23 20:25:35 2011 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 23 Sep 2011 11:25:35 -0700 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> Message-ID: <4E7CCF1F.1070906@stoneleaf.us> Guido van Rossum wrote: > Also, Ethan, I hope you're familiar with the reason why there is no > range() support for floats currently? (Briefly, things like range(0.0, > 0.8, step=0.1) could include or exclude the end point depending on > rounding, which makes for troublesome semantics.) Good point, thanks for the reminder. ~Ethan~ From ethan at stoneleaf.us Fri Sep 23 22:23:26 2011 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 23 Sep 2011 13:23:26 -0700 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E7CD070.9020507@stoneleaf.us> Message-ID: <4E7CEABE.8090001@stoneleaf.us> Benjamin Peterson wrote: > 2011/9/23 Ethan Furman : >> >> Follow-up question: since the original range returned lists, and comparisons >> do make sense for lists, should the new range also implement them? > > What would be the use-case?
The only reason I'm aware of at the moment is to prevent loss of functionality from 2.x range to 3.x range. I'm -0 with a decision to not have range be orderable; but I understand there are bigger fish to fry. :) My original concern was that the comparison methods were there at all, but looking around I see object has them, so it makes sense to me now. I had thought I would have to implement them if I went ahead with an frange (for floats). >> I note >> that it does implement __contains__, __getitem__, count, and index in the >> same way that list does. > > That's because it implements the Sequence ABC. So the question becomes, Why does it implement the Sequence ABC? Because the original range returned a list and those operations made sense? ~Ethan~ From catch-all at masklinn.net Fri Sep 23 22:04:11 2011 From: catch-all at masklinn.net (Xavier Morel) Date: Fri, 23 Sep 2011 22:04:11 +0200 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> Message-ID: <72911D45-C7BB-4EA2-88FF-2E7C6476A114@masklinn.net> On 2011-09-23, at 20:23 , Guido van Rossum wrote: > Also, Ethan, I hope you're familiar with the reason why there is no > range() support for floats currently? (Briefly, things like range(0.0, > 0.8, step=0.1) could include or exclude the end point depending on > rounding, which makes for troublesome semantics.) On the other hand, there could be a range for Decimal could there not? 
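Xavier's suggestion is plausible: decimal arithmetic is exact for decimal literals, so a Decimal-based range avoids the end-point ambiguity Guido describes for binary floats. A minimal sketch (the name `drange` and the positive-step-only restriction are mine, not anything actually proposed in the thread):

```python
from decimal import Decimal

def drange(start, stop, step):
    """Range-like generator over Decimal values (positive step only).

    Decimal('0.1') is stored exactly, so repeated addition cannot
    drift the way binary floats do, and the exclusive end point is
    honored deterministically.
    """
    value, stop, step = Decimal(start), Decimal(stop), Decimal(step)
    while value < stop:
        yield value
        value += step

values = list(drange('0.0', '0.8', '0.1'))
# exactly eight values, 0.0 through 0.7; 0.8 is reliably excluded
```

Because the eighth addition lands exactly on Decimal('0.8'), the `value < stop` test fails cleanly, with none of the "include or exclude depending on rounding" behavior of the float case.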
From benjamin at python.org Fri Sep 23 22:26:47 2011 From: benjamin at python.org (Benjamin Peterson) Date: Fri, 23 Sep 2011 16:26:47 -0400 Subject: [Python-Dev] range objects in 3.x In-Reply-To: <4E7CEABE.8090001@stoneleaf.us> References: <4E7CCA42.2060100@stoneleaf.us> <4E7CD070.9020507@stoneleaf.us> <4E7CEABE.8090001@stoneleaf.us> Message-ID: 2011/9/23 Ethan Furman : > Benjamin Peterson wrote: >> >> 2011/9/23 Ethan Furman : > >>> >>> >>> Follow-up question: since the original range returned lists, and >>> comparisons >>> do make sense for lists, should the new range also implement them? >> >> What would be the use-case? > > The only reason I'm aware of at the moment is to prevent loss of > functionality from 2.x range to 3.x range. range comparisons in 2.x have no functionality. > >>> I note >>> that it does implement __contains__, __getitem__, count, and index in the >>> same way that list does. >> >> That's because it implements the Sequence ABC. > > So the question becomes, Why does it implement the Sequence ABC? Because the > original range returned a list and those operations made sense? I'm not sure what the history is. -- Regards, Benjamin From ethan at stoneleaf.us Fri Sep 23 22:38:08 2011 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 23 Sep 2011 13:38:08 -0700 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E7CD070.9020507@stoneleaf.us> <4E7CEABE.8090001@stoneleaf.us> Message-ID: <4E7CEE30.7090406@stoneleaf.us> Benjamin Peterson wrote: > 2011/9/23 Ethan Furman : >> Benjamin Peterson wrote: >>> 2011/9/23 Ethan Furman : >>>> >>>> Follow-up question: since the original range returned lists, and >>>> comparisons >>>> do make sense for lists, should the new range also implement them? >>> What would be the use-case? >> The only reason I'm aware of at the moment is to prevent loss of >> functionality from 2.x range to 3.x range. > > range comparisons in 2.x have no functionality. 
Python 2.7 (r27:82525, Jul 4 2010, 09:01:59) [MSC v.1500 32 bit (Intel)] on win32 Type "help", "copyright", "credits" or "license" for more information. --> r1 = range(10) --> r2 = range(0, 20, 2) --> r3 = range(10) --> r1 == r3 True --> r1 < r2 True --> r3 > r2 False Yes, I realize this is because range returned a list in 2.x. However, aren't __contains__, __getitem__, count, and index implemented in 3.x range because 2.x range returned lists? ~Ethan~ From guido at python.org Fri Sep 23 23:04:16 2011 From: guido at python.org (Guido van Rossum) Date: Fri, 23 Sep 2011 14:04:16 -0700 Subject: [Python-Dev] range objects in 3.x In-Reply-To: <4E7CEABE.8090001@stoneleaf.us> References: <4E7CCA42.2060100@stoneleaf.us> <4E7CD070.9020507@stoneleaf.us> <4E7CEABE.8090001@stoneleaf.us> Message-ID: On Fri, Sep 23, 2011 at 1:23 PM, Ethan Furman wrote: > The only reason I'm aware of at the moment is to prevent loss of > functionality from 2.x range to 3.x range. > > I'm -0 with a decision to not have range be orderable; but I understand > there are bigger fish to fry. :) I don't believe there's a valid use case for ordering ranges. As for backwards compatibility, apparently nobody cares or we would've heard about it. > My original concern was that the comparison methods were there at all, but > looking around I see object has them, so it makes sense to me now. I had > thought I would have to implement them if I went ahead with an frange (for > floats). [...]> So the question becomes, Why does it implement the Sequence ABC? Because the > original range returned a list and those operations made sense? Because all operations on Sequence make sense: you can iterate over a range, it has a definite number of items, and so on; all other sequence operations can be derived from that easily (and in fact they can almost all be done in O(1) time).
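Guido's claim is easy to check interactively. In Python 3 (3.3 or later for the `collections.abc` spelling), range is registered as a Sequence, and the sequence operations are computed arithmetically, so they stay fast even for astronomically large ranges. A small demonstration, not code from the thread:

```python
from collections.abc import Sequence

r = range(0, 10**12, 7)           # far too large to ever materialize
assert isinstance(r, Sequence)

# Each of these is pure arithmetic, so all complete instantly:
assert len(r) == (10**12 + 6) // 7
assert 7 * 10**10 in r            # membership: divisibility + bounds check
assert r[3] == 21                 # indexing: start + 3 * step
assert r.index(21) == 3
assert r.count(21) == 1
```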
-- --Guido van Rossum (python.org/~guido) From solipsis at pitrou.net Fri Sep 23 23:06:24 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 23 Sep 2011 23:06:24 +0200 Subject: [Python-Dev] range objects in 3.x References: <4E7CCA42.2060100@stoneleaf.us> <4E7CD070.9020507@stoneleaf.us> <4E7CEABE.8090001@stoneleaf.us> Message-ID: <20110923230624.7a5f0c03@msiwind> Le Fri, 23 Sep 2011 13:23:26 -0700, Ethan Furman a ?crit : > > So the question becomes, Why does it implement the Sequence ABC? Because these operations are trivial to implement and it would be suboptimal to have to instantiate the full list to run them? From martin at v.loewis.de Sat Sep 24 00:01:07 2011 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 24 Sep 2011 00:01:07 +0200 Subject: [Python-Dev] range objects in 3.x In-Reply-To: <4E7CEE30.7090406@stoneleaf.us> References: <4E7CCA42.2060100@stoneleaf.us> <4E7CD070.9020507@stoneleaf.us> <4E7CEABE.8090001@stoneleaf.us> <4E7CEE30.7090406@stoneleaf.us> Message-ID: <4E7D01A3.3010704@v.loewis.de> > Yes, I realize this is because range returned a list in 2.x. However, > aren't __contains__, __getitem__, count, and index implemented in 3.x > range because 2.x range returned lists? No, they are implemented because they are meaningful, and with an obvious meaning. "Is 30 in the range from 10 to 40?" is something that everybody will answer the same way. "What is the fifth element of the range from 10 to 40?" may not have such a universal meaning, but people familiar with the mathematical concept of an interval can readily guess the answer (except that they may wonder whether to start counting at 0 or 1). "Is the range from 5 to 100 larger than the range from 10 to 100?" 
is something that most people would answer as "yes" (I believe), yet py> range(5,100) > range(10,100) False Regards, Martin From ethan at stoneleaf.us Sat Sep 24 00:24:24 2011 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 23 Sep 2011 15:24:24 -0700 Subject: [Python-Dev] range objects in 3.x In-Reply-To: <4E7D01A3.3010704@v.loewis.de> References: <4E7CCA42.2060100@stoneleaf.us> <4E7CD070.9020507@stoneleaf.us> <4E7CEABE.8090001@stoneleaf.us> <4E7CEE30.7090406@stoneleaf.us> <4E7D01A3.3010704@v.loewis.de> Message-ID: <4E7D0718.1000605@stoneleaf.us> Martin v. Löwis wrote: >> Yes, I realize this is because range returned a list in 2.x. However, >> aren't __contains__, __getitem__, count, and index implemented in 3.x >> range because 2.x range returned lists? > > No, they are implemented because they are meaningful, and with an > obvious meaning. "Is 30 in the range from 10 to 40?" is something > that everybody will answer the same way. "What is the fifth element > of the range from 10 to 40?" may not have such a universal meaning, > but people familiar with the mathematical concept of an interval > can readily guess the answer (except that they may wonder whether > to start counting at 0 or 1). > > "Is the range from 5 to 100 larger than the range from 10 to 100?" > is something that most people would answer as "yes" (I believe), > yet > > py> range(5,100) > range(10,100) > False Thanks, Martin! I can see where there could be many interpretations about the meaning of less-than and greater-than with regards to range.
~Ethan~ From greg.ewing at canterbury.ac.nz Sat Sep 24 01:25:11 2011 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 24 Sep 2011 11:25:11 +1200 Subject: [Python-Dev] range objects in 3.x In-Reply-To: <4E7CEABE.8090001@stoneleaf.us> References: <4E7CCA42.2060100@stoneleaf.us> <4E7CD070.9020507@stoneleaf.us> <4E7CEABE.8090001@stoneleaf.us> Message-ID: <4E7D1557.5040107@canterbury.ac.nz> Ethan Furman wrote: > The only reason I'm aware of at the moment is to prevent loss of > functionality from 2.x range to 3.x range. Since 2.x range(...) is equivalent to 3.x list(range(...)), I don't see any loss of functionality there. Comparing range objects directly in 3.x is like comparing xrange objects in 2.x, and there the comparison was arbitrary -- it did *not* compare them like their corresponding lists: Python 2.7 (r27:82500, Oct 15 2010, 21:14:33) [GCC 4.2.1 (Apple Inc. build 5664)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> a = xrange(5) >>> b = xrange(5) >>> a > b True -- Greg From techtonik at gmail.com Sat Sep 24 01:25:52 2011 From: techtonik at gmail.com (anatoly techtonik) Date: Sat, 24 Sep 2011 02:25:52 +0300 Subject: [Python-Dev] Inconsistent script/console behaviour Message-ID: Currently if you work in console and define a function and then immediately call it - it will fail with SyntaxError. For example, copy paste this completely valid Python script into console: def some(): print "XXX" some() There is an issue for that that was just closed by Eric. However, I'd like to know if there are people here that agree that if you paste a valid Python script into console - it should work without changes. -- anatoly t. 
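The failure is easy to reproduce: the plain console needs a blank line to close the def block, so the some() call arriving on the very next line raises SyntaxError. One workaround is to hand the whole paste to exec() as a single string; a sketch, adapted here to Python 3 print syntax:

```python
pasted = """
def some():
    print("XXX")
some()
"""

# Typed line by line at the >>> prompt this fails, but handed to
# exec() as one unit it runs exactly like a script file.
exec(pasted)   # prints XXX
```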
From guido at python.org Sat Sep 24 01:32:30 2011 From: guido at python.org (Guido van Rossum) Date: Fri, 23 Sep 2011 16:32:30 -0700 Subject: [Python-Dev] Inconsistent script/console behaviour In-Reply-To: References: Message-ID: On Fri, Sep 23, 2011 at 4:25 PM, anatoly techtonik wrote: > Currently if you work in console and define a function and then > immediately call it - it will fail with SyntaxError. > For example, copy paste this completely valid Python script into console: > > def some(): > print "XXX" > some() > > There is an issue for that that was just closed by Eric. However, I'd > like to know if there are people here that agree that if you paste a > valid Python script into console - it should work without changes. You can't fix this without completely changing the way the interactive console treats blank lines. Note that it's not just that a blank line is required after a function definition -- you also *can't* have a blank line *inside* a function definition. The interactive console is optimized for people entering code by typing, not by copying and pasting large gobs of text. If you think you can have it both ways, show us the code. -- --Guido van Rossum (python.org/~guido) From ubershmekel at gmail.com Sat Sep 24 01:34:42 2011 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Fri, 23 Sep 2011 19:34:42 -0400 Subject: [Python-Dev] Inconsistent script/console behaviour In-Reply-To: References: Message-ID: I agree that it should and it doesn't. I also recall that not having empty lines between function/class definitions can cause indentation errors when pasting to the console on my windows machine. --Yuval On Sep 23, 2011 7:26 PM, "anatoly techtonik" wrote: > Currently if you work in console and define a function and then > immediately call it - it will fail with SyntaxError. > For example, copy paste this completely valid Python script into console: > > def some(): > print "XXX" > some() > > There is an issue for that that was just closed by Eric.
However, I'd > like to know if there are people here that agree that if you paste a > valid Python script into console - it should work without changes. > -- > anatoly t. > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/ubershmekel%40gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Sat Sep 24 01:49:53 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 23 Sep 2011 19:49:53 -0400 Subject: [Python-Dev] Inconsistent script/console behaviour In-Reply-To: References: Message-ID: On 9/23/2011 7:25 PM, anatoly techtonik wrote: > Currently if you work in console and define a function and then > immediately call it - it will fail with SyntaxError. > For example, copy paste this completely valid Python script into console: > > def some(): > print "XXX" > some() > > There is an issue for that that was just closed by Eric. However, I'd > like to know if there are people here that agree that if you paste a > valid Python script into console - it should work without changes. For this kind of multi-line, multi-statement pasting, open an IDLE edit window for tem.py (my name) or such, paste, run with F5. I have found that this works better for me than direct pasting. An interactive lisp interpreter can detect end-of-statement without a blank line by matching a closing paren to the open paren that starts every expression.
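Terry's point about Lisp fits in a few lines: whether input is complete is decidable from bracket depth alone, with no reliance on blank lines. A toy checker (it ignores strings and comments, so it is only a sketch):

```python
def input_complete(text):
    """Return True once every '(' has been closed.

    A Lisp-style REPL can execute as soon as this returns True;
    it never needs a blank line to know that a form has ended.
    """
    depth = 0
    for ch in text:
        if ch == '(':
            depth += 1
        elif ch == ')':
            if depth == 0:
                raise SyntaxError("unmatched ')'")
            depth -= 1
    return depth == 0

assert input_complete("(define (square x) (* x x))")
assert not input_complete("(define (square x)")  # still open: keep reading
```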
-- Terry Jan Reedy From brian.curtin at gmail.com Sat Sep 24 02:03:29 2011 From: brian.curtin at gmail.com (Brian Curtin) Date: Fri, 23 Sep 2011 19:03:29 -0500 Subject: [Python-Dev] Inconsistent script/console behaviour In-Reply-To: References: Message-ID: On Fri, Sep 23, 2011 at 18:49, Terry Reedy wrote: > An interactive lisp interpreter can detect end-of-statement without a blank > line by matching a closing paren to the open paren that starts every > expression. Braces-loving programmers around the world are feverishly writing a PEP as we speak. From steve at pearwood.info Sat Sep 24 03:36:07 2011 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 24 Sep 2011 11:36:07 +1000 Subject: [Python-Dev] range objects in 3.x In-Reply-To: <4E7CCA42.2060100@stoneleaf.us> References: <4E7CCA42.2060100@stoneleaf.us> Message-ID: <4E7D3407.5000207@pearwood.info> Ethan Furman wrote: > A question came up on StackOverflow about range objects and floating > point numbers. I thought about writing an frange that did for floats > what range does for ints, For what it's worth, here's mine: http://code.activestate.com/recipes/577068-floating-point-range/ -- Steven From guido at python.org Sat Sep 24 03:49:10 2011 From: guido at python.org (Guido van Rossum) Date: Fri, 23 Sep 2011 18:49:10 -0700 Subject: [Python-Dev] range objects in 3.x In-Reply-To: <4E7D3407.5000207@pearwood.info> References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> Message-ID: On Fri, Sep 23, 2011 at 6:36 PM, Steven D'Aprano wrote: > Ethan Furman wrote: >> >> A question came up on StackOverflow about range objects and floating point >> numbers. I thought about writing an frange that did for floats what range >> does for ints, > > > For what it's worth, here's mine: > > http://code.activestate.com/recipes/577068-floating-point-range/ I notice that your examples carefully skirt around the rounding issues. Check out frange(0.0, 2.1, 0.7).
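Guido's example rewards actually running it: mathematically 3 * 0.7 == 2.1, so an exclusive stop of 2.1 ought to yield exactly three values, but binary rounding lets a fourth slip in. A deliberately naive frange, written here only to show the failure mode:

```python
def naive_frange(start, stop, step):
    """Deliberately naive float range, for illustration only."""
    values = []
    value = start
    while value < stop:
        values.append(value)
        value += step
    return values

vals = naive_frange(0.0, 2.1, 0.7)
# The expected result is [0.0, 0.7, 1.4] -- but the third addition
# rounds to 2.0999999999999996, which is still below 2.1, so a
# surprise fourth point appears in the output.
```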
-- --Guido van Rossum (python.org/~guido) From steve at pearwood.info Sat Sep 24 04:13:12 2011 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 24 Sep 2011 12:13:12 +1000 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> Message-ID: <4E7D3CB8.5050904@pearwood.info> Guido van Rossum wrote: > On Fri, Sep 23, 2011 at 6:36 PM, Steven D'Aprano wrote: >> Ethan Furman wrote: >>> A question came up on StackOverflow about range objects and floating point >>> numbers. I thought about writing an frange that did for floats what range >>> does for ints, >> >> For what it's worth, here's mine: >> >> http://code.activestate.com/recipes/577068-floating-point-range/ > > I notice that your examples carefully skirt around the rounding issues. I also carefully *didn't* claim that it made rounding issues disappear completely. I'll add a note clarifying that rounding still occurs and as a consequence results can be unexpected. Thanks for taking the time to comment. -- Steven From guido at python.org Sat Sep 24 04:40:43 2011 From: guido at python.org (Guido van Rossum) Date: Fri, 23 Sep 2011 19:40:43 -0700 Subject: [Python-Dev] range objects in 3.x In-Reply-To: <4E7D3CB8.5050904@pearwood.info> References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> Message-ID: On Fri, Sep 23, 2011 at 7:13 PM, Steven D'Aprano wrote: >>> http://code.activestate.com/recipes/577068-floating-point-range/ >> >> I notice that your examples carefully skirt around the rounding issues. > > I also carefully *didn't* claim that it made rounding issues disappear > completely. I'll add a note clarifying that rounding still occurs and as a > consequence results can be unexpected. I believe this API is fundamentally wrong for float ranges, even if it's great for int ranges, and I will fight against adding it to the stdlib in that form. 
Maybe we can come up with a better API, and e.g. specify begin and end points and the number of subdivisions? E.g. frange(0.0, 2.1, 3) would generate [0.0, 0.7, 1.4]. Or maybe it would even be better to use inclusive end points? OTOH if you consider extending the API to complex numbers, it might be better to specify an initial value, a step, and a count. So frange(0.0, 0.7, 3) to generate [0.0, 0.7, 1.4]. Probably it shouldn't be called frange then. -- --Guido van Rossum (python.org/~guido) From g.brandl at gmx.net Sat Sep 24 08:55:19 2011 From: g.brandl at gmx.net (Georg Brandl) Date: Sat, 24 Sep 2011 08:55:19 +0200 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> Message-ID: Am 24.09.2011 04:40, schrieb Guido van Rossum: > On Fri, Sep 23, 2011 at 7:13 PM, Steven D'Aprano wrote: >>>> http://code.activestate.com/recipes/577068-floating-point-range/ >>> >>> I notice that your examples carefully skirt around the rounding issues. >> >> I also carefully *didn't* claim that it made rounding issues disappear >> completely. I'll add a note clarifying that rounding still occurs and as a >> consequence results can be unexpected. > > I believe this API is fundamentally wrong for float ranges, even if > it's great for int ranges, and I will fight against adding it to the > stdlib in that form. > > Maybe we can come up with a better API, and e.g. specify begin and end > points and the number of subdivisions? E.g. frange(0.0, 2.1, 3) would > generate [0.0, 0.7, 1.4]. This is what numpy calls linspace: http://docs.scipy.org/doc/numpy/reference/generated/numpy.linspace.html numpy also has an "arange" that works with floats, but: """When using a non-integer step, such as 0.1, the results will often not be consistent. 
It is better to use linspace for these cases.""" Georg From g.brandl at gmx.net Sat Sep 24 10:27:32 2011 From: g.brandl at gmx.net (Georg Brandl) Date: Sat, 24 Sep 2011 10:27:32 +0200 Subject: [Python-Dev] Inconsistent script/console behaviour In-Reply-To: References: Message-ID: Am 24.09.2011 01:32, schrieb Guido van Rossum: > On Fri, Sep 23, 2011 at 4:25 PM, anatoly techtonik wrote: >> Currently if you work in console and define a function and then >> immediately call it - it will fail with SyntaxError. >> For example, copy paste this completely valid Python script into console: >> >> def some(): >> print "XXX" >> some() >> >> There is an issue for that that was just closed by Eric. However, I'd >> like to know if there are people here that agree that if you paste a >> valid Python script into console - it should work without changes. > > You can't fix this without completely changing the way the interactive > console treats blank lines. None that it's not just that a blank line > is required after a function definition -- you also *can't* have a > blank line *inside* a function definition. While the former could be changed (I think), the latter certainly cannot. So it's probably not worth changing established behavior. Georg From ubershmekel at gmail.com Sat Sep 24 11:53:27 2011 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Sat, 24 Sep 2011 05:53:27 -0400 Subject: [Python-Dev] Inconsistent script/console behaviour In-Reply-To: References: Message-ID: Could you elaborate on what would be wrong if function definitions ended only after an explicitly less indented line? The only problem that comes to mind is global scope "if" statements that wouldn't execute when expected (we actually might need to terminate them with a dedented "pass"). 
On Sep 24, 2011 4:26 AM, "Georg Brandl" wrote: > Am 24.09.2011 01:32, schrieb Guido van Rossum: >> On Fri, Sep 23, 2011 at 4:25 PM, anatoly techtonik wrote: >>> Currently if you work in console and define a function and then >>> immediately call it - it will fail with SyntaxError. >>> For example, copy paste this completely valid Python script into console: >>> >>> def some(): >>> print "XXX" >>> some() >>> >>> There is an issue for that that was just closed by Eric. However, I'd >>> like to know if there are people here that agree that if you paste a >>> valid Python script into console - it should work without changes. >> >> You can't fix this without completely changing the way the interactive >> console treats blank lines. None that it's not just that a blank line >> is required after a function definition -- you also *can't* have a >> blank line *inside* a function definition. > > While the former could be changed (I think), the latter certainly cannot. > So it's probably not worth changing established behavior. > > Georg > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/ubershmekel%40gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From g.brandl at gmx.net Sat Sep 24 12:05:21 2011 From: g.brandl at gmx.net (Georg Brandl) Date: Sat, 24 Sep 2011 12:05:21 +0200 Subject: [Python-Dev] Inconsistent script/console behaviour In-Reply-To: References: Message-ID: You're right that in principle for function definitions there is no ambiguity. But you also presented the downfall of that proposal: all multi-clause statements will still need an explicit way of termination, and of course the "pass" would be exceedingly ugly, not to mention much more confusing than the current way. 
Georg Am 24.09.2011 11:53, schrieb Yuval Greenfield: > Could you elaborate on what would be wrong if function definitions ended only > after an explicitly less indented line? The only problem that comes to mind is > global scope "if" statements that wouldn't execute when expected (we actually > might need to terminate them with a dedented "pass"). > > On Sep 24, 2011 4:26 AM, "Georg Brandl" > wrote: >> Am 24.09.2011 01:32, schrieb Guido van Rossum: >>> On Fri, Sep 23, 2011 at 4:25 PM, anatoly techtonik > wrote: >>>> Currently if you work in console and define a function and then >>>> immediately call it - it will fail with SyntaxError. >>>> For example, copy paste this completely valid Python script into console: >>>> >>>> def some(): >>>> print "XXX" >>>> some() >>>> >>>> There is an issue for that that was just closed by Eric. However, I'd >>>> like to know if there are people here that agree that if you paste a >>>> valid Python script into console - it should work without changes. >>> >>> You can't fix this without completely changing the way the interactive >>> console treats blank lines. None that it's not just that a blank line >>> is required after a function definition -- you also *can't* have a >>> blank line *inside* a function definition. >> >> While the former could be changed (I think), the latter certainly cannot. >> So it's probably not worth changing established behavior. From guido at python.org Sat Sep 24 16:59:28 2011 From: guido at python.org (Guido van Rossum) Date: Sat, 24 Sep 2011 07:59:28 -0700 Subject: [Python-Dev] Inconsistent script/console behaviour In-Reply-To: References: Message-ID: I see a lot of flawed "proposals". This is clearly a python-ideas discussion. (Anatoly, take note -- please post your new gripe there.) In the mean time, there's a reasonable work-around if you have to copy/paste a large block of formatted code: >>> exec(''' . . . . . . 
''') >>> The only thing that you can't put in there is a triple-quoted string using single quotes. -- --Guido van Rossum (python.org/~guido) From guido at python.org Sat Sep 24 17:13:11 2011 From: guido at python.org (Guido van Rossum) Date: Sat, 24 Sep 2011 08:13:11 -0700 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> Message-ID: On Fri, Sep 23, 2011 at 11:55 PM, Georg Brandl wrote: > Am 24.09.2011 04:40, schrieb Guido van Rossum: >> On Fri, Sep 23, 2011 at 7:13 PM, Steven D'Aprano wrote: >>>>> http://code.activestate.com/recipes/577068-floating-point-range/ >>>> >>>> I notice that your examples carefully skirt around the rounding issues. >>> >>> I also carefully *didn't* claim that it made rounding issues disappear >>> completely. I'll add a note clarifying that rounding still occurs and as a >>> consequence results can be unexpected. >> >> I believe this API is fundamentally wrong for float ranges, even if >> it's great for int ranges, and I will fight against adding it to the >> stdlib in that form. >> >> Maybe we can come up with a better API, and e.g. specify begin and end >> points and the number of subdivisions? E.g. frange(0.0, 2.1, 3) would >> generate [0.0, 0.7, 1.4]. > > This is what numpy calls linspace: > http://docs.scipy.org/doc/numpy/reference/generated/numpy.linspace.html > > numpy also has an "arange" that works with floats, but: > """When using a non-integer step, such as 0.1, the results will often not be > consistent. It is better to use linspace for these cases.""" Aha, I like linspace(). I started a G+ thread (https://plus.google.com/u/0/115212051037621986145/posts/ZnrWDiHHiaW) but it mostly served to demonstrate that few people understand floating point, and that those that do don't understand how hard it is for the others. 
Jeffrey Yaskin's analysis (starting with "To anyone who thinks they can recover inside frange():") is the best of the bunch. But I still believe that it's best *not* to have frange(), and to warn about the flaws in the existing implementations floating around (like Steven's), referring them to linspace() instead. It looks easy enough to implement a basic linspace() that doesn't have the problems of frange(), and having a recipe handy (for those who don't want or need NumPy) would be a great start. I expect that to implement a version worthy of the stdlib math module, i.e. that computes values that are correct within 0.5ULP under all circumstances (e.g. lots of steps, or an end point close to the end of the floating point range) we'd need a numerical wizard like Mark Dickinson or Tim Peters (retired). Or maybe we could just borrow numpy's code. -- --Guido van Rossum (python.org/~guido) From tjreedy at udel.edu Sat Sep 24 23:12:30 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 24 Sep 2011 17:12:30 -0400 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> Message-ID: On 9/23/2011 10:40 PM, Guido van Rossum wrote: > On Fri, Sep 23, 2011 at 7:13 PM, Steven D'Aprano wrote: >> I also carefully *didn't* claim that it made rounding issues disappear >> completely. I'll add a note clarifying that rounding still occurs and as a >> consequence results can be unexpected. To avoid inclusion/exclusion errors, you should be testing values against a stop value that is (except for rounding errors) half a step above the last value you want to yield. In other words, subtract or add step/2.0 to the stop value according to whether or not you want it excluded or included. > I believe this API is fundamentally wrong for float ranges, even if > it's great for int ranges, and I will fight against adding it to the > stdlib in that form. I completely agree. 
For range(n), n is both the stop value and number of ints generated. It is otherwise stop-start, which is to say, stop = start + n, which is why there is no need for an n-based api (all this is by design). > Maybe we can come up with a better API, and e.g. specify begin and end > points and the number of subdivisions? E.g. frange(0.0, 2.1, 3) would > generate [0.0, 0.7, 1.4]. Or maybe it would even be better to use > inclusive end points? OTOH if you consider extending the API to > complex numbers, it might be better to specify an initial value, a > step, and a count. So frange(0.0, 0.7, 3) to generate [0.0, 0.7, 1.4]. > Probably it shouldn't be called frange then. In float use cases I can think of, one wants either both or neither end point. If neither, one probably wants points at .5*step, 1.5*step, etc., where step calculated as (right-left)/n. -- Terry Jan Reedy From steve at pearwood.info Sun Sep 25 07:21:06 2011 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 25 Sep 2011 15:21:06 +1000 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> Message-ID: <4E7EBA42.4060707@pearwood.info> Guido van Rossum wrote: > On Fri, Sep 23, 2011 at 7:13 PM, Steven D'Aprano wrote: >>>> http://code.activestate.com/recipes/577068-floating-point-range/ >>> I notice that your examples carefully skirt around the rounding issues. >> I also carefully *didn't* claim that it made rounding issues disappear >> completely. I'll add a note clarifying that rounding still occurs and as a >> consequence results can be unexpected. > > I believe this API is fundamentally wrong for float ranges, even if > it's great for int ranges, and I will fight against adding it to the > stdlib in that form. I wasn't proposing it to be in the standard lib, it was just an idle comment triggered by the OP's question. But I'm gratified it has started an interesting discussion. 
Whether the most float-friendly or not, the start/stop/step API is the most obvious and user-friendly for at least one use-case: graphing of functions. It is natural to say something like "draw a graph starting at 0, sampling every 0.1 unit, and stop when you get past 3". My HP-48 graphing calculator does exactly that: you must specify the start and stop coordinates, and an optional step size. By default, the step size is calculated for you assuming you want one point plotted per pixel. Given that the calculator display is both low-resolution and fixed size, that makes sense as the default, but you can set the step size manually if desired. start/stop/step is also familiar for users of Excel and other spreadsheets' Fill>Series command. Numeric integration is an interesting case, because generally you want multiple iterations, interpolating between the points previously seen until you reach some desired level of accuracy. E.g.: #1: 0.0, 0.5, 1.0 #2: 0.25, 0.75 #3: 0.125, 0.375, 0.625, 0.875 For integration, I would probably want both APIs. > Maybe we can come up with a better API, and e.g. specify begin and end > points and the number of subdivisions? Thanks to Mark Dickinson for suggesting using Fraction, I have this: http://code.activestate.com/recipes/577878-generate-equally-spaced-floats/ -- Steven From guido at python.org Sun Sep 25 16:38:55 2011 From: guido at python.org (Guido van Rossum) Date: Sun, 25 Sep 2011 07:38:55 -0700 Subject: [Python-Dev] range objects in 3.x In-Reply-To: <4E7EBA42.4060707@pearwood.info> References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> <4E7EBA42.4060707@pearwood.info> Message-ID: On Sat, Sep 24, 2011 at 10:21 PM, Steven D'Aprano wrote: > Guido van Rossum wrote: >> I believe this API is fundamentally wrong for float ranges, even if >> it's great for int ranges, and I will fight against adding it to the >> stdlib in that form. 
> > I wasn't proposing it to be in the standard lib, it was just an idle comment > triggered by the OP's question. But I'm gratified it has started an > interesting discussion. > > Whether the most float-friendly or not, the start/stop/step API is the most > obvious and user-friendly for at least one use-case: graphing of functions. It *appears* that way. But the flaws make for hard-to-debug edge cases (when an extra point unexpectedly appears). I've debugged a few bits of charting code, and there are enough other causes for confusing output that we don't need this problem. > It is natural to say something like "draw a graph starting at 0, sampling > every 0.1 unit, and stop when you get past 3". My HP-48 graphing calculator > does exactly that: you must specify the start and stop coordinates, and an > optional step size. By default, the step size is calculated for you assuming > you want one point plotted per pixel. Given that the calculator display is > both low-resolution and fixed size, that makes sense as the default, but you > can set the step size manually if desired. Yeah, but the HP uses decimal internally. It's just as easy for the user to specify the number of steps, and it has the advantage of not having the edge case problems. And you know how many points you'll get. > start/stop/step is also familiar for users of Excel and other spreadsheets' > Fill>Series command. Not sure I want to follow Excel's example for *anything*. > Numeric integration is an interesting case, because generally you want > multiple iterations, interpolating between the points previously seen until > you reach some desired level of accuracy. E.g.: > > #1: 0.0, 0.5, 1.0 > #2: 0.25, 0.75 > #3: 0.125, 0.375, 0.625, 0.875 So double the number of steps each time. Seems simpler to me (manipulating ints instead of floats). > For integration, I would probably want both APIs. > > >> Maybe we can come up with a better API, and e.g.
specify begin and end >> points and the number of subdivisions? > > Thanks to Mark Dickinson for suggesting using Fraction, I have this: > > http://code.activestate.com/recipes/577878-generate-equally-spaced-floats/ Nice one! -- --Guido van Rossum (python.org/~guido) From ncoghlan at gmail.com Mon Sep 26 05:47:40 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 26 Sep 2011 13:47:40 +1000 Subject: [Python-Dev] [Python-checkins] cpython: Issue #12981: rewrite multiprocessing_{sendfd, recvfd} in Python. In-Reply-To: References: Message-ID: On Sun, Sep 25, 2011 at 4:04 AM, charles-francois.natali wrote: > +if not(sys.platform == 'win32' or (hasattr(socket, 'CMSG_LEN') and > +                                   hasattr(socket, 'SCM_RIGHTS'))): >     raise ImportError('pickling of connections not supported') I'm pretty sure the functionality checks for CMSG_LEN and SCM_RIGHTS mean the platform check for Windows is now redundant. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From neologix at free.fr Mon Sep 26 08:48:06 2011 From: neologix at free.fr (Charles-François Natali) Date: Mon, 26 Sep 2011 08:48:06 +0200 Subject: [Python-Dev] [Python-checkins] cpython: Issue #12981: rewrite multiprocessing_{sendfd, recvfd} in Python. In-Reply-To: References: Message-ID: > On Sun, Sep 25, 2011 at 4:04 AM, charles-francois.natali > wrote: >> +if not(sys.platform == 'win32' or (hasattr(socket, 'CMSG_LEN') and >> +                                   hasattr(socket, 'SCM_RIGHTS'))): >>     raise ImportError('pickling of connections not supported') > > I'm pretty sure the functionality checks for CMSG_LEN and SCM_RIGHTS > mean the platform check for Windows is now redundant. > I'm not sure I understand what you mean. FD passing is supported on Unix with sendmsg/SCM_RIGHTS, and on Windows using whatever Windows uses for that purpose (see http://hg.python.org/cpython/file/2b47f0146639/Lib/multiprocessing/reduction.py#l63).
If we remove the check for Windows, an ImportError will be raised systematically, unless you suggest that Windows does support sendmsg/SCM_RIGHTS (I somehow doubt Windows supports Unix domain sockets, but I don't know Windows at all). cf From ncoghlan at gmail.com Mon Sep 26 15:21:08 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 26 Sep 2011 09:21:08 -0400 Subject: [Python-Dev] [Python-checkins] cpython: Issue #12981: rewrite multiprocessing_{sendfd, recvfd} in Python. In-Reply-To: References: Message-ID: 2011/9/26 Charles-François Natali : > I'm not sure I understand what you mean. You actually understood what I meant, I was just wrong because I misread the conditional. Nothing to see here, please move along :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From guido at python.org Mon Sep 26 23:00:06 2011 From: guido at python.org (Guido van Rossum) Date: Mon, 26 Sep 2011 14:00:06 -0700 Subject: [Python-Dev] PEP 393 close to pronouncement Message-ID: Martin has asked me to pronounce on PEP 393, after he's updated it in response to various feedback (including mine :-). I'm currently looking very favorably on it, but I thought I'd give folks here one more chance to bring up showstoppers. So, if you have the time, please review PEP 393 and/or play with the code (the repo is linked from the PEP's References section now). Please limit your feedback to show-stopping issues; we're past the stage of bikeshedding here. It's Good Enough (TM) and we'll have the rest of the 3.3 release cycle to improve incrementally. But we need to get to the point where the code can be committed to the 3.3 branch. In a few days I'll pronounce.
-- --Guido van Rossum (python.org/~guido) From fperez.net at gmail.com Mon Sep 26 23:06:44 2011 From: fperez.net at gmail.com (Fernando Perez) Date: Mon, 26 Sep 2011 21:06:44 +0000 (UTC) Subject: [Python-Dev] range objects in 3.x References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> Message-ID: On Sat, 24 Sep 2011 08:13:11 -0700, Guido van Rossum wrote: > I expect that to implement a version worthy of the stdlib math module, > i.e. that computes values that are correct within 0.5ULP under all > circumstances (e.g. lots of steps, or an end point close to the end of > the floating point range) we'd need a numerical wizard like Mark > Dickinson or Tim Peters (retired). Or maybe we could just borrow numpy's > code. +1 to using the numpy api, having continuity of API between the two would be great (people work interactively with 'from numpy import *', so having the linspace() call continue to work identically would be a bonus). License-wise there shouldn't be major issues in using the numpy code, as numpy is all BSD. Hopefully if there are any, the numpy community can help out. And now that Mark Dickinson is at Enthought (http://enthought.com/company/developers.php) where Travis Oliphant --numpy author-- works, I'm sure the process of ironing out any implementation/api quirks could be handled easily. Cheers, f From victor.stinner at haypocalc.com Tue Sep 27 00:19:02 2011 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Tue, 27 Sep 2011 00:19:02 +0200 Subject: [Python-Dev] PEP 393 close to pronouncement In-Reply-To: References: Message-ID: <201109270019.02442.victor.stinner@haypocalc.com> Hi, Le lundi 26 septembre 2011 23:00:06, Guido van Rossum a écrit : > So, if you have the time, please review PEP 393 and/or play with the > code (the repo is linked from the PEP's References section now). I played with the code. The full test suite passes on Linux, FreeBSD and Windows.
On Windows, there is just one failure in test_configparser, I didn't investigate it yet. I like the new API: a classic loop on the string length, and a macro to read the nth character. The backward compatibility is fully transparent and is already well tested because some modules still use the legacy API. It's quite easy to move from the legacy API to the new API. It's just boring, but it's almost done in the core (unicodeobject.c, but also some modules like _io). Since the introduction of PyASCIIObject, the PEP 393 is really good in memory footprint, especially for ASCII-only strings. In Python, you manipulate a lot of ASCII strings. PEP === It's not clear what is deprecated. It would help to have a full list of the deprecated functions/macros. Sometimes Martin wrote PyUnicode_Ready, sometimes PyUnicode_READY. It's confusing. Typo: PyUnicode_FAST_READY => PyUnicode_READY. "PyUnicode_WRITE_CHAR" is not listed in the New API section. Typo in "PyUnicode_CONVERT_BYTES(from_type, tp_type, begin, end, to)": tp_type => to_type. "PyUnicode_Chr(ch)": Why introducing a new function? PyUnicode_FromOrdinal was not enough? "GDB Debugging Hooks" It's not done yet. "None of the functions in this PEP become part of the stable ABI (PEP 384)." Why? Some functions don't depend on the internal representation, like PyUnicode_Substring or PyUnicode_FindChar. Typo: "In order to port modules to the new API, try to eliminate the use of these API elements: ... PyUnicode_GET_LENGTH ..." PyUnicode_GET_LENGTH is part of the new API. I suppose that you mean PyUnicode_GET_SIZE. 
Victor From dmalcolm at redhat.com Tue Sep 27 02:03:49 2011 From: dmalcolm at redhat.com (David Malcolm) Date: Mon, 26 Sep 2011 20:03:49 -0400 Subject: [Python-Dev] PEP 393 close to pronouncement In-Reply-To: <201109270019.02442.victor.stinner@haypocalc.com> References: <201109270019.02442.victor.stinner@haypocalc.com> Message-ID: <1317081830.23847.6.camel@surprise> On Tue, 2011-09-27 at 00:19 +0200, Victor Stinner wrote: > Hi, > > Le lundi 26 septembre 2011 23:00:06, Guido van Rossum a écrit : > > So, if you have the time, please review PEP 393 and/or play with the > > code (the repo is linked from the PEP's References section now). > > PEP > === > "GDB Debugging Hooks" It's not done yet. I can do these if need be, but IIRC you (Victor) said on #python-dev that you were already working on them. From steve at pearwood.info Tue Sep 27 03:25:48 2011 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 27 Sep 2011 11:25:48 +1000 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> Message-ID: <4E81261C.6040200@pearwood.info> Fernando Perez wrote: > On Sat, 24 Sep 2011 08:13:11 -0700, Guido van Rossum wrote: > >> I expect that to implement a version worthy of the stdlib math module, >> i.e. that computes values that are correct within 0.5ULP under all >> circumstances (e.g. lots of steps, or an end point close to the end of >> the floating point range) we'd need a numerical wizard like Mark >> Dickinson or Tim Peters (retired). Or maybe we could just borrow numpy's >> code. > > +1 to using the numpy api, having continuity of API between the two would > be great (people work interactively with 'from numpy import *', so having > the linspace() call continue to work identically would be a bonus). The audience for numpy is a small minority of Python users, and they tend to be more sophisticated.
I'm sure they can cope with two functions with different APIs. While continuity of API might be a good thing, we shouldn't accept a poor API just for the sake of continuity. I have some criticisms of the linspace API. numpy.linspace(start, stop, num=50, endpoint=True, retstep=False) http://docs.scipy.org/doc/numpy/reference/generated/numpy.linspace.html * It returns a sequence, which is appropriate for numpy but in standard Python it should return an iterator or something like a range object. * Why does num have a default of 50? That seems to be an arbitrary choice. * It arbitrarily singles out the end point for special treatment. When integrating, it is just as common for the first point to be singular as the end point, and therefore needing to be excluded. * If you exclude the end point, the stepsize, and hence the values returned, change: >>> linspace(1, 2, 4) array([ 1. , 1.33333333, 1.66666667, 2. ]) >>> linspace(1, 2, 4, endpoint=False) array([ 1. , 1.25, 1.5 , 1.75]) This surprises me. I expect that excluding the end point will just exclude the end point, i.e. return one fewer point. That is, I expect num to count the number of subdivisions, not the number of points. * The retstep argument changes the return signature from => array to => (array, number). I think that's a pretty ugly thing to do. If linspace returned a special iterator object, the step size could be exposed as an attribute. * I'm not sure that start/end/count is a better API than start/step/count. * This one is pure bike-shedding: I don't like the name linspace. We've gone 20 years without a floating point range in Python. I think we should give people a bit of time to play around with alternative APIs rather than just grab the first one that comes along.
-- Steven From guido at python.org Tue Sep 27 03:44:07 2011 From: guido at python.org (Guido van Rossum) Date: Mon, 26 Sep 2011 18:44:07 -0700 Subject: [Python-Dev] range objects in 3.x In-Reply-To: <4E81261C.6040200@pearwood.info> References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> Message-ID: On Mon, Sep 26, 2011 at 6:25 PM, Steven D'Aprano wrote: > While continuity of API might be a good thing, we shouldn't accept a poor > API just for the sake of continuity. I have some criticisms of the linspace > API. [...] > * I'm not sure that start/end/count is a better API than start/step/count. On this particular one, I think start/end/count *is* better, because in the most common use case the start and end points are given, and the step is somewhat of an afterthought (e.g. how many integration steps, or how many points in the chart). I also keep thinking that numerically, if start and end are given exactly, we should be able to compute the intermediate points within 0.5ULP, whereas it would seem that given start and step our computation for end may be considerably off, if the count is high. Or, maybe what I'm trying to say is, if the user has start/end/count but the API wants start/step/count, after computing step = (end-start) / count, the value of start + count*step might not quite equal to end; whereas if the user has start/step/count but the API wants start/end/count I think there's nothing wrong with computing end = start + step*count. 
-- --Guido van Rossum (python.org/~guido) From greg.ewing at canterbury.ac.nz Tue Sep 27 08:23:41 2011 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 27 Sep 2011 19:23:41 +1300 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> Message-ID: <4E816BED.4000103@canterbury.ac.nz> Guido van Rossum wrote: > Or, maybe what I'm trying to say is, if the > user has start/end/count but the API wants start/step/count, after > computing step = (end-start) / count, the value of start + count*step > might not quite equal to end; whereas if the user has start/step/count > but the API wants start/end/count I think there's nothing wrong with > computing end = start + step*count. +1, that makes sense to me. And I don't like "linspace" either. Something more self explanatory such as "subdivide" or "interpolate" might be better. -- Greg From martin at v.loewis.de Tue Sep 27 08:40:16 2011 From: martin at v.loewis.de (Martin v. Löwis) Date: Tue, 27 Sep 2011 08:40:16 +0200 Subject: [Python-Dev] PEP 393 close to pronouncement In-Reply-To: <1317081830.23847.6.camel@surprise> References: <201109270019.02442.victor.stinner@haypocalc.com> <1317081830.23847.6.camel@surprise> Message-ID: <4E816FD0.7040309@v.loewis.de> >> "GDB Debugging Hooks" It's not done yet. > I can do these if need be, but IIRC you (Victor) said on #python-dev > that you were already working on them. I already changed it for an earlier version of the PEP. It still needs to sort out the various compact representations. I could do them as well, so don't worry.
Regards, Martin From victor.stinner at haypocalc.com Tue Sep 27 15:50:27 2011 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Tue, 27 Sep 2011 15:50:27 +0200 Subject: [Python-Dev] PEP 393 close to pronouncement In-Reply-To: <201109270019.02442.victor.stinner@haypocalc.com> References: <201109270019.02442.victor.stinner@haypocalc.com> Message-ID: <201109271550.27837.victor.stinner@haypocalc.com> Le mardi 27 septembre 2011 00:19:02, Victor Stinner a écrit : > On Windows, there is just one failure in test_configparser, I > didn't investigate it yet Oh, it was a real bug in io.IncrementalNewlineDecoder. It is now fixed. Victor From alexander.belopolsky at gmail.com Tue Sep 27 16:52:55 2011 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 27 Sep 2011 10:52:55 -0400 Subject: [Python-Dev] range objects in 3.x In-Reply-To: <4E81261C.6040200@pearwood.info> References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> Message-ID: On Mon, Sep 26, 2011 at 9:25 PM, Steven D'Aprano wrote: .. > The audience for numpy is a small minority of Python users, and they tend to > be more sophisticated. I'm sure they can cope with two functions with > different APIs > > While continuity of API might be a good thing, we shouldn't accept a poor > API just for the sake of continuity. I have some criticisms of the linspace > API. +1 In addition to Steven's criticisms of numpy.linspace(), I would like a new function to work with types other than float. It certainly makes sense to have range-like functionality for fractions and decimal floats, but also I often find a need to generate a list of equally spaced dates or datetime points. It would be nice if a new function would allow start and stop to be any type that supports subtraction and whose differences support division by numbers.
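The kind of polymorphic helper Alexander describes can be sketched in a few lines. The name spread and the inclusive-endpoints choice are illustrative assumptions, not an agreed API; the date case relies on Python 3.2's timedelta division by an int:

```python
from datetime import date
from fractions import Fraction

def spread(start, stop, count):
    # Hypothetical generic range: works for any start/stop whose
    # difference supports division by an int (floats, Fractions,
    # dates via timedelta, ...). Returns count+1 points, including
    # both endpoints.
    step = (stop - start) / count
    return [start + i * step for i in range(count + 1)]

print(spread(Fraction(0), Fraction(1), 4))
# [Fraction(0, 1), Fraction(1, 4), Fraction(1, 2), Fraction(3, 4), Fraction(1, 1)]

print(spread(date(2011, 9, 1), date(2011, 9, 5), 4))
# one date per day, Sep 1 through Sep 5
```

The only protocol requirements are `stop - start`, division of that difference by an int, `int * step`, and `start + step` — exactly the duck typing Alexander asks for.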
Also, in terms of implementation, I don't think we'll gain anything by copying numpy code because linspace(start, stop, num) is effectively just arange(0, num) * step + start where step is (stop-start)/(num-1). This works because numpy arrays (produced by arange()) support linear algebra and we are not going to copy that. From alexander.belopolsky at gmail.com Tue Sep 27 17:05:15 2011 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 27 Sep 2011 11:05:15 -0400 Subject: [Python-Dev] range objects in 3.x In-Reply-To: <4E816BED.4000103@canterbury.ac.nz> References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> <4E816BED.4000103@canterbury.ac.nz> Message-ID: On Tue, Sep 27, 2011 at 2:23 AM, Greg Ewing wrote: .. > And I don't like "linspace" either. Something more self > explanatory such as "subdivide" or "interpolate" might > be better. "Grid" would be nice and short, but may suggest 2-dimensional result. Whatever word we choose, I think it should be a noun rather than a verb. ("Comb" (noun) brings up the right image, but is probably too informal and may be confused with a short for "combination.") From ethan at stoneleaf.us Tue Sep 27 17:24:16 2011 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 27 Sep 2011 08:24:16 -0700 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> <4E816BED.4000103@canterbury.ac.nz> Message-ID: <4E81EAA0.5080507@stoneleaf.us> Alexander Belopolsky wrote: > On Tue, Sep 27, 2011 at 2:23 AM, Greg Ewing wrote: > .. >> And I don't like "linspace" either. Something more self >> explanatory such as "subdivide" or "interpolate" might >> be better. > > "Grid" would be nice and short, but may suggest 2-dimensional result.
> Whatever word we choose, I think it should be a noun rather than a > verb. ("Comb" (noun) brings up the right image, but is probably too > informal and may be confused with a short for "combination.") segment? srange? ~Ethan~ From raymond.hettinger at gmail.com Tue Sep 27 17:44:56 2011 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Tue, 27 Sep 2011 11:44:56 -0400 Subject: [Python-Dev] range objects in 3.x In-Reply-To: <4E81EAA0.5080507@stoneleaf.us> References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> <4E816BED.4000103@canterbury.ac.nz> <4E81EAA0.5080507@stoneleaf.us> Message-ID: On Sep 27, 2011, at 11:24 AM, Ethan Furman wrote: > Alexander Belopolsky wrote: >> On Tue, Sep 27, 2011 at 2:23 AM, Greg Ewing wrote: >> .. >>> And I don't like "linspace" either. Something more self >>> explanatory such as "subdivide" or "interpolate" might >>> be better. >> "Grid" would be nice and short, but may suggest 2-dimensional result. >> Whatever word we choose, I think it should be a noun rather than a >> verb. ("Comb" (noun) brings up the right image, but is probably too >> informal and may be confused with a short for "combination.") > > segment? srange? In the math module, we used an f prefix to differentiate math.fsum() from the built-in sum() function. That suggests frange() as a possible name for a variant of range() that creates floats.
That works reasonably well if the default argument pattern is the same as range: frange(10.0, 20.0, 0.5) There could be an optional argument to compute the interval: frange(10.0, 20.0, numpoints=20) And possibly an option to include both endpoints: frange(10.0, 20.0, 0.5, inclusive=True) Raymond From steve at pearwood.info Tue Sep 27 18:00:15 2011 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 28 Sep 2011 02:00:15 +1000 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> <4E816BED.4000103@canterbury.ac.nz> Message-ID: <4E81F30F.6060704@pearwood.info> Alexander Belopolsky wrote: > On Tue, Sep 27, 2011 at 2:23 AM, Greg Ewing wrote: > .. >> And I don't like "linspace" either. Something more self >> explanatory such as "subdivide" or "interpolate" might >> be better. > > "Grid" would be nice and short, but may suggest 2-dimensional result. > Whatever word we choose, I think it should be a noun rather than a > verb. ("Comb" (noun) brings up the right image, but is probably too > informal and may be confused with a short for "combination.") I came up with "spread".
Here's my second attempt, which offers both count/start/end and count/start/step APIs: http://code.activestate.com/recipes/577881-equally-spaced-floats-part-2/ -- Steven From ethan at stoneleaf.us Tue Sep 27 18:06:55 2011 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 27 Sep 2011 09:06:55 -0700 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> <4E816BED.4000103@canterbury.ac.nz> <4E81EAA0.5080507@stoneleaf.us> Message-ID: <4E81F49F.9080309@stoneleaf.us> Raymond Hettinger wrote: > On Sep 27, 2011, at 11:24 AM, Ethan Furman wrote: > >> Alexander Belopolsky wrote: >>> On Tue, Sep 27, 2011 at 2:23 AM, Greg Ewing wrote: >>> .. >>>> And I don't like "linspace" either. Something more self >>>> explanatory such as "subdivide" or "interpolate" might >>>> be better. >>> "Grid" would be nice and short, but may suggest 2-dimensional result. >>> Whatever word we choose, I think it should be a noun rather than a >>> verb. ("Comb" (noun) brings up the right image, but is probably too >>> informal and may be confused with a short for "combination.") >> segment? srange? > > In the math module, we used an f prefix to differentiate math.fsum() from the built-in sum() function. That suggests frange() as a possible name for a variant of range() that creates floats. > > That works reasonably well if the default argument pattern is the same as range: frange(10.0, 20.0, 0.5) > > There could be an optional argument to compute the interval: frange(10.0, 20.0, numpoints=20) > > And possibly an option to include both endpoints: frange(10.0, 20.0, 0.5, inclusive=True) I like the numpoints option. I also like Alexander's idea of making this new range able to work with other types that support addition/division -- but in that case does the 'f' prefix still make sense?
~Ethan~ From alexander.belopolsky at gmail.com Tue Sep 27 18:16:17 2011 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 27 Sep 2011 12:16:17 -0400 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> <4E816BED.4000103@canterbury.ac.nz> <4E81EAA0.5080507@stoneleaf.us> Message-ID: On Tue, Sep 27, 2011 at 11:44 AM, Raymond Hettinger wrote: .. > In the math module, we used an f prefix to differentiate math.fsum() from the built-in sum() function. That suggests frange() as a possible name for a variant of range() that creates floats. > > That works reasonably well if the default argument pattern is the same as range: frange(10.0, 20.0, 0.5) +1 on adding frange() to math module or to the recently contemplated stats module. For something that aspires to becoming a builtin one day, I would like to see something not focused on floats exclusively and something with a proper English name. From steve at pearwood.info Tue Sep 27 18:20:27 2011 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 28 Sep 2011 02:20:27 +1000 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> Message-ID: <4E81F7CB.7060001@pearwood.info> Alexander Belopolsky wrote: > In addition to Steven's criticisms of numpy.linspace(), I would like a > new function to work with types other than float. It certainly makes > sense to have range-like functionality for fractions and decimal > floats, but also I often find a need to generate a list of equally > spaced dates or datetime points. It would be nice if a new function > would allow start and stop to be any type that supports subtraction > and whose differences support division by numbers.
I think a polymorphic numeric range function would be useful. If it happened to support dates, that would be great, but I think that a daterange() function in the datetime module would be more appropriate. Who is going to think to import math if you want a range of dates? -- Steven From ethan at stoneleaf.us Tue Sep 27 18:27:34 2011 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 27 Sep 2011 09:27:34 -0700 Subject: [Python-Dev] range objects in 3.x In-Reply-To: <4E81F7CB.7060001@pearwood.info> References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> <4E81F7CB.7060001@pearwood.info> Message-ID: <4E81F976.3070300@stoneleaf.us> Steven D'Aprano wrote: > Alexander Belopolsky wrote: > >> In addition to Steven's criticisms of numpy.linspace(), I would like a >> new function to work with types other than float. It certainly makes >> sense to have range-like functionality for fractions and decimal >> floats, but also I often find a need to generate a list of equally >> spaced dates or datetime points. It would be nice if a new function >> would allow start and stop to be any type that supports subtraction >> and whose differences support division by numbers. > > I think a polymorphic numeric range function would be useful. If it > happened to support dates, that would be great, but I think that a > daterange() function in the datetime module would be more appropriate. > Who is going to think to import math if you want a range of dates? If it's generic, why should it live in math?
~Ethan~ From alexander.belopolsky at gmail.com Tue Sep 27 18:36:52 2011 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 27 Sep 2011 12:36:52 -0400 Subject: [Python-Dev] range objects in 3.x In-Reply-To: <4E81F7CB.7060001@pearwood.info> References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> <4E81F7CB.7060001@pearwood.info> Message-ID: On Tue, Sep 27, 2011 at 12:20 PM, Steven D'Aprano wrote: > If it happened > to support dates, that would be great, but I think that a daterange() > function in the datetime module would be more appropriate. Or even more appropriately in the calendar module. The problem is that we may already have a similar function there and nobody knows about it. > Who is going to > think to import math if you want a range of dates? No one. That's why I said that if the new function ends up in math or stats, I am +1 on frange(). However, I did in the past try to give dates for start and stop and a timedelta for step expecting range() to work. This would be similar to the way sum works for non-numeric types when an appropriate start value is given. BTW, at the time when I worked on extending (x)range to long integers, I attempted to make it work on dates, but at that time timedelta did not support division by integer, so I refocused on that instead. From guido at python.org Tue Sep 27 19:03:38 2011 From: guido at python.org (Guido van Rossum) Date: Tue, 27 Sep 2011 10:03:38 -0700 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> <4E816BED.4000103@canterbury.ac.nz> <4E81EAA0.5080507@stoneleaf.us> Message-ID: On Tue, Sep 27, 2011 at 9:16 AM, Alexander Belopolsky wrote: > On Tue, Sep 27, 2011 at 11:44 AM, Raymond Hettinger > wrote: > .. 
>> In the math module, we used an f prefix to differentiate math.fsum() from the built-in sum() function. That suggests frange() as a possible name for a variant of range() that creates floats. >> >> That works reasonably well if the default argument pattern is the same as range: frange(10.0, 20.0, 0.5) > > +1 on adding frange() to math module or to the recently contemplated > stats module. For something that aspires to becoming a builtin one > day, I would like to see something not focused on floats exclusively > and something with a proper English name. Um, I think you better read the thread. :-) I successfully argued that mimicking the behavior of range() for floats is a bad idea, and that we need to come up with a name for an API that takes start/stop/count arguments instead of start/stop/step. -- --Guido van Rossum (python.org/~guido) From ubershmekel at gmail.com Tue Sep 27 19:07:44 2011 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Tue, 27 Sep 2011 13:07:44 -0400 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> <4E816BED.4000103@canterbury.ac.nz> <4E81EAA0.5080507@stoneleaf.us> Message-ID: I as well think the construct should support other types as it sounds an awful lot like the missing for(;;) loop construct. Concerning the api, if we use spread(start, step, count) we don't rely on a division method even though the caller probably does. Just mentioning another option. --Yuval Greenfield
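Guido's point about start/stop/count versus a step-based API can be seen numerically in a few lines. A self-contained illustration, not code from the thread:

```python
start, end, count = 0.0, 1.0, 10
step = (end - start) / count

# Step-based: accumulating the step drifts. Ten additions of 0.1
# land on 0.9999999999999999, not 1.0.
accumulated, x = [], start
for _ in range(count + 1):
    accumulated.append(x)
    x += step

# Endpoint-based: interpolating from both ends hits start and end exactly.
interpolated = [((count - i) * start + i * end) / count
                for i in range(count + 1)]

print(accumulated[-1] == end)   # False
print(interpolated[-1] == end)  # True
```

With start/stop/count the endpoints are given exactly and the interior points absorb the rounding; with start/step the computed end point is the one that drifts, which is exactly the hard-to-debug extra-point problem mentioned earlier in the thread.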
From alexander.belopolsky at gmail.com Tue Sep 27 19:11:10 2011 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 27 Sep 2011 13:11:10 -0400 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> <4E816BED.4000103@canterbury.ac.nz> <4E81EAA0.5080507@stoneleaf.us> Message-ID: On Tue, Sep 27, 2011 at 1:03 PM, Guido van Rossum wrote: .. > Um, I think you better read the thread. :-) I successfully argued that > mimicking the behavior of range() for floats is a bad idea, and that > we need to come up with a name for an API that takes start/stop/count > arguments instead of start/stop/step. The name "frange" does not necessarily imply that we have to mimic the API completely. As long as frange(10.0) and frange(1.0, 10.0) works as expected while addressing floating point subtleties through optional arguments and documentation, I don't see why it can't be called frange() *and* support count. From guido at python.org Tue Sep 27 19:21:31 2011 From: guido at python.org (Guido van Rossum) Date: Tue, 27 Sep 2011 10:21:31 -0700 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> <4E816BED.4000103@canterbury.ac.nz> <4E81EAA0.5080507@stoneleaf.us> Message-ID: On Tue, Sep 27, 2011 at 10:11 AM, Alexander Belopolsky wrote: > On Tue, Sep 27, 2011 at 1:03 PM, Guido van Rossum wrote: > .. >> Um, I think you better read the thread. :-) I successfully argued that >> mimicking the behavior of range() for floats is a bad idea, and that >> we need to come up with a name for an API that takes start/stop/count >> arguments instead of start/stop/step. > > The name "frange" does not necessarily imply that we have to mimic the > API completely.
As long as frange(10.0) and frange(1.0, 10.0) works > as expected while addressing floating point subtleties through > optional arguments and documentation, I don't see why it can't be > called frange() *and* support count. But I do. :-) Calling it frange() is pretty much *begging* people to assume that the 3rd parameter has the same meaning as for range(). Now, there are a few cases where that doesn't matter, e.g. frange(0, 100, 10) will do the expected thing under both interpretations, but frange(0, 100, 5) will not. -- --Guido van Rossum (python.org/~guido) From ethan at stoneleaf.us Tue Sep 27 19:32:13 2011 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 27 Sep 2011 10:32:13 -0700 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> <4E816BED.4000103@canterbury.ac.nz> <4E81EAA0.5080507@stoneleaf.us> Message-ID: <4E82089D.8070209@stoneleaf.us> Guido van Rossum wrote: > On Tue, Sep 27, 2011 at 10:11 AM, Alexander Belopolsky wrote: >> The name "frange" does not necessarily imply that we have to mimic the >> API completely. As long as frange(10.0) and frange(1.0, 10.0) works >> as expected while addressing floating point subtleties through >> optional arguments and documentation, I don't see why it can't be >> called frange() *and* support count. > > But I do. :-) Calling it frange() is pretty much *begging* people to > assume that the 3rd parameter has the same meaning as for range(). > Now, there are a few cases where that doesn't matter, e.g. frange(0, > 100, 10) will do the expected thing under both interpretations, but > frange(0, 100, 5) will not. What about the idea of this signature? frange([start], stop, step=None, count=None) Then when count is desired, it can be specified, and when step is sufficient, no change is necessary.
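The floating-point pitfall behind the start/stop/count argument can be made concrete. A minimal sketch (the function names are illustrative only, not any proposed API):

```python
def frange_step(start, stop, step):
    # step-based, like range(): repeated addition accumulates
    # rounding error, which can yield an unexpected extra point
    x = start
    while x < stop:
        yield x
        x += step

def frange_count(start, stop, count):
    # count-based: each point is computed independently, so the
    # result always has exactly `count` points
    for i in range(count):
        yield start + i * (stop - start) / count

len(list(frange_step(0.0, 1.0, 0.1)))   # 11 points -- one too many
len(list(frange_count(0.0, 1.0, 10)))   # exactly 10
```

With step=0.1 the tenth partial sum is 0.9999999999999999, still less than 1.0, so the step-based loop emits an eleventh point; the count-based version cannot make that mistake.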
~Ethan~ From steve at pearwood.info Tue Sep 27 19:55:15 2011 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 28 Sep 2011 03:55:15 +1000 Subject: [Python-Dev] range objects in 3.x In-Reply-To: <4E82089D.8070209@stoneleaf.us> References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> <4E816BED.4000103@canterbury.ac.nz> <4E81EAA0.5080507@stoneleaf.us> <4E82089D.8070209@stoneleaf.us> Message-ID: <4E820E03.6090100@pearwood.info> Ethan Furman wrote: > What about the idea of this signature? > > frange([start], stop, step=None, count=None) > > Then when count is desired, it can be specified, and when step is > sufficient, no change is necessary. A default of start=0 makes sense for integer range, because the most common use for range *by far* is for counting, and in Python we count 0, 1, 2, ... Similarly, we usually count every item, so a default step of 1 is useful. But for numeric work, neither of those defaults are useful. This proposed spread/frange/whatever function will be used for generating a sequence of equally spaced numbers, and not for counting. A starting value of 0.0 is generally no more special than any other starting value. There is no good reason to single out default start=0. Likewise a step-size of 1.0 is also arbitrary. It isn't useful to hammer the square peg of numeric ranges into the round hole of integer counts. We should not try to force this float range to use the same API as builtin range. (In hindsight, it is a shame that range is called "range" instead of "count". itertools got the name right.) 
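Steven's framing, a sequence of equally spaced numbers rather than a count, is essentially what numpy calls linspace. A sketch of that shape (`spaced` is an illustrative name, not a proposal):

```python
def spaced(start, stop, count):
    # `count` equal intervals, count + 1 points; each point is
    # computed directly from the endpoints, never by accumulation
    span = stop - start
    return [start + span * i / count for i in range(count + 1)]

spaced(37.75, 90.25, 4)
# [37.75, 50.875, 64.0, 77.125, 90.25]
```

Here every quantity is a binary fraction, so the points come out exact; in general the interior points can still round, but the number of points is always right.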
-- Steven From ethan at stoneleaf.us Tue Sep 27 20:20:25 2011 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 27 Sep 2011 11:20:25 -0700 Subject: [Python-Dev] range objects in 3.x In-Reply-To: <4E820E03.6090100@pearwood.info> References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> <4E816BED.4000103@canterbury.ac.nz> <4E81EAA0.5080507@stoneleaf.us> <4E82089D.8070209@stoneleaf.us> <4E820E03.6090100@pearwood.info> Message-ID: <4E8213E9.1060705@stoneleaf.us> Steven D'Aprano wrote: > Ethan Furman wrote: > >> What about the idea of this signature? >> >> frange([start], stop, step=None, count=None) >> >> Then when count is desired, it can be specified, and when step is >> sufficient, no change is necessary. > > A default of start=0 makes sense for integer range, because the most > common use for range *by far* is for counting, and in Python we count 0, > 1, 2, ... Similarly, we usually count every item, so a default step of 1 > is useful. > > But for numeric work, neither of those defaults are useful. This > proposed spread/frange/whatever function will be used for generating a > sequence of equally spaced numbers, and not for counting. A starting > value of 0.0 is generally no more special than any other starting value. > There is no good reason to single out default start=0. Likewise a > step-size of 1.0 is also arbitrary. > > It isn't useful to hammer the square peg of numeric ranges into the > round hole of integer counts. We should not try to force this float > range to use the same API as builtin range. > > (In hindsight, it is a shame that range is called "range" instead of > "count". itertools got the name right.) Good points. So how about: some_name_here(start, stop, *, step=None, count=None) I personally would use the step value far more often than the count value.
~Ethan~ From guido at python.org Tue Sep 27 20:36:08 2011 From: guido at python.org (Guido van Rossum) Date: Tue, 27 Sep 2011 11:36:08 -0700 Subject: [Python-Dev] range objects in 3.x In-Reply-To: <4E8213E9.1060705@stoneleaf.us> References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> <4E816BED.4000103@canterbury.ac.nz> <4E81EAA0.5080507@stoneleaf.us> <4E82089D.8070209@stoneleaf.us> <4E820E03.6090100@pearwood.info> <4E8213E9.1060705@stoneleaf.us> Message-ID: On Tue, Sep 27, 2011 at 11:20 AM, Ethan Furman wrote: > I personally would use the step value far more often than the count > value. But that's exactly what we don't *want* you to do! Because (unless you are a numerical wizard) you probably aren't doing the error analysis needed to avoid the "unexpected extra point" problem due to floating point inaccuracies. For your own good, we want you to state the count and let us deliver the number of points you want. -- --Guido van Rossum (python.org/~guido) From alexander.belopolsky at gmail.com Tue Sep 27 20:38:48 2011 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 27 Sep 2011 14:38:48 -0400 Subject: [Python-Dev] range objects in 3.x In-Reply-To: <4E8213E9.1060705@stoneleaf.us> References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> <4E816BED.4000103@canterbury.ac.nz> <4E81EAA0.5080507@stoneleaf.us> <4E82089D.8070209@stoneleaf.us> <4E820E03.6090100@pearwood.info> <4E8213E9.1060705@stoneleaf.us> Message-ID: On Tue, Sep 27, 2011 at 2:20 PM, Ethan Furman wrote: .. > Good points. So how about: > > some_name_here(start, stop, *, step=None, count=None) > +1 The unusual optional first arguments is one of the things I dislike about range(). Shouldn't step default to 1.0? Also, when count is given, stop can be elided.
This will make for a nice symmetry: between stop, step and count any two can be provided but stop+step may be problematic and we can warn about this choice in the docs. From alexander.belopolsky at gmail.com Tue Sep 27 20:48:51 2011 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 27 Sep 2011 14:48:51 -0400 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> <4E816BED.4000103@canterbury.ac.nz> <4E81EAA0.5080507@stoneleaf.us> <4E82089D.8070209@stoneleaf.us> <4E820E03.6090100@pearwood.info> <4E8213E9.1060705@stoneleaf.us> Message-ID: On Tue, Sep 27, 2011 at 2:36 PM, Guido van Rossum wrote: .. > But that's exactly what we don't *want* you to do! Because (unless you > are a numerical wizard) you probably aren't doing the error analysis > needed to avoid the "unexpected extra point" problem due to floating > point inaccuracies. For your own good, we want you to state the count > and let us deliver the number of points you want. But the likely result will be that a non-wizard will find that range() does not work with floats, reach for some_name_here(), find the absence of step option, curse the developers, write count=int((stop-start)/step) and leave this with a nagging thought that (s)he forgot +/-1 somewhere. From guido at python.org Tue Sep 27 20:53:39 2011 From: guido at python.org (Guido van Rossum) Date: Tue, 27 Sep 2011 11:53:39 -0700 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> <4E816BED.4000103@canterbury.ac.nz> <4E81EAA0.5080507@stoneleaf.us> <4E82089D.8070209@stoneleaf.us> <4E820E03.6090100@pearwood.info> <4E8213E9.1060705@stoneleaf.us> Message-ID: On Tue, Sep 27, 2011 at 11:48 AM, Alexander Belopolsky wrote: > On Tue, Sep 27, 2011 at 2:36 PM, Guido van Rossum wrote: > .. 
>> But that's exactly what we don't *want* you to do! Because (unless you >> are a numerical wizard) you probably aren't doing the error analysis >> needed to avoid the "unexpected extra point" problem due to floating >> point inaccuracies. For your own good, we want you to state the count >> and let us deliver the number of points you want. > > But the likely result will be that a non-wizard will find that range() > does not work with floats, reach for some_name_here(), find the > absence of step option, curse the developers, write > count=int((stop-start)/step) and leave this with a nagging thought > that (s)he forgot +/-1 somewhere. But the *user* can just force this to round by using int((stop-start+0.5)/step) or by using int(round()); either of these is an easy pattern to teach and learn and useful in many other places. The problem is that frange() cannot do that rounding for you, since its contract (if it is to be analogous to range() at all) is that there is no assumption that stop is anywhere close to start + a multiple of step. -- --Guido van Rossum (python.org/~guido) From wilfred at potatolondon.com Tue Sep 27 20:46:52 2011 From: wilfred at potatolondon.com (Wilfred Hughes) Date: Tue, 27 Sep 2011 19:46:52 +0100 Subject: [Python-Dev] unittest missing assertNotRaises Message-ID: Hi folks I wasn't sure if this warranted a bug in the tracker, so I thought I'd raise it here first. unittest has assertIn, assertNotIn, assertEqual, assertNotEqual and so on. So, it seems odd to me that there isn't assertNotRaises. Is there any particular motivation for not putting it in? I've attached a simple patch against Python 3's trunk to give an idea of what I have in mind. Thanks Wilfred -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: assert_not_raises.diff Type: text/x-patch Size: 925 bytes Desc: not available URL: From _ at lvh.cc Tue Sep 27 20:59:37 2011 From: _ at lvh.cc (Laurens Van Houtven) Date: Tue, 27 Sep 2011 20:59:37 +0200 Subject: [Python-Dev] unittest missing assertNotRaises In-Reply-To: References: Message-ID: Sure, you just *do* it. The only advantage I see in assertNotRaises is that when that exception is raised, you should (and would) get a failure, not an error. -------------- next part -------------- An HTML attachment was scrubbed... URL: From phd at phdru.name Tue Sep 27 21:05:32 2011 From: phd at phdru.name (Oleg Broytman) Date: Tue, 27 Sep 2011 23:05:32 +0400 Subject: [Python-Dev] unittest missing assertNotRaises In-Reply-To: References: Message-ID: <20110927190532.GA32171@iskra.aviel.ru> On Tue, Sep 27, 2011 at 07:46:52PM +0100, Wilfred Hughes wrote:

> +    def assertNotRaises(self, excClass, callableObj=None, *args, **kwargs):
> +        """Fail if an exception of class excClass is thrown by
> +        callableObj when invoked with arguments args and keyword
> +        arguments kwargs.
> +
> +        """
> +        try:
> +            callableObj(*args, **kwargs)
> +        except excClass:
> +            raise self.failureException("%s was raised" % excClass)
> +
> +

What if I want to assert my test raises neither OSError nor IOError? Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN.
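Oleg's objection would be easy to accommodate, since an `except` clause (and hence the argument here) already accepts a tuple of exception classes. A sketch extending Wilfred's patch along those lines (this is illustrative, not part of the actual patch):

```python
import unittest

class Case(unittest.TestCase):

    def assertNotRaises(self, excClasses, callableObj, *args, **kwargs):
        # excClasses may be a single exception class or a tuple of
        # classes, exactly as `except` accepts -- so "neither OSError
        # nor IOError" is spelled (OSError, IOError)
        try:
            return callableObj(*args, **kwargs)
        except excClasses as e:
            raise self.failureException("%s was raised" % type(e).__name__)
```

Returning the callable's result also lets a test go on to make further assertions about it.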
From ethan at stoneleaf.us Tue Sep 27 21:22:20 2011 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 27 Sep 2011 12:22:20 -0700 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E81261C.6040200@pearwood.info> <4E816BED.4000103@canterbury.ac.nz> <4E81EAA0.5080507@stoneleaf.us> <4E82089D.8070209@stoneleaf.us> <4E820E03.6090100@pearwood.info> <4E8213E9.1060705@stoneleaf.us> Message-ID: <4E82226C.7080401@stoneleaf.us> Guido van Rossum wrote: > On Tue, Sep 27, 2011 at 11:20 AM, Ethan Furman wrote: >> I personally would use the step value far more often than the count >> value. > > But that's exactly what we don't *want* you to do! Because (unless you > are a numerical wizard) you probably aren't doing the error analysis > needed to avoid the "unexpected extra point" problem due to floating > point inaccuracies. For your own good, we want you to state the count > and let us deliver the number of points you want. Well, actually, I'd be using it with dates. ;) ~Ethan~ From tjreedy at udel.edu Tue Sep 27 21:43:26 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 27 Sep 2011 15:43:26 -0400 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> <4E816BED.4000103@canterbury.ac.nz> <4E81EAA0.5080507@stoneleaf.us> Message-ID: On 9/27/2011 1:03 PM, Guido van Rossum wrote: > mimicking the behavior of range() for floats is a bad idea, and that > we need to come up with a name for an API that takes start/stop/count > arguments instead of start/stop/step. [In the following, I use count as the number of intervals; the number of points is 1 more.] I agree with others that we should not just have a floatrange. 
An exact-as-possible floatrange is trivially based on exact computations with fractions:

    from math import gcd  # at the time this was written, fractions.gcd

    def floatrange(a, b, n):
        '''Yield floats a, b, and n-1 equally spaced floats in between.'''
        for num, dem in fracrange(a.as_integer_ratio(),
                                  b.as_integer_ratio(), n):
            yield num/dem

There are good reasons to expose the latter. If fracrange is done with the Fraction class, each ratio will be reduced to lowest terms, which means that the denominator will vary for each pair. In some situations, one might prefer a constant denominator across the series. Once a constant denominator is calculated (easy though not trivial), fracrange is trivially based on range. The following makes the denominator as small as possible if the inputs are in lowest terms:

    def fracrange(frac1, frac2, n):
        '''Yield fractions frac1, frac2 and n-1 equally spaced fractions
        in between.

        Fractions are represented as (numerator, denominator > 0) pairs.
        For output, use the smallest common denominator of the inputs
        that makes the numerator range an even multiple of n.
        '''
        n1, d1 = frac1
        n2, d2 = frac2
        dem = d1 * d2 // gcd(d1, d2)
        start = n1 * (dem // d1)
        stop = n2 * (dem // d2)
        rang = stop - start
        q, r = divmod(rang, n)
        if r:
            gcd_r_n = gcd(r, n)
            m = n // gcd_r_n
            dem *= m
            start *= m
            stop *= m
            step = rang // gcd_r_n  # rang * m // n
        else:
            step = q  # if r==0: gcd(r,n)==n, m==1, rang//n == q
        for num in range(start, stop+1, step):
            yield num, dem

Two example uses:

    for i, j in fracrange((1,10), (22,10), 7):
        print(i, j)
    print()
    for i, j in fracrange((1,5), (1,1), 6):
        print(i, j)

    ## prints
    1 10
    4 10
    7 10
    10 10
    13 10
    16 10
    19 10
    22 10

    3 15
    5 15
    7 15
    9 15
    11 15
    13 15
    15 15

If nothing else, the above is easy to check for correctness ;-). Note that for fraction output, one will normally want to be able to enter an explicit pair such as (1,5) or even (2,10). The decimal equivalent, .2, after conversion to float, gets converted by .as_integer_ratio() back to (3602879701896397, 18014398509481984).
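Terry's remark about the Fraction class can be seen directly: a Fraction-based version is only a few lines, but each result is reduced to lowest terms, so the denominators vary from point to point instead of staying constant as in fracrange(). A sketch (`fracrange_f` is a hypothetical name):

```python
from fractions import Fraction

def fracrange_f(a, b, n):
    # exact arithmetic throughout, but every Fraction is reduced
    # to lowest terms, so the denominator differs between points
    a, b = Fraction(a), Fraction(b)
    step = (b - a) / n
    return [a + i * step for i in range(n + 1)]

fracrange_f(Fraction(1, 5), 1, 6)
# [Fraction(1, 5), Fraction(1, 3), Fraction(7, 15), Fraction(3, 5),
#  Fraction(11, 15), Fraction(13, 15), Fraction(1, 1)]
```

Compare with fracrange((1,5), (1,1), 6) above, which keeps the constant denominator 15 throughout.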
-- Terry Jan Reedy From guido at python.org Tue Sep 27 21:52:01 2011 From: guido at python.org (Guido van Rossum) Date: Tue, 27 Sep 2011 12:52:01 -0700 Subject: [Python-Dev] PEP 393 close to pronouncement In-Reply-To: <201109271550.27837.victor.stinner@haypocalc.com> References: <201109270019.02442.victor.stinner@haypocalc.com> <201109271550.27837.victor.stinner@haypocalc.com> Message-ID: Given the feedback so far, I am happy to pronounce PEP 393 as accepted. Martin, congratulations! Go ahead and mark it as Accepted. (But please do fix up the small nits that Victor reported in his earlier message.) -- --Guido van Rossum (python.org/~guido) From alexander.belopolsky at gmail.com Tue Sep 27 22:05:01 2011 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 27 Sep 2011 16:05:01 -0400 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> <4E816BED.4000103@canterbury.ac.nz> <4E81EAA0.5080507@stoneleaf.us> <4E82089D.8070209@stoneleaf.us> <4E820E03.6090100@pearwood.info> <4E8213E9.1060705@stoneleaf.us> Message-ID: On Tue, Sep 27, 2011 at 2:53 PM, Guido van Rossum wrote: > On Tue, Sep 27, 2011 at 11:48 AM, Alexander Belopolsky > wrote: >> On Tue, Sep 27, 2011 at 2:36 PM, Guido van Rossum wrote: >> .. >>> But that's exactly what we don't *want* you to do! Because (unless you >>> are a numerical wizard) you probably aren't doing the error analysis >>> needed to avoid the "unexpected extra point" problem due to floating >>> point inaccuracies. For your own good, we want you to state the count >>> and let us deliver the number of points you want. I don't disagree that the ability to provide count= option is useful. I am just saying that there are also cases where float step is known exactly and count (or stop) can be deduced from stop (or count) without any floating point issues.
Iteration over integers that happen to be represented by floats is one use case, but using integer range may be a better option in this case. In US it is still popular to measure things in power of two fractions. Simulating a carpenter's yard does not suffer from rounding when done in floats. Counting by .5 and .25 has its uses too. Maybe frange() should just signal the FP inexact exception if we expect users to need hand holding to such a degree. From tjreedy at udel.edu Tue Sep 27 22:06:22 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 27 Sep 2011 16:06:22 -0400 Subject: [Python-Dev] unittest missing assertNotRaises In-Reply-To: References: Message-ID: On 9/27/2011 2:46 PM, Wilfred Hughes wrote: > Hi folks > > I wasn't sure if this warranted a bug in the tracker, so I thought I'd > raise it here first. > > unittest has assertIn, assertNotIn, assertEqual, assertNotEqual and so These all test possible specification conditions and sensible test conditions. For instance -1 and particularly 3 should not be in range(3). Including 3 is a realistic possible error. If you partition a set into subsets < x and > x, x should not be in either, but an easy mistake would put it in either or both. > Is there any particular motivation for not putting it in? You have 'motivation' backwards. There are an infinity of things we could add. We need a positive, substantial reason with real use cases to add something. An expression should return a particular value or raise a particular exception. If it returns a value, testing that it is the correct value eliminates all exceptions. And testing for an expected exception eliminates all others. If there is an occasional need for the proposal, one can write the same code you did, but with the possibility of excluding more than one exception. So I do not see any need for the proposal.
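Terry's partition example is exactly the kind of check the existing assertNotIn already covers, for instance in a sketch like this:

```python
import unittest

class PartitionTest(unittest.TestCase):

    def test_pivot_in_neither_part(self):
        data, x = [1, 5, 3, 9, 7], 5
        lower = [v for v in data if v < x]
        upper = [v for v in data if v > x]
        # the easy mistake Terry describes -- the pivot leaking into
        # either part -- is caught directly by assertNotIn
        self.assertNotIn(x, lower)
        self.assertNotIn(x, upper)
```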
-- Terry Jan Reedy From ericsnowcurrently at gmail.com Tue Sep 27 22:12:52 2011 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Tue, 27 Sep 2011 14:12:52 -0600 Subject: [Python-Dev] range objects in 3.x In-Reply-To: <4E8213E9.1060705@stoneleaf.us> References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> <4E816BED.4000103@canterbury.ac.nz> <4E81EAA0.5080507@stoneleaf.us> <4E82089D.8070209@stoneleaf.us> <4E820E03.6090100@pearwood.info> <4E8213E9.1060705@stoneleaf.us> Message-ID: On Tue, Sep 27, 2011 at 12:20 PM, Ethan Furman wrote: > Good points. So how about: > > some_name_here(start, stop, *, step=None, count=None) > > I personally would use the step value far more often than the count > value. Let's call it xrange() or maybe range_ex(). But seriously, here's an approach that extends the generic replacement idea a bit. I like the idea of the "some_name_here" function as a builtin in conjunction with Alexander's idea of a generic function, a la len() or repr(). Like those other builtin generic functions, it would leverage special methods (whether new or existing) to use the "range protocol" of objects. The builtin would either replace range() (and assume its name) or be a new builtin with a parallel name to range(). Either way, it would return an object of the new/refactored range type, which would reflect the above signature. If the new builtin were to rely on a new range-related protocol (i.e. if it were needed), that protocol could distinguish support for stepping from support for counting. Then floats could simply not support the stepping portion. And the fate of range()? As far as the existing builtin range() goes, either we would leave it alone, we would make range() a wrapper function around a new range type, or the new range type would completely replace the old. If we were to leave it alone, the new builtin would have a name that parallels the old name.
Then we wouldn't have to worry about backward compatibility for performance, type, or signature. Going the wrapper function route would preserve backward compatibility for the function signature, but isinstance(obj, range) wouldn't work anymore. Whether leaving range() alone or making it a wrapper, we could replace it with the new builtin in Python 4, if it made sense (like happened with xrange). If we entirely replaced the current range() with the new (more generic) range type, the biggest concern is maintaining backward compatibility with the function signature, in both Python and the C-API. That would be tricky since the above signature seems incompatible with the current one. -eric From guido at python.org Tue Sep 27 22:13:41 2011 From: guido at python.org (Guido van Rossum) Date: Tue, 27 Sep 2011 13:13:41 -0700 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> <4E816BED.4000103@canterbury.ac.nz> <4E81EAA0.5080507@stoneleaf.us> <4E82089D.8070209@stoneleaf.us> <4E820E03.6090100@pearwood.info> <4E8213E9.1060705@stoneleaf.us> Message-ID: On Tue, Sep 27, 2011 at 1:05 PM, Alexander Belopolsky wrote: > On Tue, Sep 27, 2011 at 2:53 PM, Guido van Rossum wrote: >> On Tue, Sep 27, 2011 at 11:48 AM, Alexander Belopolsky >> wrote: >>> On Tue, Sep 27, 2011 at 2:36 PM, Guido van Rossum wrote: >>> .. >>>> But that's exactly what we don't *want* you to do! Because (unless you >>>> are a numerical wizard) you probably aren't doing the error analysis >>>> needed to avoid the "unexpected extra point" problem due to floating >>>> point inaccuracies.
For your own good, we want you to state the count >>>> and let us deliver the number of points you want. > > I don't disagree that the ability to provide count= option is useful. > I am just saying that there are also cases where float step is known > exactly and count (or stop) can be deduced from stop (or count) > without any floating point issues. ?Iteration over integers that > happen to be represented by floats is one use case, but using integer > range may be a better option in this case. ?In US it is still popular > to measure things in power of two fractions. ?Simulating a carpenter's > yard does not suffer from rounding when done in floats. ?Counting by > .5 and .25 has its uses too. ?Maybe frange() should just signal the FP > inexact exception if we expect users to need hand holding to such a > degree. But why offer an API that is an attractive nuisance? I don't think that it is a burden to the user to have to specify "from 0 to 2 inches in 8 steps" instead of "from 0 to 2 inches in 1/4 inch steps". (And what if they tried to say "from 0 to 3 1/4 inches in 1/2 inch steps" ?) -- --Guido van Rossum (python.org/~guido) From guido at python.org Tue Sep 27 22:16:06 2011 From: guido at python.org (Guido van Rossum) Date: Tue, 27 Sep 2011 13:16:06 -0700 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> <4E816BED.4000103@canterbury.ac.nz> <4E81EAA0.5080507@stoneleaf.us> <4E82089D.8070209@stoneleaf.us> <4E820E03.6090100@pearwood.info> <4E8213E9.1060705@stoneleaf.us> Message-ID: On Tue, Sep 27, 2011 at 1:12 PM, Eric Snow wrote: > On Tue, Sep 27, 2011 at 12:20 PM, Ethan Furman wrote: >> Good points. ?So how about: >> >> some_name_here(start, stop, *, step=None, count=None) >> >> ? ?I personally would use the step value far more often than the count >> value. > > Let's call it xrange() or maybe range_ex(). ? 
But seriously, > here's an approach that extends the generic replacement idea a bit. > > I like the idea of the "some_name_here" function as a builtin in > conjunction with Alexander's idea of a generic function, a la len() or > repr(). Like those other builtin generic functions, it would leverage > special methods (whether new or existing) to use the "range protocol" > of objects. > > The builtin would either replace range() (and assume its name) or be a > new builtin with a parallel name to range(). Either way, it would > return an object of the new/refactored range type, which would reflect > the above signature. > > If the new builtin were to rely on a new range-related protocol (i.e. > if it were needed), that protocol could distinguish support for > stepping from support for counting. Then floats could simply not > support the stepping portion. This sounds like a rather over-designed API. > And the fate of range()? > > As far as the existing builtin range() goes, either we would leave it > alone, we would make range() a wrapper function around a new range > type, or the new range type would completely replace the old. If we > were to leave it alone, the new builtin would have a name that > parallels the old name. Then we wouldn't have to worry about backward > compatibility for performance, type, or signature. > > Going the wrapper function route would preserve backward compatibility > for the function signature, but isinstance(obj, range) wouldn't work > anymore. Whether leaving range() alone or making it a wrapper, we > could replace it with the new builtin in Python 4, if it made sense > (like happened with xrange). > > If we entirely replaced the current range() with the new (more > generic) range type, the biggest concern is maintaining backward > compatibility with the function signature, in both Python and the > C-API. That would be tricky since the above signature seems > incompatible with the current one.
> > -eric -- --Guido van Rossum (python.org/~guido) From ethan at stoneleaf.us Tue Sep 27 22:21:52 2011 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 27 Sep 2011 13:21:52 -0700 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E816BED.4000103@canterbury.ac.nz> <4E81EAA0.5080507@stoneleaf.us> <4E82089D.8070209@stoneleaf.us> <4E820E03.6090100@pearwood.info> <4E8213E9.1060705@stoneleaf.us> Message-ID: <4E823060.3070805@stoneleaf.us> Guido van Rossum wrote: > But why offer an API that is an attractive nuisance? I don't think > that it is a burden to the user to have to specify "from 0 to 2 inches > in 8 steps" instead of "from 0 to 2 inches in 1/4 inch steps". (And > what if they tried to say "from 0 to 3 1/4 inches in 1/2 inch steps" > ?)
~Ethan~ From greg.ewing at canterbury.ac.nz Tue Sep 27 23:16:12 2011 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 28 Sep 2011 10:16:12 +1300 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> Message-ID: <4E823D1C.3080102@canterbury.ac.nz> Alexander Belopolsky wrote: > I don't think we'll gain anything by > copying numpy code because linspace(start, stop, num) is effectively > just > > arange(0, num) * step + start I don't think the intention was to literally copy the code, but to investigate borrowing the algorithm, in case it was using some special technique to maximise numerical accuracy. But from this it seems like it's just using the naive algorithm that we've already decided is not the best. -- Greg From guido at python.org Tue Sep 27 23:16:49 2011 From: guido at python.org (Guido van Rossum) Date: Tue, 27 Sep 2011 14:16:49 -0700 Subject: [Python-Dev] range objects in 3.x In-Reply-To: <4E823060.3070805@stoneleaf.us> References: <4E7CCA42.2060100@stoneleaf.us> <4E816BED.4000103@canterbury.ac.nz> <4E81EAA0.5080507@stoneleaf.us> <4E82089D.8070209@stoneleaf.us> <4E820E03.6090100@pearwood.info> <4E8213E9.1060705@stoneleaf.us> <4E823060.3070805@stoneleaf.us> Message-ID: On Tue, Sep 27, 2011 at 1:21 PM, Ethan Furman wrote: > Guido van Rossum wrote: >> >> But why offer an API that is an attractive nuisance? I don't think >> that it is a burden to the user to have to specify "from 0 to 2 inches >> in 8 steps" instead of "from 0 to 2 inches in 1/4 inch steps". (And >> what if they tried to say "from 0 to 3 1/4 inches in 1/2 inch steps" >> ?) > > And how many steps in "from 37 3/4 inches to 90 1/4 inches" ? ?I don't want > to have to calculate that. ?That's what computers are for. That's just silly. The number of steps is (stop - start) / step. 
> Your last example is no different than today's range(2, 10, 3) -- we don't > get 10 or 9. The difference is that most operations on integers, by their nature, give give exact results, except for division (which is defined as producing a float in Python 3). Whether float operations give exact results or not is a lot harder to know, and the various IEEE states are hard to access. Just because the US measurement system happens to use only values that are exactly representable as floats doesn't mean floats are great to represent measurements. (What if you have to cut a length of string in three equal pieces?) -- --Guido van Rossum (python.org/~guido) From greg.ewing at canterbury.ac.nz Tue Sep 27 23:19:15 2011 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 28 Sep 2011 10:19:15 +1300 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> <4E816BED.4000103@canterbury.ac.nz> Message-ID: <4E823DD3.1030000@canterbury.ac.nz> Alexander Belopolsky wrote: > ("Comb" (noun) brings up the right image, but is probably too > informal and may be confused with a short for "combination.") And also with "comb filter" for those who are into signal processing. -- Greg From greg.ewing at canterbury.ac.nz Tue Sep 27 23:39:18 2011 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 28 Sep 2011 10:39:18 +1300 Subject: [Python-Dev] range objects in 3.x In-Reply-To: <4E81F976.3070300@stoneleaf.us> References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> <4E81F7CB.7060001@pearwood.info> <4E81F976.3070300@stoneleaf.us> Message-ID: <4E824286.9060000@canterbury.ac.nz> Ethan Furman wrote: > If it's generic, why should it live in math? Generic? Maybe that's it: grange() It's also an English word, unfortunately one with a completely unrelated meaning. 
:-( -- Greg From ethan at stoneleaf.us Tue Sep 27 23:39:43 2011 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 27 Sep 2011 14:39:43 -0700 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E81EAA0.5080507@stoneleaf.us> <4E82089D.8070209@stoneleaf.us> <4E820E03.6090100@pearwood.info> <4E8213E9.1060705@stoneleaf.us> <4E823060.3070805@stoneleaf.us> Message-ID: <4E82429F.50105@stoneleaf.us> Guido van Rossum wrote: > On Tue, Sep 27, 2011 at 1:21 PM, Ethan Furman wrote: >> Guido van Rossum wrote: >>> But why offer an API that is an attractive nuisance? I don't think >>> that it is a burden to the user to have to specify "from 0 to 2 inches >>> in 8 steps" instead of "from 0 to 2 inches in 1/4 inch steps". (And >>> what if they tried to say "from 0 to 3 1/4 inches in 1/2 inch steps" >>> ?) >> And how many steps in "from 37 3/4 inches to 90 1/4 inches" ? I don't want >> to have to calculate that. That's what computers are for. > > That's just silly. The number of steps is (stop - start) / step. Not silly at all -- it begs for an api of (start, stop, step), not (start, stop, count). Personally, I have no problems with typing either 'step=...' or 'stop=...', but I think losing step as an option is a *ahem* step backwards. ~Ethan~ From tseaver at palladion.com Tue Sep 27 23:50:59 2011 From: tseaver at palladion.com (Tres Seaver) Date: Tue, 27 Sep 2011 17:50:59 -0400 Subject: [Python-Dev] range objects in 3.x In-Reply-To: <4E824286.9060000@canterbury.ac.nz> References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> <4E81F7CB.7060001@pearwood.info> <4E81F976.3070300@stoneleaf.us> <4E824286.9060000@canterbury.ac.nz> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 09/27/2011 05:39 PM, Greg Ewing wrote: > Ethan Furman wrote: > >> If it's generic, why should it live in math? > > Generic? 
Maybe that's it: grange() > > It's also an English word, unfortunately one with a completely > unrelated meaning. :-( One could always think of the Midwest US farm country, cut into even one-mile sections by dirt roads, and think of 'grange'. :) Tres. - -- =================================================================== Tres Seaver +1 540-429-0999 tseaver at palladion.com Palladion Software "Excellence by Design" http://palladion.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk6CRUMACgkQ+gerLs4ltQ7EYgCgi/iJqg4Wq8LVF25kd6gS0yN/ MQ4An1kl/+8uBcFzAJPPNPL1iBqSNwJM =2IUq -----END PGP SIGNATURE----- From martin at v.loewis.de Wed Sep 28 00:56:58 2011 From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 28 Sep 2011 00:56:58 +0200 Subject: [Python-Dev] PEP 393 memory savings update Message-ID: <4E8254BA.6010705@v.loewis.de> I have redone my memory benchmark, and added a few new counters. The application is a very small Django application. The same source code of the app and Django itself is used on all Python versions. The full list of results is at http://www.dcl.hpi.uni-potsdam.de/home/loewis/djmemprof/ Here are some excerpts: A. 32-bit builds, storage for Unicode objects 3.x, 32-bit wchar_t: 6378540 3.x, 16-bit wchar_t: 3694694 PEP 393: 2216807 Compared to the previous results, there are now some significant savings even compared to a narrow unicode build. B. 3.x, number of strings by maxchar: ASCII: 35713 (1,300,000 chars) Latin-1: 235 (11,000 chars) BMP: 260 (700 chars) other: 0 total: 36,000 (1,310,000 chars) This explains why the savings for shortening ASCII objects are significant in this application. I have no good intuition how this effect would show for "real" applications. 
It may be that the percentage of ASCII strings (in number and chars) grows proportionally with the total number of strings; it may also be that the majority of these strings is a certain fixed overhead (resulting from Python identifiers and other interned strings). C. String-ish objects in 2.7 and 3.3-trunk: 2.x 3.x #unicode 370 36,000 #bytes 43,000 14,000 #total 43,400 50,000 len(unicode) 5,300 1,306,000 len(bytes) 2,040,000 860,000 len(total) 2,046,000 2,200,000 (Note: the computations in the results are slightly messed up: the number of bytes for bytes objects is actually the sum of the lengths, not the sum of the sizeofs; this gets added in the "total" lines to the sum of sizeofs of unicode strings, which is non-sensical. The table above corrects this) As you can see, Python 3 creates more string objects in total. D. Memory consumption for 2.x, 3.x, PEP 393, accounting both unicode and bytes objects, using 32-bit builds and 32-bit wchar_t: 2.x: 3,620,000 bytes 3.x: 7,750,000 bytes PEP 393: 3,340,000 bytes This suggests that PEP 393 actually reduces memory consumption below what 2.7 uses. This is offset though by "other" (non-string) objects, which take 300KB more in 3.x. Regards, Martin From brian.curtin at gmail.com Wed Sep 28 00:59:12 2011 From: brian.curtin at gmail.com (Brian Curtin) Date: Tue, 27 Sep 2011 17:59:12 -0500 Subject: [Python-Dev] PyCon 2012 Proposals Due October 12 Message-ID: The deadline for PyCon 2012 tutorial, talk, and poster proposals is under 15 days away, so be sure to get your submissions in by October 12, 2011. Whether you're a first-timer or an experienced veteran, PyCon depends on you, the community, coming together to build the best conference schedule possible. Our call for proposals (http://us.pycon.org/2012/cfp/) lays out the details it takes to be included in the lineup for the conference in Santa Clara, CA on March 7-15, 2012.
If you're unsure of what to write about, our recent survey yielded a large list of potential talk topics (http://pycon.blogspot.com/2011/09/need-talk-ideas.html), and plenty of ideas for tutorials (INSERT TUTORIAL POST). We've also come up with general tips on proposal writing at http://pycon.blogspot.com/2011/08/writing-good-proposal.html to ensure everyone has the most complete proposal when it comes time for review. As always, the program committee wants to put together an incredible conference, so they'll be working with submitters to fine tune proposal details and help you produce the best submissions. We've had plenty of great news to share since we first announced the call for proposals. Paul Graham of Y Combinator was recently announced as a keynote speaker (http://pycon.blogspot.com/2011/09/announcing-first-pycon-2012-keynote.html), making his return after a 2003 keynote. David Beazley, famous for his mind-blowing talks on CPython's Global Interpreter Lock, was added to the plenary talk series (http://pycon.blogspot.com/2011/09/announcing-first-pycon-2012-plenary.html). Sponsors can now list their job openings on the "Job Fair" section of the PyCon site (http://pycon.blogspot.com/2011/09/announcing-pycon-2012-fair-page-sponsor.html). We're hard at work to bring you the best conference yet, so stay tuned to PyCon news at http://pycon.blogspot.com/ and on Twitter at https://twitter.com/#!/pycon. We recently eclipsed last year's sponsorship count of 40 and are currently at a record 52 organizations supporting PyCon. If you or your organization are interested in sponsoring PyCon, we'd love to hear from you, so check out our sponsorship page (http://us.pycon.org/2012/sponsors/). A quick thanks to all of our awesome PyCon 2012 Sponsors: - Diamond Level: Google and Dropbox.
- Platinum Level: New Relic, SurveyMonkey, Microsoft, Eventbrite, Nasuni and Gondor.io - Gold Level: Walt Disney Animation Studios, CCP Games, Linode, Enthought, Canonical, Dotcloud, Loggly, Revsys, ZeOmega, Bitly, ActiveState, JetBrains, Caktus, Disqus, Spotify, Snoball, Evite, and PlaidCloud - Silver Level: Imaginary Landscape, WiserTogether, Net-ng, Olark, AG Interactive, Bitbucket, Open Bastion, 10Gen, gocept, Lex Machina, fwix, github, toast driven, Aarki, Threadless, Cox Media, myYearBook, Accense Technology, Wingware, FreshBooks, and BigDoor - Lanyard: Dreamhost - Sprints: Reddit - FLOSS: OSU/OSL, OpenHatch The PyCon Organizers - http://us.pycon.org/2012 Jesse Noller - Chairman - jnoller at python.org Brian Curtin - Publicity Coordinator - brian at python.org From guido at python.org Wed Sep 28 01:17:15 2011 From: guido at python.org (Guido van Rossum) Date: Tue, 27 Sep 2011 16:17:15 -0700 Subject: [Python-Dev] PEP 393 memory savings update In-Reply-To: <4E8254BA.6010705@v.loewis.de> References: <4E8254BA.6010705@v.loewis.de> Message-ID: Great news, Martin! On Tue, Sep 27, 2011 at 3:56 PM, "Martin v. Löwis" wrote: > I have redone my memory benchmark, and added a few new > counters. > > The application is a very small Django application. The same > source code of the app and Django itself is used on all Python > versions. The full list of results is at > > http://www.dcl.hpi.uni-potsdam.de/home/loewis/djmemprof/ > > Here are some excerpts: > > A. 32-bit builds, storage for Unicode objects > 3.x, 32-bit wchar_t: 6378540 > 3.x, 16-bit wchar_t: 3694694 > PEP 393: 2216807 > > Compared to the previous results, there are now some > significant savings even compared to a narrow unicode build. > > B. 3.x, number of strings by maxchar: > ASCII: 35713 (1,300,000 chars) > Latin-1: 235 (11,000 chars) > BMP: 260 (700 chars) > other: 0 > total:
36,000 (1,310,000 chars) > > This explains why the savings for shortening ASCII objects > are significant in this application. I have no good intuition > how this effect would show for "real" applications. It may be > that the percentage of ASCII strings (in number and chars) grows > proportionally with the total number of strings; it may also > be that the majority of these strings is a certain fixed overhead > (resulting from Python identifiers and other interned strings). > > C. String-ish objects in 2.7 and 3.3-trunk: > 2.x 3.x > #unicode 370 36,000 > #bytes 43,000 14,000 > #total 43,400 50,000 > > len(unicode) 5,300 1,306,000 > len(bytes) 2,040,000 860,000 > len(total) 2,046,000 2,200,000 > > (Note: the computations in the results are slightly messed up: > the number of bytes for bytes objects is actually the sum > of the lengths, not the sum of the sizeofs; this gets added > in the "total" lines to the sum of sizeofs of unicode strings, > which is non-sensical. The table above corrects this) > > As you can see, Python 3 creates more string objects in total. > > D. Memory consumption for 2.x, 3.x, PEP 393, accounting both > unicode and bytes objects, using 32-bit builds and 32-bit > wchar_t: > 2.x: 3,620,000 bytes > 3.x: 7,750,000 bytes > PEP 393: 3,340,000 bytes > > This suggests that PEP 393 actually reduces memory consumption > below what 2.7 uses. This is offset though by "other" (non-string) > objects, which take 300KB more in 3.x.
> > Regards, > Martin > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (python.org/~guido) From steve at pearwood.info Wed Sep 28 01:43:13 2011 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 28 Sep 2011 09:43:13 +1000 Subject: [Python-Dev] unittest missing assertNotRaises In-Reply-To: <20110927190532.GA32171@iskra.aviel.ru> References: <20110927190532.GA32171@iskra.aviel.ru> Message-ID: <4E825F91.9000701@pearwood.info> Oleg Broytman wrote: > On Tue, Sep 27, 2011 at 07:46:52PM +0100, Wilfred Hughes wrote: >> + def assertNotRaises(self, excClass, callableObj=None, *args, **kwargs): >> + """Fail if an exception of class excClass is thrown by >> + callableObj when invoked with arguments args and keyword >> + arguments kwargs. >> + >> + """ >> + try: >> + callableObj(*args, **kwargs) >> + except excClass: >> + raise self.failureException("%s was raised" % excClass) >> + >> + > > What if I want to assert my test raises neither OSError nor IOError? Passing (OSError, IOError) as excClass should do it. But I can't see this being a useful test. As written, exceptions are still treated as errors, except for excClass, which is treated as a test failure. I can't see the use-case for that. assertRaises is useful: "IOError is allowed, but any other exception is a bug." makes perfect sense. assertNotRaises doesn't seem sensible or useful to me: "IOError is a failed test, but any other exception is a bug." What's the point? When would you use that? 
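Both points in this exchange can be shown in running code. A minimal sketch (the test case and `flaky_read` are invented for illustration): a plain call already serves as the "assertNotRaises", and `assertRaises` accepts a tuple of exception classes, which covers the "neither OSError nor IOError" case:

```python
import unittest

def flaky_read():
    # Invented stand-in for the real code under test.
    return "data"

class Example(unittest.TestCase):
    def test_no_exception_expected(self):
        # The "assertNotRaises" case is just a plain call: any
        # exception that escapes is reported by the runner anyway.
        self.assertEqual(flaky_read(), "data")

    def test_either_exception_allowed(self):
        # assertRaises accepts a tuple of exception classes.
        with self.assertRaises((OSError, IOError)):
            raise IOError("disk trouble")

suite = unittest.defaultTestLoader.loadTestsFromTestCase(Example)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Both tests pass under any unittest runner; no new assertion method is needed for either case.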
-- Steven From ckaynor at zindagigames.com Wed Sep 28 01:58:47 2011 From: ckaynor at zindagigames.com (Chris Kaynor) Date: Tue, 27 Sep 2011 16:58:47 -0700 Subject: [Python-Dev] unittest missing assertNotRaises In-Reply-To: <4E825F91.9000701@pearwood.info> References: <20110927190532.GA32171@iskra.aviel.ru> <4E825F91.9000701@pearwood.info> Message-ID: On Tue, Sep 27, 2011 at 4:43 PM, Steven D'Aprano wrote: > But I can't see this being a useful test. As written, exceptions are still treated as errors, except for excClass, which is treated as a test failure. I can't see the use-case for that. assertRaises is useful: > > "IOError is allowed, but any other exception is a bug." > > makes perfect sense. assertNotRaises doesn't seem sensible or useful to me: > > "IOError is a failed test, but any other exception is a bug." > > What's the point? When would you use that? > I've run across a few cases where this is the correct behavior. The most recent one that comes to mind is while testing some code which has specific silencing options: specifically, writing a main file and a backup file, where failure to write the backup is not an error, but failure to write the main is. As such, the test suite should have the following tests: - Failure to write the main should assert that the code raises the failure error. No error is a failure, any other error is an error, that error is a success. (it may also check that the backup was written) - Failure to write the backup should assert that the code does not raise the failure error. No error is a success, that error is a failure, any other error is an error. (it may also check that the main was written) - Both succeeding should assert that the files were actually written, and that no error was raised. Any other result is an error.
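The three cases above map directly onto existing unittest idioms. A sketch, with a hypothetical `save()` standing in for the silencing behaviour described (all names invented for illustration):

```python
import unittest

def save(main_ok=True, backup_ok=True):
    # Invented stand-in: failure to write the backup is silenced,
    # failure to write the main file raises.
    if not main_ok:
        raise OSError("cannot write main file")
    written = ["main"]
    if backup_ok:
        written.append("backup")
    return written

class SaveTests(unittest.TestCase):
    def test_main_failure_raises(self):
        # Case 1: the failure error must be raised.
        with self.assertRaises(OSError):
            save(main_ok=False)

    def test_backup_failure_is_silenced(self):
        # Case 2: no assertNotRaises needed -- if OSError escaped,
        # the runner would report it; just check the resulting state.
        self.assertEqual(save(backup_ok=False), ["main"])

    def test_both_succeed(self):
        # Case 3: both files written, nothing raised.
        self.assertEqual(save(), ["main", "backup"])

suite = unittest.defaultTestLoader.loadTestsFromTestCase(SaveTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Case 2 is the one at issue: an unexpected exception there surfaces as an error without any dedicated helper.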
Now, the difference between a Failure and an Error is more or less a moot point, however I would expect an Error to be any unexpected result, while a Failure is a predicted (either via forethought or prior tests) but incorrect result. From raymond.hettinger at gmail.com Wed Sep 28 02:28:49 2011 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Tue, 27 Sep 2011 20:28:49 -0400 Subject: [Python-Dev] range objects in 3.x In-Reply-To: <4E82226C.7080401@stoneleaf.us> References: <4E7CCA42.2060100@stoneleaf.us> <4E81261C.6040200@pearwood.info> <4E816BED.4000103@canterbury.ac.nz> <4E81EAA0.5080507@stoneleaf.us> <4E82089D.8070209@stoneleaf.us> <4E820E03.6090100@pearwood.info> <4E8213E9.1060705@stoneleaf.us> <4E82226C.7080401@stoneleaf.us> Message-ID: <7E063D58-4591-42F3-A6B6-B977101D8241@gmail.com> On Sep 27, 2011, at 3:22 PM, Ethan Furman wrote: > Well, actually, I'd be using it with dates. ;) FWIW, an approach using itertools is pretty general but even it doesn't work for dates :-) >>> from itertools import count, takewhile >>> from decimal import Decimal >>> from fractions import Fraction >>> list(takewhile(lambda x: x<=10.0, count(0.0, 0.5))) [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, 10.0] >>> list(takewhile(lambda x: x<=Decimal(1), count(Decimal(0), Decimal('0.1')))) [Decimal('0'), Decimal('0.1'), Decimal('0.2'), Decimal('0.3'), Decimal('0.4'), Decimal('0.5'), Decimal('0.6'), Decimal('0.7'), Decimal('0.8'), Decimal('0.9'), Decimal('1.0')] >>> list(takewhile(lambda x: x<=Fraction(2), count(Fraction(0), Fraction(1,3)))) [Fraction(0, 1), Fraction(1, 3), Fraction(2, 3), Fraction(1, 1), Fraction(4, 3), Fraction(5, 3), Fraction(2, 1)] >>> from datetime import date, timedelta >>> list(takewhile(lambda x: x<=date(2011,12,31), count(date(2011,9,27), timedelta(days=7)))) Traceback (most recent call last): File "", line 1, in list(takewhile(lambda x: x<=date(2011,12,31), count(date(2011,9,27),
timedelta(days=7)))) TypeError: a number is required Raymond -------------- next part -------------- An HTML attachment was scrubbed... URL: From exarkun at twistedmatrix.com Wed Sep 28 02:36:10 2011 From: exarkun at twistedmatrix.com (exarkun at twistedmatrix.com) Date: Wed, 28 Sep 2011 00:36:10 -0000 Subject: [Python-Dev] unittest missing assertNotRaises In-Reply-To: References: <20110927190532.GA32171@iskra.aviel.ru> <4E825F91.9000701@pearwood.info> Message-ID: <20110928003610.2214.1934279766.divmod.xquotient.140@localhost.localdomain> On 27 Sep, 11:58 pm, ckaynor at zindagigames.com wrote: >On Tue, Sep 27, 2011 at 4:43 PM, Steven D'Aprano >wrote: >>But I can't see this being a useful test. As written, exceptions are >>still treated as errors, except for excClass, which is treated as a >>test failure. I can't see the use-case for that. assertRaises is >>useful: >> >>"IOError is allowed, but any other exception is a bug." >> >>makes perfect sense. assertNotRaises doesn't seem sensible or useful >>to me: >> >>"IOError is a failed test, but any other exception is a bug." >> >>What's the point? When would you use that? > >I've run across a few cases where this is the correct behavior. The >most recent one that comes to mind is while testing some code which >has specific silencing options: specifically, writing a main file and >a backup file, where failure to write the backup is not an error, but >failure to write the main is. As such, the test suite should have the >following tests: >- Failure to write the main should assert that the code raises the >failure error. No error is a failure, any other error is an error, >that error is a success. (it may also check that the backup was >written) This is assertRaises, not assertNotRaises. >- Failure to write the backup should assert that the code does not >raise the failure error. No error is a success, that error is a >failure, any other error is an error.
(it may also check that the main >was written) This is calling the function and asserting something about the result. >- Both succeeding should assert that the files were actually written, >and that no error was raised. Any other result is an error. > >Now, the difference between a Failure and an Error is more or less a >moot point, however I would expect an Error to be any unexpected >result, while a Failure is a predicted (either via forethought or >prior tests) but incorrect result. assertNotRaises doesn't make anything possible that isn't possible now. It probably doesn't even make anything easier - but if it does, it's so obscure (and I've read and written thousands of tests for all kinds of libraries over the years) that it doesn't merit a dedicated helper in the unittest library. Jean-Paul From turnbull at sk.tsukuba.ac.jp Wed Sep 28 04:11:51 2011 From: turnbull at sk.tsukuba.ac.jp (Stephen J. Turnbull) Date: Wed, 28 Sep 2011 11:11:51 +0900 Subject: [Python-Dev] range objects in 3.x In-Reply-To: <4E82226C.7080401@stoneleaf.us> References: <4E7CCA42.2060100@stoneleaf.us> <4E81261C.6040200@pearwood.info> <4E816BED.4000103@canterbury.ac.nz> <4E81EAA0.5080507@stoneleaf.us> <4E82089D.8070209@stoneleaf.us> <4E820E03.6090100@pearwood.info> <4E8213E9.1060705@stoneleaf.us> <4E82226C.7080401@stoneleaf.us> Message-ID: <87zkhph0uw.fsf@uwakimon.sk.tsukuba.ac.jp> Ethan Furman writes: > Well, actually, I'd be using it with dates. ;) Why are you representing dates with floats? (That's a rhetorical question, don't answer it.) This is the whole problem with this discussion. Guido is saying (and I think it's plausible though I don't have enough experience to be sure myself) that if you look at the various use cases for such functions, they're different enough that it's going to be hard to come up with a single API that is good, let alone optimal, for them all.
Then people keep coming back with "but look at X, where this API is clearly very useful", for values of X restricted to "stuff they do". That's good module design; it's not a good idea for the language (including builtins). Remember, something like range (Python 3) or xrange (Python 2) was *really necessary*[1] to express in Python the same algorithm that the C construct 'for' does. I agree with Steven d' that count would have been a somewhat better name (at least in my dialect it is possible, though somewhat unusual, to say "count up from 10 to 20 by 3s"), but that doesn't become clear until you want to talk about polymorphic versions of the concept. Also, in statistics "range" refers to a much smaller set (ie, {min, max}) than it does in Python, not that I really care. As far as a name for a more general concept, perhaps "interval" would be an interesting choice (although in analysis it has a connotation of continuity that would be inappropriate for a discrete set of floats). Footnotes: [1] FSVO "necessary" that includes "let's not do arithmetic on the index variable inside the loop". From ubershmekel at gmail.com Wed Sep 28 07:06:36 2011 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Wed, 28 Sep 2011 01:06:36 -0400 Subject: [Python-Dev] unittest missing assertNotRaises In-Reply-To: <20110928003610.2214.1934279766.divmod.xquotient.140@localhost.localdomain> References: <20110927190532.GA32171@iskra.aviel.ru> <4E825F91.9000701@pearwood.info> <20110928003610.2214.1934279766.divmod.xquotient.140@localhost.localdomain> Message-ID: On Sep 27, 2011 5:56 PM, wrote: > > > assertNotRaises doesn't make anything possible that isn't possible now. It probably doesn't even make anything easier - but if it does, it's so obscure (and I've read and written thousands of tests for all kinds of libraries over the years) that it doesn't merit a dedicated helper in the unittest library. > > Jean-Paul > +1 for keeping it simple. TOOWTDI.
-------------- next part -------------- An HTML attachment was scrubbed... URL: From g.brandl at gmx.net Wed Sep 28 08:51:52 2011 From: g.brandl at gmx.net (Georg Brandl) Date: Wed, 28 Sep 2011 08:51:52 +0200 Subject: [Python-Dev] cpython: Implement PEP 393. In-Reply-To: References: Message-ID: On 28.09.2011 08:35, martin.v.loewis wrote: > http://hg.python.org/cpython/rev/8beaa9a37387 > changeset: 72475:8beaa9a37387 > user: Martin v. Löwis > date: Wed Sep 28 07:41:54 2011 +0200 > summary: > Implement PEP 393. > [...] > > diff --git a/Doc/c-api/unicode.rst b/Doc/c-api/unicode.rst > --- a/Doc/c-api/unicode.rst > +++ b/Doc/c-api/unicode.rst > @@ -1072,6 +1072,15 @@ > occurred and an exception has been set. > > > +.. c:function:: Py_ssize_t PyUnicode_FindChar(PyObject *str, Py_UCS4 ch, Py_ssize_t start, Py_ssize_t end, int direction) > + > + Return the first position of the character *ch* in ``str[start:end]`` using > + the given *direction* (*direction* == 1 means to do a forward search, > + *direction* == -1 a backward search). The return value is the index of the > + first match; a value of ``-1`` indicates that no match was found, and ``-2`` > + indicates that an error occurred and an exception has been set. > + + .. c:function:: Py_ssize_t PyUnicode_Count(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end) > > Return the number of non-overlapping occurrences of *substr* in This is the only doc change for this change (and it doesn't have a versionadded). Surely there must be more new APIs and changes that need documenting? Georg From martin at v.loewis.de Wed Sep 28 09:46:34 2011 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 28 Sep 2011 09:46:34 +0200 Subject: [Python-Dev] cpython: Implement PEP 393. In-Reply-To: References: Message-ID: <4E82D0DA.50405@v.loewis.de> > Surely there must be more new APIs and changes that need documenting? Correct. All documentation still needs to be written.
Regards, Martin From martin at v.loewis.de Wed Sep 28 09:48:32 2011 From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 28 Sep 2011 09:48:32 +0200 Subject: [Python-Dev] PEP 393 merged Message-ID: <4E82D150.7050204@v.loewis.de> I have now merged the PEP 393 implementation into default. The main missing piece is the documentation; contributions are welcome. Regards, Martin From wilfred at potatolondon.com Wed Sep 28 12:20:54 2011 From: wilfred at potatolondon.com (Wilfred Hughes) Date: Wed, 28 Sep 2011 11:20:54 +0100 Subject: [Python-Dev] unittest missing assertNotRaises In-Reply-To: References: Message-ID: On 27 September 2011 19:59, Laurens Van Houtven <_ at lvh.cc> wrote: > Sure, you just *do* it. The only advantage I see in assertNotRaises is that when that exception is raised, you should (and would) get a failure, not an error. It's a useful distinction. I have found myself writing code of the form: def test_old_exception_no_longer_raised(self): try: do_something() except OldException: self.assertTrue(False) in order to distinguish between a regression and something new erroring. The limitation of this pattern is that the test failure message is not as good. From phd at phdru.name Wed Sep 28 12:51:00 2011 From: phd at phdru.name (Oleg Broytman) Date: Wed, 28 Sep 2011 14:51:00 +0400 Subject: [Python-Dev] unittest missing assertNotRaises In-Reply-To: <4E825F91.9000701@pearwood.info> References: <20110927190532.GA32171@iskra.aviel.ru> <4E825F91.9000701@pearwood.info> Message-ID: <20110928105100.GB22828@iskra.aviel.ru> On Wed, Sep 28, 2011 at 09:43:13AM +1000, Steven D'Aprano wrote: > Oleg Broytman wrote: > >On Tue, Sep 27, 2011 at 07:46:52PM +0100, Wilfred Hughes wrote: > >>+ def assertNotRaises(self, excClass, callableObj=None, *args, **kwargs): > >>+ """Fail if an exception of class excClass is thrown by > >>+ callableObj when invoked with arguments args and keyword > >>+ arguments kwargs.
> >>+ + """ > >>+ try: > >>+ callableObj(*args, **kwargs) > >>+ except excClass: > >>+ raise self.failureException("%s was raised" % excClass) > >>+ + > But I can't see this being a useful test. Me too. Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From fuzzyman at voidspace.org.uk Wed Sep 28 13:04:06 2011 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Wed, 28 Sep 2011 12:04:06 +0100 Subject: [Python-Dev] unittest missing assertNotRaises In-Reply-To: References: Message-ID: <4E82FF26.6060002@voidspace.org.uk> On 27/09/2011 19:46, Wilfred Hughes wrote: > Hi folks > > I wasn't sure if this warranted a bug in the tracker, so I thought I'd > raise it here first. > > unittest has assertIn, assertNotIn, assertEqual, assertNotEqual and so > on. So, it seems odd to me that there isn't assertNotRaises. Is there > any particular motivation for not putting it in? > > I've attached a simple patch against Python 3's trunk to give an idea > of what I have in mind. > As others have said, the opposite of assertRaises is just calling the code! I have several times needed regression tests that call code that *used* to raise an exception. It can look slightly odd to have a test without an assert, but the singular uselessness of assertNotRaises does not make it a better alternative. I usually add a comment: def test_something_that_used_to_not_work(self): # this used to raise an exception do_something() All the best, Michael Foord > Thanks > Wilfred > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. 
-- the sqlite blessing http://www.sqlite.org/different.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From fuzzyman at voidspace.org.uk Wed Sep 28 13:05:13 2011 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Wed, 28 Sep 2011 12:05:13 +0100 Subject: [Python-Dev] unittest missing assertNotRaises In-Reply-To: References: Message-ID: <4E82FF69.5010208@voidspace.org.uk> On 27/09/2011 19:59, Laurens Van Houtven wrote: > Sure, you just *do* it. The only advantage I see in assertNotRaises is > that when that exception is raised, you should (and would) get a > failure, not an error. There are some who don't see the distinction between a failure and an error as a useful distinction... I'm becoming more sympathetic to that view. All the best, Michael > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.ewing at canterbury.ac.nz Wed Sep 28 13:06:34 2011 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 29 Sep 2011 00:06:34 +1300 Subject: [Python-Dev] range objects in 3.x In-Reply-To: <4E82226C.7080401@stoneleaf.us> References: <4E7CCA42.2060100@stoneleaf.us> <4E81261C.6040200@pearwood.info> <4E816BED.4000103@canterbury.ac.nz> <4E81EAA0.5080507@stoneleaf.us> <4E82089D.8070209@stoneleaf.us> <4E820E03.6090100@pearwood.info> <4E8213E9.1060705@stoneleaf.us> <4E82226C.7080401@stoneleaf.us> Message-ID: <4E82FFBA.209@canterbury.ac.nz> Ethan Furman wrote: > Well, actually, I'd be using it with dates. 
;) Seems to me that one size isn't going to fit all. Maybe we really want two functions: interpolate(start, end, count) Requires a type supporting addition and division, designed to work predictably and accurately with floats extrapolate(start, step, end) Works for any type supporting addition, not recommended for floats -- Greg From martin at v.loewis.de Wed Sep 28 13:24:22 2011 From: martin at v.loewis.de (martin at v.loewis.de) Date: Wed, 28 Sep 2011 13:24:22 +0200 Subject: [Python-Dev] Heads up: Apple llvm gcc 4.2 miscompiles PEP 393 Message-ID: <20110928132422.Horde.OvQBCtjz9kROgwPm5ZwktiA@webmail.df.eu> The gcc that Apple ships with the Lion SDK (not sure what Xcode version that is) miscompiles Python now. I've reported this to Apple as bug 10143715; not sure whether there is a public link to this bug report. In essence, the code typedef struct { long length; long hash; int state; int *wstr; } PyASCIIObject; typedef struct { PyASCIIObject _base; long utf8_length; char *utf8; long wstr_length; } PyCompactUnicodeObject; void *_PyUnicode_compact_data(void *unicode) { return ((((PyASCIIObject*)unicode)->state & 0x20) ? ((void*)((PyASCIIObject*)(unicode) + 1)) : ((void*)((PyCompactUnicodeObject*)(unicode) + 1))); } miscompiles (with -O2 -fomit-frame-pointer) to __PyUnicode_compact_data: Leh_func_begin1: leaq 32(%rdi), %rax ret The compiler version is gcc version 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00) This unconditionally assumes that sizeof(PyASCIIObject) needs to be added to unicode, independent of whether the state bit is set or not. I'm not aware of a work-around in the code. 
My work-around is to use gcc-4.0, which is still available on my system from an earlier Xcode installation (in /Developer-3.2.6) Regards, Martin From catch-all at masklinn.net Wed Sep 28 13:45:14 2011 From: catch-all at masklinn.net (Xavier Morel) Date: Wed, 28 Sep 2011 13:45:14 +0200 Subject: [Python-Dev] Heads up: Apple llvm gcc 4.2 miscompiles PEP 393 In-Reply-To: <20110928132422.Horde.OvQBCtjz9kROgwPm5ZwktiA@webmail.df.eu> References: <20110928132422.Horde.OvQBCtjz9kROgwPm5ZwktiA@webmail.df.eu> Message-ID: On 2011-09-28, at 13:24 , martin at v.loewis.de wrote: > The gcc that Apple ships with the Lion SDK (not sure what Xcode version that is) Xcode 4.1 > I'm not aware of a work-around in the code. My work-around is to use gcc-4.0, > which is still available on my system from an earlier Xcode installation > (in /Developer-3.2.6) Does Clang also fail to compile this? Clang was updated from 1.6 to 2.0 with Xcode 4, worth a try. Also, from your version listing it seems to be llvm-gcc (gcc frontend with llvm backend I think), is there no more straight gcc (with gcc frontend and backend)? FWIW, on 10.6 the default gcc is a straight 4.2 > gcc --version i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5664) There is an llvm-gcc 4.2 but it uses a slightly different revision of llvm > llvm-gcc --version i686-apple-darwin10-llvm-gcc-4.2 (GCC) 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2333.4) From _ at lvh.cc Wed Sep 28 16:59:12 2011 From: _ at lvh.cc (Laurens Van Houtven) Date: Wed, 28 Sep 2011 16:59:12 +0200 Subject: [Python-Dev] unittest missing assertNotRaises In-Reply-To: <4E82FF69.5010208@voidspace.org.uk> References: <4E82FF69.5010208@voidspace.org.uk> Message-ID: Oops, I accidentally hit Reply instead of Reply to All... On Wed, Sep 28, 2011 at 1:05 PM, Michael Foord wrote: > On 27/09/2011 19:59, Laurens Van Houtven wrote: > > Sure, you just *do* it. 
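The layout in Martin's reduced example can be mirrored from Python with ctypes to see why a single unconditional `leaq 32(%rdi), %rax` is wrong: the two branches of `_PyUnicode_compact_data` must add two *different* offsets. This is a sketch of the reduced example above, not of CPython's real structs:

```python
import ctypes

class PyASCIIObject(ctypes.Structure):
    # Fields mirror Martin's reduced example, not CPython itself.
    _fields_ = [("length", ctypes.c_long),
                ("hash", ctypes.c_long),
                ("state", ctypes.c_int),
                ("wstr", ctypes.POINTER(ctypes.c_int))]

class PyCompactUnicodeObject(ctypes.Structure):
    _fields_ = [("_base", PyASCIIObject),
                ("utf8_length", ctypes.c_long),
                ("utf8", ctypes.c_char_p),
                ("wstr_length", ctypes.c_long)]

# The ASCII branch adds sizeof(PyASCIIObject); the other branch must
# add the strictly larger sizeof(PyCompactUnicodeObject).  Emitting
# one fixed offset for both branches drops the state-bit test.
ascii_offset = ctypes.sizeof(PyASCIIObject)
compact_offset = ctypes.sizeof(PyCompactUnicodeObject)
assert ascii_offset < compact_offset
```

On a typical 64-bit build the two offsets come out as 32 and 56 bytes, matching the unconditional 32-byte `leaq` the miscompiled build emitted for both branches.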
The only advantage I see in assertNotRaises is that
> when that exception is raised, you should (and would) get a failure, not an
> error.

> There are some who don't see the distinction between a failure and an error
> as a useful distinction... I'm becoming more sympathetic to that view.

I agree. Maybe if there were fewer failures posing as errors and errors posing as failures, I'd consider taking the distinction seriously.

The only use case I've personally encountered is with fuzzy tests. The example that comes to mind is one where we had a fairly complex iterative algorithm for learning things from huge amounts of test data and there were certain criteria (goodness of result, time taken) that had to be satisfied. In that case, "it blew up because someone messed up dependencies" and "it took 3% longer than is allowable" are pretty obviously different... Considering how exotic that use case is, like I said, I'm not really convinced how generally useful it is :) especially since this isn't even a unit test...

> All the best,
>
> Michael

cheers
lvh

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From guido at python.org Wed Sep 28 17:41:48 2011
From: guido at python.org (Guido van Rossum)
Date: Wed, 28 Sep 2011 08:41:48 -0700
Subject: [Python-Dev] PEP 393 merged
In-Reply-To: <4E82D150.7050204@v.loewis.de>
References: <4E82D150.7050204@v.loewis.de>
Message-ID:

Congrats! Python 3.3 will be better because of this.

On Wed, Sep 28, 2011 at 12:48 AM, "Martin v. Löwis" wrote:
> I have now merged the PEP 393 implementation into default.
> The main missing piece is the documentation; contributions are
> welcome.

-- 
--Guido van Rossum (python.org/~guido)

From mal at egenix.com Wed Sep 28 18:44:23 2011
From: mal at egenix.com (M.-A.
Lemburg)
Date: Wed, 28 Sep 2011 18:44:23 +0200
Subject: [Python-Dev] PEP 393 close to pronouncement
In-Reply-To:
References: <201109270019.02442.victor.stinner@haypocalc.com> <201109271550.27837.victor.stinner@haypocalc.com>
Message-ID: <4E834EE7.4050706@egenix.com>

Guido van Rossum wrote:
> Given the feedback so far, I am happy to pronounce PEP 393 as
> accepted. Martin, congratulations! Go ahead and mark it as Accepted.
> (But please do fix up the small nits that Victor reported in his
> earlier message.)

I've been working on feedback for the last few days, but I guess it's too late. Here goes anyway...

I've only read the PEP and not followed the discussion due to lack of time, so if any of this is no longer valid, that's probably because the PEP wasn't updated :-)

Resizing
--------

Codecs use resizing a lot. Given that PyCompactUnicodeObject does not support resizing, most decoders will have to use PyUnicodeObject and thus not benefit from the memory footprint advantages of e.g. PyASCIIObject.

Data structure
--------------

The data structure description in the PEP appears to be wrong:

PyASCIIObject has a wchar_t *wstr pointer - I guess this should be a char *str pointer, otherwise, where's the memory footprint advantage (esp. on Linux where sizeof(wchar_t) == 4)?

I also don't see a reason to limit the UCS1 storage version to ASCII. Accordingly, the object should be called PyLatin1Object or PyUCS1Object.

Here's the version from the PEP:

"""
typedef struct {
    PyObject_HEAD
    Py_ssize_t length;
    Py_hash_t hash;
    struct {
        unsigned int interned:2;
        unsigned int kind:2;
        unsigned int compact:1;
        unsigned int ascii:1;
        unsigned int ready:1;
    } state;
    wchar_t *wstr;
} PyASCIIObject;

typedef struct {
    PyASCIIObject _base;
    Py_ssize_t utf8_length;
    char *utf8;
    Py_ssize_t wstr_length;
} PyCompactUnicodeObject;
"""

Typedef'ing Py_UNICODE to wchar_t and using wchar_t in existing code will cause problems on some systems where wchar_t is a signed type.
Python assumes that Py_UNICODE is unsigned and thus doesn't check for negative values or take these into account when doing range checks or code point arithmetic. On such platforms where wchar_t is signed, it is safer to typedef Py_UNICODE to unsigned wchar_t. Accordingly and to prevent further breakage, Py_UNICODE should not be deprecated and should be used instead of wchar_t throughout the code.

Length information
------------------

Py_UNICODE access to the objects assumes that len(obj) == length of the Py_UNICODE buffer. The PEP suggests that length should not take surrogates into account on UCS2 platforms such as Windows. This causes len(obj) to not match len(wstr). As a result, Py_UNICODE access to the Unicode objects breaks when surrogate code points are present in the Unicode object on UCS2 platforms.

The PEP also does not explain how lone surrogates will be handled with respect to the length information. Furthermore, determining len(obj) will require a loop over the data, checking for surrogate code points. A simple memcpy() is no longer enough.

I suggest dropping the idea of having len(obj) not count wstr surrogate code points to maintain backwards compatibility and allow for working with lone surrogates.

Note that the whole surrogate debate does not have much to do with this PEP, since it's mainly about memory footprint savings. I'd also urge to do a reality check with respect to surrogates and non-BMP code points: in practice you only very rarely see any non-BMP code points in your data. Making all Python users pay for the needs of a tiny fraction is not really fair. Remember: practicality beats purity.

API
---

Victor already described the needed changes.

Performance
-----------

The PEP only lists a few low-level benchmarks as basis for the performance decrease. I'm missing some more adequate real-life tests, e.g.
using an application framework such as Django (to the extent this is possible with Python3) or a server like the Radicale calendar server (which is available for Python3).

I'd also like to see a performance comparison which specifically uses the existing Unicode APIs to create and work with Unicode objects. Most extensions will use this way of working with the Unicode API, either because they want to support Python 2 and 3, or because the effort it takes to port to the new APIs is too high. The PEP makes some statements that this is slower, but doesn't quantify those statements.

Memory savings
--------------

The table only lists string sizes up to 8 code points. The memory savings for these are really only significant for ASCII strings on 64-bit platforms, if you use the default UCS2 Python build as basis. For larger strings, I expect the savings to be more significant. OTOH, a single non-BMP code point in such a string would cause the savings to drop significantly again.

Complexity
----------

In order to benefit from the new API, any code that has to deal with low-level Py_UNICODE access to the Unicode objects will have to be adapted. For best performance, each algorithm will have to be implemented for all three storage types. Not doing so will result in a slow-down, if I read the PEP correctly. It's difficult to say of what scale, since that information is not given in the PEP, but the added loop over the complete data array in order to determine the maximum code point value suggests that it is significant.

Summary
-------

I am not convinced that the memory savings are big enough to warrant the performance penalty and added complexity suggested by the PEP. In times where even smartphones come with multiple GB of RAM, performance is more important than memory savings. In practice, using a UCS2 build of Python usually is a good compromise between memory savings, performance and standards compatibility.
For the few cases where you have to deal with UCS4 code points, we have already made good progress to make handling these much easier. IMHO, Python should be optimized for UCS2 usage, not the rare cases of UCS4 usage you find in practice. I do see the advantage for large strings, though.

My personal conclusion
----------------------

Given that I've been working on and maintaining the Python Unicode implementation actively or by providing assistance for almost 12 years now, I've also thought about whether it's still worth the effort. My interests have shifted somewhat into other directions and I feel that helping Python reach world domination in other ways makes me happier than fighting over Unicode standards, implementations, special cases that aren't special enough, and all those other nitty-gritty details that cause long discussions :-)

So I feel that the PEP 393 change is a good time to draw a line and leave Unicode maintenance to Ezio, Victor, Martin, and all the others that have helped over the years. I know it's in good hands. So here it is:

----------------------------------------------------------------

Hey, that was easy :-)

PS: I'll stick around a bit more for the platform module, pybench and whatever else comes along where you might be interested in my input.

Thanks and cheers,

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Sep 28 2011)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________
2011-10-04: PyCon DE 2011, Leipzig, Germany                 6 days to go

::: Try our new mxODBC.Connect Python Database Interface for free ! ::::

eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math.
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From benjamin at python.org Wed Sep 28 19:15:24 2011 From: benjamin at python.org (Benjamin Peterson) Date: Wed, 28 Sep 2011 13:15:24 -0400 Subject: [Python-Dev] PEP 393 close to pronouncement In-Reply-To: <4E834EE7.4050706@egenix.com> References: <201109270019.02442.victor.stinner@haypocalc.com> <201109271550.27837.victor.stinner@haypocalc.com> <4E834EE7.4050706@egenix.com> Message-ID: 2011/9/28 M.-A. Lemburg : > Guido van Rossum wrote: >> Given the feedback so far, I am happy to pronounce PEP 393 as >> accepted. Martin, congratulations! Go ahead and mark ity as Accepted. >> (But please do fix up the small nits that Victor reported in his >> earlier message.) > > I've been working on feedback for the last few days, but I guess it's > too late. Here goes anyway... > > I've only read the PEP and not followed the discussion due to lack of > time, so if any of this is no longer valid, that's probably because > the PEP wasn't updated :-) > > Resizing > -------- > > Codecs use resizing a lot. Given that PyCompactUnicodeObject > does not support resizing, most decoders will have to use > PyUnicodeObject and thus not benefit from the memory footprint > advantages of e.g. PyASCIIObject. > > > Data structure > -------------- > > The data structure description in the PEP appears to be wrong: > > PyASCIIObject has a wchar_t *wstr pointer - I guess this should > be a char *str pointer, otherwise, where's the memory footprint > advantage (esp. on Linux where sizeof(wchar_t) == 4) ? > > I also don't see a reason to limit the UCS1 storage version > to ASCII. Accordingly, the object should be called PyLatin1Object > or PyUCS1Object. I think the purpose is that if it's only ASCII, no work is need to encode to UTF-8. 
-- 
Regards, Benjamin

From martin at v.loewis.de Wed Sep 28 19:47:22 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 28 Sep 2011 19:47:22 +0200
Subject: [Python-Dev] PEP 393 close to pronouncement
In-Reply-To: <4E834EE7.4050706@egenix.com>
References: <201109270019.02442.victor.stinner@haypocalc.com> <201109271550.27837.victor.stinner@haypocalc.com> <4E834EE7.4050706@egenix.com>
Message-ID: <4E835DAA.8020308@v.loewis.de>

> Codecs use resizing a lot. Given that PyCompactUnicodeObject
> does not support resizing, most decoders will have to use
> PyUnicodeObject and thus not benefit from the memory footprint
> advantages of e.g. PyASCIIObject.

No, codecs have been rewritten to not use resizing.

> PyASCIIObject has a wchar_t *wstr pointer - I guess this should
> be a char *str pointer, otherwise, where's the memory footprint
> advantage (esp. on Linux where sizeof(wchar_t) == 4) ?

That's the Py_UNICODE representation for backwards compatibility. It's normally NULL.

> I also don't see a reason to limit the UCS1 storage version
> to ASCII. Accordingly, the object should be called PyLatin1Object
> or PyUCS1Object.

No, in the ASCII case, the UTF-8 length can be shared with the regular string length - not so for Latin-1 characters above 127.

> Typedef'ing Py_UNICODE to wchar_t and using wchar_t in existing
> code will cause problems on some systems where wchar_t is a
> signed type.
>
> Python assumes that Py_UNICODE is unsigned and thus doesn't
> check for negative values or takes these into account when
> doing range checks or code point arithmetic.
>
> On such platforms where wchar_t is signed, it is safer to
> typedef Py_UNICODE to unsigned wchar_t.

No. Py_UNICODE values *must* be in the range 0..17*2**16. Values larger than 17*2**16 are just as bad as negative values, so having Py_UNICODE unsigned doesn't improve anything.

> Py_UNICODE access to the objects assumes that len(obj) ==
> length of the Py_UNICODE buffer.
> The PEP suggests that length
> should not take surrogates into account on UCS2 platforms
> such as Windows. This causes len(obj) to not match len(wstr).

Correct.

> As a result, Py_UNICODE access to the Unicode objects breaks
> when surrogate code points are present in the Unicode object
> on UCS2 platforms.

Incorrect. What specifically do you think would break?

> The PEP also does not explain how lone surrogates will be
> handled with respect to the length information.

Just as any other code point. Python does not special-case surrogate code points anymore.

> Furthermore, determining len(obj) will require a loop over
> the data, checking for surrogate code points. A simple memcpy()
> is no longer enough.

No, it won't. The length of the Unicode object is stored in the length field.

> I suggest to drop the idea of having len(obj) not count
> wstr surrogate code points to maintain backwards compatibility
> and allow for working with lone surrogates.

Backwards-compatibility is fully preserved by PyUnicode_GET_SIZE returning the size of the Py_UNICODE buffer. PyUnicode_GET_LENGTH returns the true length of the Unicode object.

> Note that the whole surrogate debate does not have much to
> do with this PEP, since it's mainly about memory footprint
> savings. I'd also urge to do a reality check with respect
> to surrogates and non-BMP code points: in practice you only
> very rarely see any non-BMP code points in your data. Making
> all Python users pay for the needs of a tiny fraction is
> not really fair. Remember: practicality beats purity.

That's the whole point of the PEP. You only pay for what you actually need, and in most cases, it's ASCII.

> For best performance, each algorithm will have to be implemented
> for all three storage types.

This will be a trade-off. I think most developers will be happy with a single version covering all three cases, especially as it's much more maintainable.
Kind regards, Martin

From martin at v.loewis.de Wed Sep 28 19:49:16 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 28 Sep 2011 19:49:16 +0200
Subject: [Python-Dev] Heads up: Apple llvm gcc 4.2 miscompiles PEP 393
In-Reply-To:
References: <20110928132422.Horde.OvQBCtjz9kROgwPm5ZwktiA@webmail.df.eu>
Message-ID: <4E835E1C.8090700@v.loewis.de>

> Does Clang also fail to compile this? Clang was updated from 1.6 to 2.0 with Xcode 4, worth a try.

clang indeed works fine.

> Also, from your version listing it seems to be llvm-gcc (gcc frontend with llvm backend I think),
> is there no more straight gcc (with gcc frontend and backend)?

/usr/bin/cc and /usr/bin/gcc both link to llvm-gcc-4.2. However, there still is /usr/bin/gcc-4.2. Using that, Python also compiles correctly - so I have changed the gcc link on my system.

Thanks for the advice - I didn't expect that Apple ships three compilers...

Regards, Martin

From catch-all at masklinn.net Wed Sep 28 19:56:45 2011
From: catch-all at masklinn.net (Xavier Morel)
Date: Wed, 28 Sep 2011 19:56:45 +0200
Subject: [Python-Dev] Heads up: Apple llvm gcc 4.2 miscompiles PEP 393
In-Reply-To: <4E835E1C.8090700@v.loewis.de>
References: <20110928132422.Horde.OvQBCtjz9kROgwPm5ZwktiA@webmail.df.eu> <4E835E1C.8090700@v.loewis.de>
Message-ID: <74F6ADFA-874D-4BAC-B304-CE8B12D80126@masklinn.net>

On 2011-09-28, at 19:49 , Martin v. Löwis wrote:
>
> Thanks for the advice - I didn't expect that Apple ships three compilers...

Yeah I can understand that, they're in the middle of the transition but Clang is not quite there yet so...

From yasar11732 at gmail.com Wed Sep 28 21:00:50 2011
From: yasar11732 at gmail.com (=?ISO-8859-9?Q?Ya=FEar_Arabac=FD?=)
Date: Wed, 28 Sep 2011 22:00:50 +0300
Subject: [Python-Dev] What it takes to change a single keyword.
Message-ID:

Hi,

First of all, I am sincerely sorry if this is the wrong mailing list to ask this question.
I checked out the descriptions of a couple of other mailing lists, and this one seemed the most suitable. Here is my question: Let's say I want to change a single keyword, let's say the import keyword, to be spelled as something else, like its translation to my language. I guess it would be more complicated than modifying Grammar/Grammar, but I can't be sure which files should get edited. I'm asking this because I am trying to figure out if I could translate keywords into another language without affecting the behaviour of the language.

-- 
http://yasar.serveblog.net/

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From fperez.net at gmail.com Wed Sep 28 22:55:27 2011
From: fperez.net at gmail.com (Fernando Perez)
Date: Wed, 28 Sep 2011 20:55:27 +0000 (UTC)
Subject: [Python-Dev] range objects in 3.x
References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info>
Message-ID:

On Tue, 27 Sep 2011 11:25:48 +1000, Steven D'Aprano wrote:

> The audience for numpy is a small minority of Python users, and they

Certainly, though I'd like to mention that scientific computing is a major success story for Python, so hopefully it's a minority with something to contribute.

> tend to be more sophisticated. I'm sure they can cope with two functions
> with different APIs

No problem with having different APIs, but in that case I'd hope the builtin wouldn't be named linspace, to avoid confusion. In numpy/scipy we try hard to avoid collisions with existing builtin names; hopefully in this case we can prevent the reverse by having a dialogue.

> While continuity of API might be a good thing, we shouldn't accept a
> poor API just for the sake of continuity. I have some criticisms of the
> linspace API.
>
> numpy.linspace(start, stop, num=50, endpoint=True, retstep=False)
>
> http://docs.scipy.org/doc/numpy/reference/generated/numpy.linspace.html
>
> * It returns a sequence, which is appropriate for numpy but in standard
> Python it should return an iterator or something like a range object.

Sure, no problem there.

> * Why does num have a default of 50? That seems to be an arbitrary
> choice.

Yup. linspace was modeled after matlab's identically named command: http://www.mathworks.com/help/techdoc/ref/linspace.html but I have no idea why the author went with 50 instead of 100 as the default (not that 100 is any better, just that it was matlab's choice). Given how linspace is often used for plotting, 100 is arguably a more sensible choice to get reasonable graphs on normal-resolution displays at typical sizes, absent adaptive plotting algorithms.

> * It arbitrarily singles out the end point for special treatment. When
> integrating, it is just as common for the first point to be singular as
> the end point, and therefore needing to be excluded.

Numerical integration is *not* the focus of linspace(): in numerical integration, if an end point is singular you have an improper integral and *must* approach the singularity much more carefully than by simply dropping the last point and hoping for the best. Whether you can get away by using (desired_end_point - very_small_number) --the dumb, naive approach-- or not depends a lot on the nature of the singularity. Since numerical integration is a complex and specialized domain and the subject of an entire subcomponent of the (much bigger than numpy) scipy library, there's no point in arguing the linspace API based on numerical integration considerations.
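[Editor's note: for readers following along without numpy, here is a minimal pure-Python sketch of the count-based API under discussion. It is only an illustration of the semantics, not numpy's implementation (which takes more care with floating-point endpoints):]

```python
def linspace(start, stop, num=50, endpoint=True):
    """Return `num` evenly spaced values from `start` toward `stop`."""
    if num <= 0:
        return []
    if num == 1:
        return [float(start)]
    # endpoint=True: closed interval, num points and num-1 steps.
    # endpoint=False: half-open interval, mirroring range()'s mental model.
    div = num - 1 if endpoint else num
    step = (stop - start) / div
    return [start + i * step for i in range(num)]
```

With this sketch, `linspace(1, 2, 4, endpoint=False)` gives `[1.0, 1.25, 1.5, 1.75]`, and the invariant Fernando describes holds: the number of points is fixed, while the step size falls out of the endpoint choice.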
Now, I *suspect* (but don't remember for sure) that the option to have it right-hand-open-ended was to match the mental model people have for range:

In [5]: linspace(0, 10, 10, endpoint=False)
Out[5]: array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])

In [6]: range(0, 10)
Out[6]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

I'm not arguing this was necessarily a good idea, just my theory on how it came to be. Perhaps R. Kern or one of the numpy lurkers in here will pitch in with a better recollection.

> * If you exclude the end point, the stepsize, and hence the values
> returned, change:
>
> >>> linspace(1, 2, 4)
> array([ 1. , 1.33333333, 1.66666667, 2. ])
> >>> linspace(1, 2, 4, endpoint=False)
> array([ 1. , 1.25, 1.5 , 1.75])
>
> This surprises me. I expect that excluding the end point will just
> exclude the end point, i.e. return one fewer point. That is, I expect
> num to count the number of subdivisions, not the number of points.

I find it very natural. It's important to remember that *the whole point* of linspace's existence is to provide arrays with a known, fixed number of points:

In [17]: npts = 10

In [18]: len(linspace(0, 5, npts))
Out[18]: 10

In [19]: len(linspace(0, 5, npts, endpoint=False))
Out[19]: 10

So the invariant to preserve is *precisely* the number of points, not the step size. As Guido has pointed out several times, the value of this function is precisely to steer people *away* from thinking of step sizes in a context where they are more likely than not going to get it wrong. So linspace focuses on a guaranteed number of points, and lets the step-size chips fall where they may.

> * The retstep argument changes the return signature from => array to =>
> (array, number). I think that's a pretty ugly thing to do. If linspace
> returned a special iterator object, the step size could be exposed as an
> attribute.
Yup, it's not pretty but understandable in numpy's context, a library that has a very strong design focus around arrays, and numpy arrays don't have writable attributes:

In [20]: a = linspace(0, 10)

In [21]: a.stepsize = 0.1
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/home/fperez/ in ()
----> 1 a.stepsize = 0.1

AttributeError: 'numpy.ndarray' object has no attribute 'stepsize'

So while not the most elegant solution (and I agree that with a different return object a different approach can be taken), I think it's a practical compromise that works well for numpy.

> * I'm not sure that start/end/count is a better API than
> start/step/count.

Guido has argued this point quite well, I think, but let me add that many years of experience and millions of lines of numerical code beg to differ. start/end/count is *precisely* the right api for this problem, and exposing step directly is very much the wrong thing to do here.

I should add that numpy does provide an 'arange' function that does match the built-in range() api, but returns an array instead of a list/iterator. This function does happen to allow for floating-point steps, but does come with the following warning about them in its docstring:

Docstring:
arange([start,] stop[, step,], dtype=None, maskna=False)

Return evenly spaced values within a given interval.

Values are generated within the half-open interval ``[start, stop)`` (in other words, the interval including `start` but excluding `stop`). For integer arguments the function is equivalent to the Python built-in `range `_ function, but returns a ndarray rather than a list.

When using a non-integer step, such as 0.1, the results will often not be consistent. It is better to use ``linspace`` for these cases.

# END docstring

> * This one is pure bike-shedding: I don't like the name linspace.
Sure, in numpy's case it was chosen purely to make existing matlab users more comfortable, I think. I don't particularly like it either (I don't come from a matlab background myself), FWIW.

I do hope, though, that the chosen name is *not*:

- 'interval'. An interval in mathematics has a strong notion of only endpoints, containing all elements between its endpoints in the underlying ordered set.

- 'interpolate' or similar: numerical interpolation is a whole 'nother topic and I think this name would be more likely to confuse people expecting function interpolation than anything.

But thanks for looking into this, and I do hope that feedback from the numpy/scipy users and accumulated experience is useful.

Cheers, f

From nad at acm.org Thu Sep 29 00:29:00 2011
From: nad at acm.org (Ned Deily)
Date: Wed, 28 Sep 2011 15:29:00 -0700
Subject: [Python-Dev] Heads up: Apple llvm gcc 4.2 miscompiles PEP 393
References: <20110928132422.Horde.OvQBCtjz9kROgwPm5ZwktiA@webmail.df.eu> <4E835E1C.8090700@v.loewis.de> <74F6ADFA-874D-4BAC-B304-CE8B12D80126@masklinn.net>
Message-ID:

In article <74F6ADFA-874D-4BAC-B304-CE8B12D80126 at masklinn.net>, Xavier Morel wrote:
> On 2011-09-28, at 19:49 , Martin v. Löwis wrote:
> > Thanks for the advice - I didn't expect that Apple ships three compilers...
> Yeah I can understand that, they're in the middle of the transition but Clang
> is not quite there yet so...

BTW, at the moment, we are still using gcc-4.2 (not gcc-llvm nor clang) from Xcode 3 on OS X 10.6 for the 64-bit/32-bit installer builds and gcc-4.0 on 10.5 for the 32-bit-only installer builds. We will probably revisit that as we get closer to 3.3 alphas and betas.
-- 
Ned Deily, nad at acm.org

From greg.ewing at canterbury.ac.nz Thu Sep 29 00:36:21 2011
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 29 Sep 2011 11:36:21 +1300
Subject: [Python-Dev] range objects in 3.x
In-Reply-To:
References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info>
Message-ID: <4E83A165.6080406@canterbury.ac.nz>

Fernando Perez wrote:
> Now, I *suspect* (but don't remember for sure) that the option to have it
> right-hand-open-ended was to match the mental model people have for range:
>
> In [5]: linspace(0, 10, 10, endpoint=False)
> Out[5]: array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
>
> In [6]: range(0, 10)
> Out[6]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

My guess would be it's so that you can concatenate two sequences created with linspace covering adjacent ranges and get the same result as a single linspace call covering the whole range.

> I do hope, though, that the chosen name is *not*:
>
> - 'interval'
>
> - 'interpolate' or similar

Would 'subdivide' be acceptable?

-- 
Greg

From eric at trueblade.com Thu Sep 29 01:21:48 2011
From: eric at trueblade.com (Eric V. Smith)
Date: Wed, 28 Sep 2011 19:21:48 -0400
Subject: [Python-Dev] [Python-checkins] cpython: Implement PEP 393.
In-Reply-To:
References:
Message-ID: <4E83AC0C.2010006@trueblade.com>

Is there some reason str.format had such major surgery done to it? It appears parts of it were removed from stringlib. I had not even thought to look at the code before it was merged, as it never occurred to me anyone would do that. I left it in stringlib even in 3.x because there's the occasional talk of adding bytes.bformat, and since all of the code works well with stringlib (since it was used by str and unicode in 2.x), it made sense to leave it there. In addition, there are outstanding patches that are now broken.
I'd prefer it return to how it used to be, and just the minimum changes required for PEP 393 be made to it.

Thanks.
Eric.

On 9/28/2011 2:35 AM, martin.v.loewis wrote:
> http://hg.python.org/cpython/rev/8beaa9a37387
> changeset: 72475:8beaa9a37387
> user: Martin v. Löwis
> date: Wed Sep 28 07:41:54 2011 +0200
> summary:
> Implement PEP 393.
>
> files:
> Doc/c-api/unicode.rst | 9 +
> Include/Python.h | 5 +
> Include/complexobject.h | 5 +-
> Include/floatobject.h | 5 +-
> Include/longobject.h | 6 +-
> Include/pyerrors.h | 6 +
> Include/pyport.h | 3 +
> Include/unicodeobject.h | 783 +-
> Lib/json/decoder.py | 3 +-
> Lib/test/json_tests/test_scanstring.py | 11 +-
> Lib/test/test_codeccallbacks.py | 7 +-
> Lib/test/test_codecs.py | 4 +
> Lib/test/test_peepholer.py | 4 -
> Lib/test/test_re.py | 7 +
> Lib/test/test_sys.py | 38 +-
> Lib/test/test_unicode.py | 41 +-
> Makefile.pre.in | 6 +-
> Misc/NEWS | 2 +
> Modules/_codecsmodule.c | 8 +-
> Modules/_csv.c | 2 +-
> Modules/_ctypes/_ctypes.c | 6 +-
> Modules/_ctypes/callproc.c | 8 -
> Modules/_ctypes/cfield.c | 64 +-
> Modules/_cursesmodule.c | 7 +-
> Modules/_datetimemodule.c | 13 +-
> Modules/_dbmmodule.c | 12 +-
> Modules/_elementtree.c | 31 +-
> Modules/_io/_iomodule.h | 2 +-
> Modules/_io/stringio.c | 69 +-
> Modules/_io/textio.c | 352 +-
> Modules/_json.c | 252 +-
> Modules/_pickle.c | 4 +-
> Modules/_sqlite/connection.c | 19 +-
> Modules/_sre.c | 382 +-
> Modules/_testcapimodule.c | 2 +-
> Modules/_tkinter.c | 70 +-
> Modules/arraymodule.c | 8 +-
> Modules/md5module.c | 10 +-
> Modules/operator.c | 27 +-
> Modules/pyexpat.c | 11 +-
> Modules/sha1module.c | 10 +-
> Modules/sha256module.c | 10 +-
> Modules/sha512module.c | 10 +-
> Modules/sre.h | 4 +-
> Modules/syslogmodule.c | 14 +-
> Modules/unicodedata.c | 28 +-
> Modules/zipimport.c | 141 +-
> Objects/abstract.c | 4 +-
> Objects/bytearrayobject.c | 147 +-
> Objects/bytesobject.c | 127 +-
> Objects/codeobject.c | 15 +-
> Objects/complexobject.c | 19 +-
> Objects/dictobject.c | 20 +-
> Objects/exceptions.c | 26 +-
> Objects/fileobject.c | 17 +-
> Objects/floatobject.c | 19 +-
> Objects/longobject.c | 84 +-
> Objects/moduleobject.c | 9 +-
> Objects/object.c | 10 +-
> Objects/setobject.c | 40 +-
> Objects/stringlib/count.h | 9 +-
> Objects/stringlib/eq.h | 23 +-
> Objects/stringlib/fastsearch.h | 4 +-
> Objects/stringlib/find.h | 31 +-
> Objects/stringlib/formatter.h | 1516 --
> Objects/stringlib/localeutil.h | 27 +-
> Objects/stringlib/partition.h | 12 +-
> Objects/stringlib/split.h | 26 +-
> Objects/stringlib/string_format.h | 1385 --
> Objects/stringlib/stringdefs.h | 2 +
> Objects/stringlib/ucs1lib.h | 35 +
> Objects/stringlib/ucs2lib.h | 34 +
> Objects/stringlib/ucs4lib.h | 34 +
> Objects/stringlib/undef.h | 10 +
> Objects/stringlib/unicode_format.h | 1416 ++
> Objects/stringlib/unicodedefs.h | 2 +
> Objects/typeobject.c | 18 +-
> Objects/unicodeobject.c | 6112 ++++++++---
> Objects/uniops.h | 91 +
> PC/_subprocess.c | 61 +-
> PC/import_nt.c | 2 +-
> PC/msvcrtmodule.c | 8 +-
> PC/pyconfig.h | 4 -
> PC/winreg.c | 8 +-
> Parser/tokenizer.c | 6 +-
> Python/_warnings.c | 16 +-
> Python/ast.c | 61 +-
> Python/bltinmodule.c | 26 +-
> Python/ceval.c | 17 +-
> Python/codecs.c | 44 +-
> Python/compile.c | 89 +-
> Python/errors.c | 4 +-
> Python/formatter_unicode.c | 1445 ++-
> Python/getargs.c | 46 +-
> Python/import.c | 347 +-
> Python/marshal.c | 4 +-
> Python/peephole.c | 18 -
> Python/symtable.c | 8 +-
> Python/traceback.c | 59 +-
> Tools/gdb/libpython.py | 27 +-
> configure | 65 +-
> configure.in | 46 +-
> pyconfig.h.in | 6 -

From benjamin at python.org Thu Sep 29 02:07:02 2011
From: benjamin at python.org (Benjamin Peterson)
Date: Wed, 28 Sep 2011 20:07:02 -0400
Subject: [Python-Dev] [Python-checkins] cpython: Enhance Py_ARRAY_LENGTH(): fail at build time if the argument is not an array
In-Reply-To:
References:
Message-ID:

2011/9/28 victor.stinner :
> http://hg.python.org/cpython/rev/36fc514de7f0
> changeset: 72512:36fc514de7f0
> user: Victor Stinner
> date: Thu Sep 29 01:12:24 2011 +0200
> summary:
> Enhance Py_ARRAY_LENGTH(): fail at build time if the argument is not an array
>
> Move other various macros to pymacro.h
>
> Thanks Rusty Russell for having written these amazing C macros!
>
> files:
> Include/Python.h | 19 +--------
> Include/pymacro.h | 57 +++++++++++++++++++++++++++

Do we really need a new file? Why not pyport.h where other compiler stuff goes?

-- 
Regards, Benjamin

From victor.stinner at haypocalc.com Thu Sep 29 02:27:48 2011
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Thu, 29 Sep 2011 02:27:48 +0200
Subject: [Python-Dev] PEP 393 close to pronouncement
In-Reply-To: <4E834EE7.4050706@egenix.com>
References: <4E834EE7.4050706@egenix.com>
Message-ID: <201109290227.48340.victor.stinner@haypocalc.com>

> Resizing
> --------
>
> Codecs use resizing a lot. Given that PyCompactUnicodeObject
> does not support resizing, most decoders will have to use
> PyUnicodeObject and thus not benefit from the memory footprint
> advantages of e.g. PyASCIIObject.

Wrong. Even if you create a string using the legacy API (e.g. PyUnicode_FromUnicode), the string will be quickly compacted to use the most efficient memory storage (depending on the maximum character). "quickly": at the first call to PyUnicode_READY. Python tries to make all strings ready as early as possible.

> PyASCIIObject has a wchar_t *wstr pointer - I guess this should
> be a char *str pointer, otherwise, where's the memory footprint
> advantage (esp. on Linux where sizeof(wchar_t) == 4) ?

For pure ASCII strings, you don't have to store a pointer to the UTF-8 string, nor the length of the UTF-8 string (in bytes), nor the length of the wchar_t string (in wide characters): the length is always the length of the "ASCII" string, and the UTF-8 string is shared with the ASCII string.
The structure is much smaller thanks to these optimizations, and so Python 3.3 uses less memory than 2.7 for ASCII strings, even for short strings.

> I also don't see a reason to limit the UCS1 storage version
> to ASCII. Accordingly, the object should be called PyLatin1Object
> or PyUCS1Object.

Latin1 is less interesting: you cannot share the length/data fields with utf8 or wstr. We didn't add a special case for Latin1 strings (except using Py_UCS1* strings to store their characters).

> Furthermore, determining len(obj) will require a loop over
> the data, checking for surrogate code points. A simple memcpy()
> is no longer enough.

Wrong. len(obj) gives the "right" result (see the long discussion about what is the length of a string in a previous thread...) in O(1) since it's computed when the string is created.

> ... in practice you only
> very rarely see any non-BMP code points in your data. Making
> all Python users pay for the needs of a tiny fraction is
> not really fair. Remember: practicality beats purity.

The creation of the string may be a little bit slower (especially when you have to scan the string twice to first get the maximum character), but I think that this slowdown is smaller than the speedup allowed by the PEP. Because ASCII strings are now char*, I think that processing ASCII strings is faster because the CPU can cache more data (close to the CPU). We can do better optimization on ASCII and Latin1 strings (it's faster to manipulate char* than uint16_t* or uint32_t*). For example, str.center(), str.ljust(), str.rjust() and str.zfill() now use the very fast memset() function to pad Latin1 strings. Another example: duplicating a string (or creating a substring) should be faster just because you have less data to copy (e.g. 10 bytes for a string of 10 Latin1 characters vs 20 or 40 bytes with Python 3.2).

The two most common encodings in the world are ASCII and UTF-8.
With the PEP 393, encoding to ASCII or UTF-8 is free: you don't have to encode anything, you directly have the encoded char* buffer (whereas you have to convert 16/32 bit wchar_t to char* in Python 3.2, even for pure ASCII). (It's also free to encode a "Latin1" Unicode string to Latin1.)

With the PEP 393, we never have to decode UTF-16 anymore when iterating on code points to correctly support non-BMP characters (which was required before in narrow builds, e.g. on Windows). Iterating on code points is just a simple loop, with no need to check whether each character is in the range U+D800-U+DFFF.

There are other funny tricks (optimizations). For example, text.replace(a, b) knows that there is nothing to do if maxchar(a) > maxchar(text), where maxchar(obj) just requires reading an attribute of the string. Think about ASCII and non-ASCII strings: pure_ascii.replace('\xe9', '') now just creates a new reference... I don't think that Martin wrote his PEP to be able to implement all these optimisations, but they are an interesting side effect of his PEP :-)

> The table only lists string sizes up 8 code points. The memory
> savings for these are really only significant for ASCII
> strings on 64-bit platforms, if you use the default UCS2
> Python build as basis.

In the 32 different cases, the PEP 393 is better in 29 cases and "just" as good as Python 3.2 in 3 corner cases:
- 1 ASCII, 16-bit wchar, 32-bit
- 1 Latin1, 32-bit wchar, 32-bit
- 2 Latin1, 32-bit wchar, 32-bit

Do you really care about these corner cases? See the more realistic benchmark in Martin's previous email ("PEP 393 memory savings update"): the PEP 393 not only uses 3x less memory than 3.2, it also uses *less* memory than Python 2.7, whereas Python 3 uses Unicode for everything!

> For larger strings, I expect the savings to be more significant.

Sure.

> OTOH, a single non-BMP code point in such a string would cause
> the savings to drop significantly again.
In this case, it's just as good as Python 3.2 in wide mode, but worse than 3.2 in narrow mode. But is it a real use case?

If you want really efficient storage for heterogeneous strings (mixing ASCII, Latin1, BMP and non-BMP), you can split the text into chunks. For example, I hope that a text processor like LibreOffice doesn't store all paragraphs in the same string, but creates at least one string per paragraph. If you use short chunks, you will not notice the difference in memory footprint when you insert a non-BMP character. The trick doesn't work on Python < 3.3.

> For best performance, each algorithm will have to be implemented
> for all three storage types. ...

Good performance can be achieved using PyUnicode macros like PyUnicode_READ and PyUnicode_WRITE. But yes, if you want a super-fast Unicode processor, you can special-case some kinds (UCS1, UCS2, UCS4), like the examples I described before (use memset for Latin1).

> ... Not doing so, will result in a slow-down, if I read the PEP
> correctly.

I don't think so. Browse the new unicodeobject.c: there are few switch/cases on the kind (if you ignore the low-level functions like _PyUnicode_Ready). For example, unicode_isalpha() has only one implementation, using PyUnicode_READ. PyUnicode_READ doesn't use a switch but classic (fast) pointer arithmetic.

> It's difficult to say, of what scale, since that
> information is not given in the PEP, but the added loop over
> the complete data array in order to determine the maximum
> code point value suggests that it is significant.

Feel free to run Antoine's benchmarks like stringbench and iobench yourself; they do micro-benchmarks. But you have to know that very few codecs use the new Unicode API (I think that only the UTF-8 encoder and decoder use the new API, maybe also the ASCII codec).
I didn't run any benchmark, but I don't think that the PEP 393 makes Python slower. I expect a minor speedup in some corner cases :-) I prefer to wait until all modules are converted to the new API before running benchmarks. TODO: unicodedata, _csv, all codecs (especially error handlers), ...

> In practice, using a UCS2 build of Python usually is a good
> compromise between memory savings, performance and standards
> compatibility

About "standards compatibility": the work to support non-BMP characters everywhere was not finished in Python 3.2, 11 years after the introduction of Unicode in Python (2.0). Using the new API, non-BMP characters will be supported for free, everywhere (especially in *Python*: "\U0010FFFF"[0] and len("\U0010FFFF") don't give surprising results anymore). With the addition of emoticons in a non-BMP range in Unicode 6, non-BMP characters will become more and more common. Who doesn't like emoticons? :-) o;-) >< (no, I will not add non-BMP characters in this email, I don't want to crash your SMTP server and mail client)

> IMHO, Python should be optimized for UCS2 usage

With the PEP 393, it's better: Python is optimized for any usage! (but I expect it to be faster in the Latin1 range, U+0000-U+00FF)

> I do see the advantage for large strings, though.

A friend reads Martin's last benchmark differently: Python 3.2 uses 3x more memory than Python 2! Can I say that the PEP 393 fixed a huge regression of Python 3?

> Given that I've been working on and maintaining the Python Unicode
> implementation actively or by providing assistance for almost
> 12 years now, I've also thought about whether it's still worth
> the effort.

Thanks for your huge work on Unicode, Marc-Andre!
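For the curious, the non-BMP behaviour mentioned above is easy to check on a 3.3 interpreter (on a 3.2 narrow build the same string had length 2, and indexing returned a lone surrogate):

```python
s = "\U0010FFFF"   # the highest Unicode code point, well outside the BMP

assert len(s) == 1           # one code point, not a surrogate pair
assert s[0] == s             # indexing can no longer split it in two
assert ord(s) == 0x10FFFF
# Codecs still produce proper surrogate pairs where the encoding needs them:
assert len(s.encode("utf-16-le")) == 4   # two UTF-16 code units
assert s.encode("utf-16-le").decode("utf-16-le") == s
```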
> My interests have shifted somewhat into other directions and
> I feel that helping Python reach world domination in other ways
> makes me happier than fighting over Unicode standards, implementations,
> special cases that aren't special enough, and all those other
> nitty-gritty details that cause long discussions :-)

Someone said that we still need to define what a character is! By the way, what is a code point?

> So I feel that the PEP 393 change is a good time to draw a line
> and leave Unicode maintenance to Ezio, Victor, Martin, and
> all the others that have helped over the years. I know it's
> in good hands.

I don't understand why you would like to stop contributing to Unicode, but well, as you wish. We will try to continue your work.

Victor

From victor.stinner at haypocalc.com Thu Sep 29 03:45:59 2011
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Thu, 29 Sep 2011 03:45:59 +0200
Subject: [Python-Dev] [Python-checkins] cpython: Enhance Py_ARRAY_LENGTH(): fail at build time if the argument is not an array
In-Reply-To: 
References: 
Message-ID: <201109290345.59665.victor.stinner@haypocalc.com>

Le jeudi 29 septembre 2011 02:07:02, Benjamin Peterson a écrit :
> 2011/9/28 victor.stinner :
> > http://hg.python.org/cpython/rev/36fc514de7f0
> > changeset: 72512:36fc514de7f0
> > user: Victor Stinner
> > date: Thu Sep 29 01:12:24 2011 +0200
> > summary:
> > Enhance Py_ARRAY_LENGTH(): fail at build time if the argument is not an
> > array
> >
> > Move other various macros to pymcacro.h
> >
> > Thanks Rusty Russell for having written these amazing C macros!
> >
> > files:
> > Include/Python.h | 19 +--------
> > Include/pymacro.h | 57 +++++++++++++++++++++++++++
>
> Do we really need a new file? Why not pyport.h where other compiler stuff
> goes?

I'm not sure that pyport.h is the right place to add Py_MIN, Py_MAX, Py_ARRAY_LENGTH. pyport.h looks to be related to all things specific to the platform, like INT_MAX, Py_VA_COPY, ...
pymacro.h contains platform independent macros. I would like to suggest the opposite: move platform independent macros from pyport.h to pymacro.h :-) Suggestions:
- Py_ARITHMETIC_RIGHT_SHIFT
- Py_FORCE_EXPANSION
- Py_SAFE_DOWNCAST

Victor

From fperez.net at gmail.com Thu Sep 29 06:42:10 2011
From: fperez.net at gmail.com (Fernando Perez)
Date: Thu, 29 Sep 2011 04:42:10 +0000 (UTC)
Subject: [Python-Dev] range objects in 3.x
References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> <4E83A165.6080406@canterbury.ac.nz>
Message-ID: 

On Thu, 29 Sep 2011 11:36:21 +1300, Greg Ewing wrote:

>> I do hope, though, that the chosen name is *not*:
>>
>> - 'interval'
>>
>> - 'interpolate' or similar
>
> Would 'subdivide' be acceptable?

I'm not great at finding names, and I don't totally love it, but I certainly don't see any problems with it. It is, after all, a subdivision of an interval :)

I think 'grid' has been mentioned, and I think it's reasonable, even though most people probably associate the word with a two-dimensional object. But grids can have any desired dimensionality. Now, in fact, numpy has a slightly demented (but extremely useful) ogrid object:

In [7]: ogrid[0:10:3]
Out[7]: array([0, 3, 6, 9])

In [8]: ogrid[0:10:3j]
Out[8]: array([ 0., 5., 10.])

Yup, that's a complex slice :) So if python named the builtin 'grid', I think it would go well with existing numpy habits.

Cheers, f

From ezio.melotti at gmail.com Thu Sep 29 09:54:37 2011
From: ezio.melotti at gmail.com (Ezio Melotti)
Date: Thu, 29 Sep 2011 10:54:37 +0300
Subject: [Python-Dev] Hg tips (was Re: [Python-checkins] cpython (merge default -> default): Merge heads.)
Message-ID: 

Tip 1 -- merging heads:

A while ago Éric suggested a nice tip to make merges easier and since I haven't seen many people using it and now I got a chance to use it again, I think it might be worth showing it once more:

# so assume you just committed some changes:
$ hg ci Doc/whatsnew/3.3.rst -m 'Update and reorganize the whatsnew entry for PEP 393.'
# you push them, but someone else pushed something in the meanwhile, so the push fails
$ hg push
pushing to ssh://hg at hg.python.org/cpython
searching for changes
abort: push creates new remote heads on branch 'default'!
(you should pull and merge or use push -f to force)
# so you pull the other changes
$ hg pull -u
pulling from ssh://hg at hg.python.org/cpython
searching for changes
adding changesets
adding manifests
adding file changes
added 4 changesets with 5 changes to 5 files (+1 heads)
not updating, since new heads added
(run 'hg heads' to see heads, 'hg merge' to merge)
# and use "hg heads ." to see the two heads (yours and the one you pulled) in the current branch
$ hg heads .
changeset: 72521:e6a2b54c1d16
tag: tip
user: Victor Stinner
date: Thu Sep 29 04:02:13 2011 +0200
summary: Fix hex_digit_to_int() prototype: expect Py_UCS4, not Py_UNICODE

changeset: 72517:ba6ee5cc9ed6
user: Ezio Melotti
date: Thu Sep 29 08:34:36 2011 +0300
summary: Update and reorganize the whatsnew entry for PEP 393.
# here comes the tip: before merging you switch to the other head (i.e. the one pushed by Victor),
# if you don't switch, you'll be merging Victor's changeset and in case of conflicts you will have to review
# and modify his code (e.g. put a Misc/NEWS entry in the right section or something more complicated)
$ hg up e6a2b54c1d16
6 files updated, 0 files merged, 0 files removed, 0 files unresolved
# after the switch you will merge the changeset you just committed, so in case of conflicts
# reviewing and merging is much easier because you know the changes already
$ hg merge
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
(branch merge, don't forget to commit)
# here everything went fine and there were no conflicts, and in the diff I can see my last changeset
$ hg di
diff --git a/Doc/whatsnew/3.3.rst b/Doc/whatsnew/3.3.rst
[...]
# everything looks fine, so I can commit the merge and push
$ hg ci -m 'Merge heads.'
$ hg push
pushing to ssh://hg at hg.python.org/cpython
searching for changes
remote: adding changesets
remote: adding manifests
remote: adding file changes
remote: added 2 changesets with 1 changes to 1 files
remote: buildbot: 2 changes sent successfully
remote: notified python-checkins at python.org of incoming changeset ba6ee5cc9ed6
remote: notified python-checkins at python.org of incoming changeset e7672fe3cd35

This tip is not only useful while merging, but it's also useful for python-checkins reviews, because the "merge" mail has the same diff as the previous mail, rather than having 15 unrelated changesets from the last week because the committer didn't pull in a while.

Tip 2 -- extended diffs:

If you haven't already, enable git diffs by adding to your ~/.hgrc the following two lines:

> [diff]
> git = True

(this is already in the devguide, even if 'git = on' is used there. The mercurial website uses git = True too.)
More info: http://hgtip.com/tips/beginner/2009-10-22-always-use-git-diffs/

Tip 3 -- extensions:

I personally like the 'color' extension, it makes the output of commands like 'hg diff' and 'hg stat' more readable (e.g. it shows removed lines in red and added ones in green). If you want to give it a try, add to your ~/.hgrc the following two lines:

> [extensions]
> color =

If you find operations like pulling, updating or cloning too slow, you might also want to look at the 'progress' extension, which displays a progress bar during these operations:

> [extensions]
> progress =

Tip 4 -- porting from 2.7 to 3.2:

The devguide suggests:

> hg export a7df1a869e4a | hg import --no-commit -

but it's not always necessary to copy the changeset number manually. If you are porting your last commit you can just use 'hg export 2.7' (or any other branch name):

* using the one-dir-per-branch setup:
wolf at hp:~/dev/py/2.7$ hg ci -m 'Fix some bug.'
wolf at hp:~/dev/py/2.7$ cd ../3.2
wolf at hp:~/dev/py/3.2$ hg pull -u ../2.7
wolf at hp:~/dev/py/3.2$ hg export 2.7 | hg import --no-commit -

* using the single-dir setup:
wolf at hp:~/dev/python$ hg branch 2.7
wolf at hp:~/dev/python$ hg ci -m 'Fix some bug.'
wolf at hp:~/dev/python$ hg up 3.2
# here you might enjoy the progress extension
wolf at hp:~/dev/python$ hg export 2.7 | hg import --no-commit -

And then you can check that everything is fine, and commit on 3.2 too. Of course it works the other way around (from 3.2 to 2.7) too.

I hope you'll find these tips useful.

Best Regards,
Ezio Melotti

On Thu, Sep 29, 2011 at 8:36 AM, ezio.melotti wrote:
> http://hg.python.org/cpython/rev/e7672fe3cd35
> changeset: 72522:e7672fe3cd35
> parent: 72520:e6a2b54c1d16
> parent: 72521:ba6ee5cc9ed6
> user: Ezio Melotti
> date: Thu Sep 29 08:36:23 2011 +0300
> summary:
> Merge heads.
>
> files:
> Doc/whatsnew/3.3.rst | 63 +++++++++++++++++++++----------
> 1 files changed, 42 insertions(+), 21 deletions(-)
>
> -------------- next part --------------
An HTML attachment was scrubbed...
URL: From victor.stinner at haypocalc.com Thu Sep 29 12:07:14 2011 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Thu, 29 Sep 2011 12:07:14 +0200 Subject: [Python-Dev] Hg tips In-Reply-To: References: Message-ID: <4E844352.8040606@haypocalc.com> Le 29/09/2011 09:54, Ezio Melotti a ?crit : > Tip 1 -- merging heads: > > A while ago ?ric suggested a nice tip to make merges easier and since I > haven't seen many people using it and now I got a chance to use it again, I > think it might be worth showing it once more: > > # so assume you just committed some changes: > $ hg ci Doc/whatsnew/3.3.rst -m 'Update and reorganize the whatsnew entry > for PEP 393.' > # you push them, but someone else pushed something in the meanwhile, so the > push fails > $ hg push > pushing to ssh://hg at hg.python.org/cpython > searching for changes > abort: push creates new remote heads on branch 'default'! > (you should pull and merge or use push -f to force) > # so you pull the other changes > $ hg pull -u > pulling from ssh://hg at hg.python.org/cpython > searching for changes > adding changesets > adding manifests > adding file changes > added 4 changesets with 5 changes to 5 files (+1 heads) > not updating, since new heads added > (run 'hg heads' to see heads, 'hg merge' to merge) > # and use "hg heads ." to see the two heads (yours and the one you pulled) > in the current branch > $ hg heads . > changeset: 72521:e6a2b54c1d16 > tag: tip > user: Victor Stinner > date: Thu Sep 29 04:02:13 2011 +0200 > summary: Fix hex_digit_to_int() prototype: expect Py_UCS4, not > Py_UNICODE > > changeset: 72517:ba6ee5cc9ed6 > user: Ezio Melotti > date: Thu Sep 29 08:34:36 2011 +0300 > summary: Update and reorganize the whatsnew entry for PEP 393. > # here comes the tip: before merging you switch to the other head (i.e. the > one pushed by Victor), > # if you don't switch, you'll be merging Victor changeset and in case of > conflicts you will have to review > # and modify his code (e.g. 
put a Misc/NEWS entry in the right section or > something more complicated) > $ hg up e6a2b54c1d16 > 6 files updated, 0 files merged, 0 files removed, 0 files unresolved > # after the switch you will merge the changeset you just committed, so in > case of conflicts > # reviewing and merging is much easier because you know the changes already > $ hg merge > 1 files updated, 0 files merged, 0 files removed, 0 files unresolved > (branch merge, don't forget to commit) > # here everything went fine and there were no conflicts, and in the diff I > can see my last changeset > $ hg di > diff --git a/Doc/whatsnew/3.3.rst b/Doc/whatsnew/3.3.rst > [...] > # everything looks fine, so I can commit the merge and push > $ hg ci -m 'Merge heads.' > $ hg push > pushing to ssh://hg at hg.python.org/cpython > searching for changes > remote: adding > changesets > > remote: adding manifests > remote: adding file changes > remote: added 2 changesets with 1 changes to 1 files > remote: buildbot: 2 changes sent successfully > remote: notified python-checkins at python.org of incoming changeset > ba6ee5cc9ed6 > remote: notified python-checkins at python.org of incoming changeset > e7672fe3cd35 > > This tip is not only useful while merging, but it's also useful for > python-checkins reviews, because the "merge" mail has the same diff of the > previous mail rather than having 15 unrelated changesets from the last week > because the committer didn't pull in a while. I prefer "hg pull --rebase && hg push": it's just one command (ok, two), there is nothing to do (it's fast)... if the new changes are not in conflict with my local changes, and it keeps a nice linear history. hg rebase is more dangerous: you may lose work if you misuse it. hg rebase is maybe more complex when you have a conflict (I don't really know, I never use hg merge). hg rebase doesn't work at all if you have local changes in different branches. 
If hg push fails, I prefer to *remove* my changes using hg strip (!), update and redo the commits on the new tip. I should sometimes fix hg rebase instead :-)

> Tip 2 -- extended diffs:
>
> If you haven't already, enable git diffs, adding to your ~/.hgrc the
> following two lines:
>
>> [diff]
>> git = True
>
> (this is already in the devguide, even if 'git = on' is used there. The
> mercurial website uses git = True too.)
> More info: http://hgtip.com/tips/beginner/2009-10-22-always-use-git-diffs/

For diff, "showfunc = on" is also a cool feature. See my full ~/.hgrc:
https://bitbucket.org/haypo/misc/src/tip/conf/hgrc

* I disabled the merge GUI: I lose a lot of work because I'm unable to use a GUI to do merges, I don't understand what the 3 versions of the same file are (which one is the merged version!?)
* the pager extension is just a must have
* hgeditor is also a must have for writing the changelog: in vim, it opens a second buffer with the diff

I also use "hg record" (like "git add -i") to do partial commits: after hacking for 3 hours, I do atomic commits. Then I use hg histedit (like "git rebase -i") to merge and reorganize local commits. It's useful to hide "oops, typo in my last commit".
> If you find operations like pulling, updating or cloning too slow, you might
> also want to look at the 'progress' extension, which displays a progress bar
> during these operations:
>
>> [extensions]
>> progress =

Yeah, I like it too :-)

Victor

From catch-all at masklinn.net Thu Sep 29 12:34:34 2011
From: catch-all at masklinn.net (Xavier Morel)
Date: Thu, 29 Sep 2011 12:34:34 +0200
Subject: [Python-Dev] Hg tips
In-Reply-To: <4E844352.8040606@haypocalc.com>
References: <4E844352.8040606@haypocalc.com>
Message-ID: <165FF37D-8EE8-49CD-817C-600022942086@masklinn.net>

On 2011-09-29, at 12:07 , Victor Stinner wrote:
>
> * I disabled the merge GUI: I lose a lot of work because I'm unable to use a GUI to do merge, I don't understand what are the 3 versions of the same file (which one is the merged version!?)

Generally none. By default, mercurial (and most similar tools) sets up LOCAL, BASE and OTHER. BASE is the last "common" state, LOCAL is the locally modified file and OTHER is the remotely modified file (which you're trying to merge). The behavior after that depends: mercurial has an OUTPUT pointer (for the result file), and many tools just write the non-postfixed file with the merge result. And depending on your precise tool, it can attempt to perform its own merge resolution before showing you the files, or just show you the three files provided and you set up your changes into BASE from LOCAL and OTHER.

If you reach that state, it's because mercurial could not automatically process the merge, so there's no merged version to display. Maybe thinking of it as a file with conflict markers split into three (one without the conflicting sections, one with only the first part of the sections and one with only the second part) would make it clearer?
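Xavier's three-file picture corresponds to the classic diff3 rule: a hunk where only one side differs from BASE is taken automatically, and a hunk where LOCAL and OTHER both differ from BASE in different ways becomes a conflict. A deliberately naive line-wise sketch of that rule (it assumes the three versions have the same number of lines, which real merge tools do not):

```python
def merge3(base, local, other):
    """Naive line-wise three-way merge over equal-length line lists."""
    merged, conflicts = [], []
    for b, l, o in zip(base, local, other):
        if l == o or l == b:        # both sides agree, or only OTHER changed
            merged.append(o)
        elif o == b:                # only LOCAL changed
            merged.append(l)
        else:                       # both changed the same line differently
            merged.append(None)     # placeholder for a conflict marker
            conflicts.append((b, l, o))
    return merged, conflicts

# Only one side touched each line: merges cleanly.
assert merge3(["a", "b", "c"],
              ["a", "B", "c"],
              ["a", "b", "C"]) == (["a", "B", "C"], [])
# Both sides touched the same line: conflict, and BASE records what they diverged from.
assert merge3(["x"], ["y"], ["z"]) == ([None], [("x", "y", "z")])
```

The conflict tuple also shows why keeping BASE around helps: with only the two-version conflict markers (LOCAL vs OTHER), you can no longer tell which side changed what.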
From victor.stinner at haypocalc.com Thu Sep 29 12:50:19 2011
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Thu, 29 Sep 2011 12:50:19 +0200
Subject: [Python-Dev] Hg tips
In-Reply-To: <165FF37D-8EE8-49CD-817C-600022942086@masklinn.net>
References: <4E844352.8040606@haypocalc.com> <165FF37D-8EE8-49CD-817C-600022942086@masklinn.net>
Message-ID: <4E844D6B.7090304@haypocalc.com>

Le 29/09/2011 12:34, Xavier Morel a écrit :
> Generally none. By default, mercurial (and most similar tools) sets up LOCAL, BASE and OTHER. BASE is the...

Sorry, but I'm unable to remember the meaning of LOCAL, BASE and OTHER. In meld, I have to scroll to the end of the filename to see the filename suffix. Anyway, my real problem was different: hg opened meld with the 3 versions, but the BASE was already merged. I mean that hg chose for me what is the right version, without letting me choose myself what is the good version, because if I just close meld, I lose my local changes.

Because a merge is a new commit, I suppose that I can do something to get my local changes back. But, well, I just prefer the "legacy" (?) merge flavor:

<<<< local
...
===
...
>>> other

It's easier for my brain because I just have 2 versions of the same code, not 3!

But it looks like some people are more comfortable with 3 versions in a GUI, because it is the default Mercurial behaviour (to open a GUI to solve conflicts).

Victor

From stefan at bytereef.org Thu Sep 29 12:58:10 2011
From: stefan at bytereef.org (Stefan Krah)
Date: Thu, 29 Sep 2011 12:58:10 +0200
Subject: [Python-Dev] Hg tips
In-Reply-To: <4E844D6B.7090304@haypocalc.com>
References: <4E844352.8040606@haypocalc.com> <165FF37D-8EE8-49CD-817C-600022942086@masklinn.net> <4E844D6B.7090304@haypocalc.com>
Message-ID: <20110929105810.GA20947@sleipnir.bytereef.org>

Victor Stinner wrote:
> Because a merge is a new commit, I suppose that I can do something to
> get my local changes back. But, well, I just prefer the "legacy" (?)
> merge flavor: > > <<<< local > ... > === > ... > >>> other > > It's easier for my brain because I just have 2 versions of the same > code, not 3! I also prefer /usr/bin/merge and I've never quite figured out the GUI. Not that I spent a lot of time on it, since the "legacy" merge works well (and is self-explanatory). Stefan Krah From catch-all at masklinn.net Thu Sep 29 13:20:39 2011 From: catch-all at masklinn.net (Xavier Morel) Date: Thu, 29 Sep 2011 13:20:39 +0200 Subject: [Python-Dev] Hg tips In-Reply-To: <4E844D6B.7090304@haypocalc.com> References: <4E844352.8040606@haypocalc.com> <165FF37D-8EE8-49CD-817C-600022942086@masklinn.net> <4E844D6B.7090304@haypocalc.com> Message-ID: <305907B5-C766-4A03-9851-3ACF97107B52@masklinn.net> On 2011-09-29, at 12:50 , Victor Stinner wrote: > Le 29/09/2011 12:34, Xavier Morel a ?crit : >> Generally none. By default, mercurial (and most similar tools) sets up LOCAL, BASE and OTHER. BASE is the... > > Sorry, but I'm unable to remember the meaning of LOCAL, BASE and OTHER. In meld, I have to scroll to the end of the filename so see the filename suffix. Anyway, my real problem was different: hg opened meld with the 3 versions, but the BASE was already merged. I mean that hg chose for me what is the right version, without letting me choose myself what is the good version, because if I just close meld, I lose my local changes. I'd bet it's Meld doing that, though I have not checked (Araxis Merge does something similar, it has its own merge-algorithm which it tries to apply in case of 3-ways merge, trying to merge LOCAL and OTHER into base on its own). Look into Meld's configuration, it might be possible to disable that. (an other possibility would be that the wrong file pointers are send to Meld, so it gets e.g. twice the same file) > Because a merge is a new commit, I suppose that I can do something to get my local changes back. But, well, I just prefer the "legacy" (?) merge flavor: > > <<<< local > ... > === > ... 
> >>> other > > It's easier for my brain because I just have 2 versions of the same code, not 3! > > But it looks like some people are more confortable with 3 versions in a GUI, because it is the default Mercurial behaviour (to open a GUI to solve conflicts). > I'd be part of that camp, yes (though I'll use either depending on the exact situation, there are cases where seeing what both branches diverged from is very useful). I find having all three version makes it easier to correctly mix the two diverging versions, with /usr/bin/merge-style conflict markers it's harder to understand what both branches diverged from and hence how their changes fit into one another. From jimjjewett at gmail.com Thu Sep 29 15:22:00 2011 From: jimjjewett at gmail.com (Jim Jewett) Date: Thu, 29 Sep 2011 09:22:00 -0400 Subject: [Python-Dev] [Python-checkins] cpython: Enhance Py_ARRAY_LENGTH(): fail at build time if the argument is not an array In-Reply-To: References: Message-ID: On Wed, Sep 28, 2011 at 8:07 PM, Benjamin Peterson wrote: > 2011/9/28 victor.stinner : >> http://hg.python.org/cpython/rev/36fc514de7f0 >> changeset: ? 72512:36fc514de7f0 ... >> Thanks Rusty Russell for having written these amazing C macros! > Do we really need a new file? Why not pyport.h where other compiler stuff goes? I would expect pyport to contain only system-specific macros. These seem more universal. -jJ From barry at python.org Thu Sep 29 17:11:50 2011 From: barry at python.org (Barry Warsaw) Date: Thu, 29 Sep 2011 11:11:50 -0400 Subject: [Python-Dev] Hg tips In-Reply-To: <4E844352.8040606@haypocalc.com> References: <4E844352.8040606@haypocalc.com> Message-ID: <20110929111150.352b7be5@resist.wooz.org> On Sep 29, 2011, at 12:07 PM, Victor Stinner wrote: > I disabled the merge GUI: I lose a lot of work because I'm unable to use a > GUI to do merge, I don't understand what are the 3 versions of the same file > (which one is the merged version!?) Emacs users should look at smerge-mode. 
It has some nice keybindings and colorizing that usually makes resolving conflicts fairly straightforward. It also will automatically `$vcs resolve` the file when you've handled all the conflicts. Caveat: I use it primarily for bzr, but I think it works with most vcs's. -Barry From g.brandl at gmx.net Thu Sep 29 18:20:43 2011 From: g.brandl at gmx.net (Georg Brandl) Date: Thu, 29 Sep 2011 18:20:43 +0200 Subject: [Python-Dev] Hg tips In-Reply-To: <4E844D6B.7090304@haypocalc.com> References: <4E844352.8040606@haypocalc.com> <165FF37D-8EE8-49CD-817C-600022942086@masklinn.net> <4E844D6B.7090304@haypocalc.com> Message-ID: Am 29.09.2011 12:50, schrieb Victor Stinner: > Le 29/09/2011 12:34, Xavier Morel a ?crit : >> Generally none. By default, mercurial (and most similar tools) sets up LOCAL, BASE and OTHER. BASE is the... > > Sorry, but I'm unable to remember the meaning of LOCAL, BASE and OTHER. > In meld, I have to scroll to the end of the filename so see the filename > suffix. Anyway, my real problem was different: hg opened meld with the 3 > versions, but the BASE was already merged. I mean that hg chose for me > what is the right version, without letting me choose myself what is the > good version, because if I just close meld, I lose my local changes. > > Because a merge is a new commit, I suppose that I can do something to > get my local changes back. But, well, I just prefer the "legacy" (?) > merge flavor: > > <<<< local > ... > === > ... > >>> other > > It's easier for my brain because I just have 2 versions of the same > code, not 3! I prefer this as well, since I also find most merge tools unbearable. (At some point I should probably learn emacs' ediff.) But in some cases, you really lose information when you don't see the base version, since in the case of contradicting changes it is very useful to see where both came from. 
Georg From g.brandl at gmx.net Thu Sep 29 18:21:53 2011 From: g.brandl at gmx.net (Georg Brandl) Date: Thu, 29 Sep 2011 18:21:53 +0200 Subject: [Python-Dev] Hg tips In-Reply-To: <20110929111150.352b7be5@resist.wooz.org> References: <4E844352.8040606@haypocalc.com> <20110929111150.352b7be5@resist.wooz.org> Message-ID: Am 29.09.2011 17:11, schrieb Barry Warsaw: > On Sep 29, 2011, at 12:07 PM, Victor Stinner wrote: > >> I disabled the merge GUI: I lose a lot of work because I'm unable to use a >> GUI to do merge, I don't understand what are the 3 versions of the same file >> (which one is the merged version!?) > > Emacs users should look at smerge-mode. It has some nice keybindings and > colorizing that usually makes resolving conflicts fairly straightforward. It > also will automatically `$vcs resolve` the file when you've handled all the > conflicts. > > Caveat: I use it primarily for bzr, but I think it works with most vcs's. Yes, this is what I do as well for hg. (I had to write the "hg resolve -m" support myself, but that was a year or two ago. I assume it's out-of-the-box now.) Georg From dickinsm at gmail.com Thu Sep 29 19:04:38 2011 From: dickinsm at gmail.com (Mark Dickinson) Date: Thu, 29 Sep 2011 18:04:38 +0100 Subject: [Python-Dev] [Python-checkins] cpython: Enhance Py_ARRAY_LENGTH(): fail at build time if the argument is not an array In-Reply-To: <201109290345.59665.victor.stinner@haypocalc.com> References: <201109290345.59665.victor.stinner@haypocalc.com> Message-ID: On Thu, Sep 29, 2011 at 2:45 AM, Victor Stinner wrote: > I would like to suggest the opposite: move platform independdant macros from > pyport.h to pymacro.h :-) Suggestions: > ?- Py_ARITHMETIC_RIGHT_SHIFT > ?- Py_FORCE_EXPANSION > ?- Py_SAFE_DOWNCAST Not sure about the other two, but Py_ARITHMETIC_RIGHT_SHIFT is definitely platform dependent, which is why it's in pyport.h in the first place. 
Mark From status at bugs.python.org Fri Sep 30 18:07:28 2011 From: status at bugs.python.org (Python tracker) Date: Fri, 30 Sep 2011 18:07:28 +0200 (CEST) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20110930160728.D1FA41CA8F@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2011-09-23 - 2011-09-30) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message. Issues counts and deltas: open 3046 (+16) closed 21813 (+25) total 24859 (+41) Open issues with patches: 1301 Issues opened (31) ================== #13038: distutils windows installer STATUS_INVALID_CRUNTIME_PARAMETER http://bugs.python.org/issue13038 opened by mitchfrazier #13039: IDLE editor: shell-like behaviour on line starting with ">>>" http://bugs.python.org/issue13039 opened by etuardu #13040: call to tkinter.messagebox.showinfo hangs the script on timer http://bugs.python.org/issue13040 opened by Richard86 #13041: argparse: terminal width is not detected properly http://bugs.python.org/issue13041 opened by zbysz #13044: pdb throws AttributeError at end of debugging session http://bugs.python.org/issue13044 opened by akl #13045: socket.getsockopt may require custom buffer contents http://bugs.python.org/issue13045 opened by Artyom.Gavrichenkov #13047: imp.find_module("") and imp.find_module(".") http://bugs.python.org/issue13047 opened by Arfrever #13048: Handling of paths in first argument of imp.find_module() http://bugs.python.org/issue13048 opened by Arfrever #13049: distutils2 should not allow a distribution to install under an http://bugs.python.org/issue13049 opened by carljm #13050: RLock support the context manager protocol but this is not doc http://bugs.python.org/issue13050 opened by r.david.murray #13051: Infinite recursion in curses.textpad.Textbox http://bugs.python.org/issue13051 opened by tycho #13052: IDLE: replace ending with '\' causes crash http://bugs.python.org/issue13052 
opened by terry.reedy #13053: Add Capsule migration documentation to "cporting" http://bugs.python.org/issue13053 opened by larry #13054: sys.maxunicode value after PEP-393 http://bugs.python.org/issue13054 opened by ezio.melotti #13055: Distutils tries to handle null versions but fails http://bugs.python.org/issue13055 opened by bgamari #13056: test_multibytecodec.py:TestStreamWriter is skipped after PEP39 http://bugs.python.org/issue13056 opened by ezio.melotti #13057: Thread not working for python 2.7.1 built with HP Compiler on http://bugs.python.org/issue13057 opened by wah meng #13059: Sporadic test_multiprocessing failure: IOError("bad message le http://bugs.python.org/issue13059 opened by haypo #13060: allow other rounding modes in round() http://bugs.python.org/issue13060 opened by ArneBab #13061: Decimal module yields incorrect results when Python compiled w http://bugs.python.org/issue13061 opened by josharian #13062: Introspection generator and function closure state http://bugs.python.org/issue13062 opened by ncoghlan #13063: test_concurrent_futures failures on Windows: IOError('[Errno 2 http://bugs.python.org/issue13063 opened by haypo #13064: Port codecs and error handlers to the new Unicode API http://bugs.python.org/issue13064 opened by haypo #13070: segmentation fault in pure-python multi-threaded server http://bugs.python.org/issue13070 opened by vsemionov #13071: IDLE refuses to open on windows 7 http://bugs.python.org/issue13071 opened by jfalskfjdsl;akfdjsa;l.laksfj;aslkfdj;sal #13072: Getting a buffer from a Unicode array uses invalid format http://bugs.python.org/issue13072 opened by haypo #13073: message_body argument of HTTPConnection.endheaders is undocume http://bugs.python.org/issue13073 opened by petri.lehtinen #13074: Improve documentation of locale encoding functions http://bugs.python.org/issue13074 opened by gjb1002 #13075: PEP-0001 contains dead links http://bugs.python.org/issue13075 opened by ezander #13076: Bad links to 'time' 
in datetime documentation http://bugs.python.org/issue13076 opened by gjb1002 #13077: Unclear behavior of daemon threads on main thread exit http://bugs.python.org/issue13077 opened by etuardu Most recent 15 issues with no replies (15) ========================================== #13076: Bad links to 'time' in datetime documentation http://bugs.python.org/issue13076 #13075: PEP-0001 contains dead links http://bugs.python.org/issue13075 #13074: Improve documentation of locale encoding functions http://bugs.python.org/issue13074 #13073: message_body argument of HTTPConnection.endheaders is undocume http://bugs.python.org/issue13073 #13072: Getting a buffer from a Unicode array uses invalid format http://bugs.python.org/issue13072 #13070: segmentation fault in pure-python multi-threaded server http://bugs.python.org/issue13070 #13064: Port codecs and error handlers to the new Unicode API http://bugs.python.org/issue13064 #13056: test_multibytecodec.py:TestStreamWriter is skipped after PEP39 http://bugs.python.org/issue13056 #13055: Distutils tries to handle null versions but fails http://bugs.python.org/issue13055 #13051: Infinite recursion in curses.textpad.Textbox http://bugs.python.org/issue13051 #13050: RLock support the context manager protocol but this is not doc http://bugs.python.org/issue13050 #13045: socket.getsockopt may require custom buffer contents http://bugs.python.org/issue13045 #13038: distutils windows installer STATUS_INVALID_CRUNTIME_PARAMETER http://bugs.python.org/issue13038 #13032: h2py.py can fail with UnicodeDecodeError http://bugs.python.org/issue13032 #13024: cgitb uses stdout encoding http://bugs.python.org/issue13024 Most recent 15 issues waiting for review (15) ============================================= #13077: Unclear behavior of daemon threads on main thread exit http://bugs.python.org/issue13077 #13061: Decimal module yields incorrect results when Python compiled w http://bugs.python.org/issue13061 #13057: Thread not working for 
python 2.7.1 built with HP Compiler on http://bugs.python.org/issue13057 #13055: Distutils tries to handle null versions but fails http://bugs.python.org/issue13055 #13054: sys.maxunicode value after PEP-393 http://bugs.python.org/issue13054 #13051: Infinite recursion in curses.textpad.Textbox http://bugs.python.org/issue13051 #13045: socket.getsockopt may require custom buffer contents http://bugs.python.org/issue13045 #13041: argparse: terminal width is not detected properly http://bugs.python.org/issue13041 #13032: h2py.py can fail with UnicodeDecodeError http://bugs.python.org/issue13032 #13031: small speed-up for tarfile.py when unzipping tarballs http://bugs.python.org/issue13031 #13025: mimetypes should read the rule file using UTF-8, not the local http://bugs.python.org/issue13025 #13024: cgitb uses stdout encoding http://bugs.python.org/issue13024 #13018: dictobject.c: refleak http://bugs.python.org/issue13018 #13017: pyexpat.c: refleak http://bugs.python.org/issue13017 #13016: selectmodule.c: refleak http://bugs.python.org/issue13016 Top 10 most discussed issues (10) ================================= #13057: Thread not working for python 2.7.1 built with HP Compiler on http://bugs.python.org/issue13057 18 msgs #13060: allow other rounding modes in round() http://bugs.python.org/issue13060 10 msgs #1621: Do not assume signed integer overflow behavior http://bugs.python.org/issue1621 7 msgs #13054: sys.maxunicode value after PEP-393 http://bugs.python.org/issue13054 7 msgs #12242: distutils2 environment marker for current compiler http://bugs.python.org/issue12242 5 msgs #12806: argparse: Hybrid help text formatter http://bugs.python.org/issue12806 5 msgs #12966: cookielib.LWPCookieJar breaks on cookie values with a newline http://bugs.python.org/issue12966 5 msgs #11751: Increase distutils.filelist / packaging.manifest test coverage http://bugs.python.org/issue11751 4 msgs #12737: str.title() is overzealous by upcasing combining marks inappro 
http://bugs.python.org/issue12737 4 msgs #13033: Add shutil.chowntree http://bugs.python.org/issue13033 4 msgs Issues closed (23) ================== #1092365: Distutils needs a way *not* to install files http://bugs.python.org/issue1092365 closed by eric.araujo #3130: In some UCS4 builds, sizeof(Py_UNICODE) could end up being mor http://bugs.python.org/issue3130 closed by haypo #8654: Improve ABI compatibility between UCS2 and UCS4 builds http://bugs.python.org/issue8654 closed by stutzbach #8927: Handle version incompatibilities in dependencies http://bugs.python.org/issue8927 closed by eric.araujo #9306: distutils: raise informative error message when cmd_class is N http://bugs.python.org/issue9306 closed by eric.araujo #9395: clean does not remove all temp files http://bugs.python.org/issue9395 closed by eric.araujo #12746: normalization is affected by unicode width http://bugs.python.org/issue12746 closed by benjamin.peterson #12819: PEP 393 - Flexible Unicode String Representation http://bugs.python.org/issue12819 closed by haypo #12981: rewrite multiprocessing (senfd|recvfd) in Python http://bugs.python.org/issue12981 closed by neologix #13008: syntax error when pasting valid snippet into console without e http://bugs.python.org/issue13008 closed by eric.araujo #13012: Allow keyword argument in str.splitlines() http://bugs.python.org/issue13012 closed by ezio.melotti #13013: _ctypes.c: refleak http://bugs.python.org/issue13013 closed by meador.inge #13035: "maintainer" value clear the "author" value when registering http://bugs.python.org/issue13035 closed by eric.araujo #13037: [Regression] socket.error does not inherit from IOError as doc http://bugs.python.org/issue13037 closed by Christopher.Egner #13042: argparse: terminal width is not detected properly http://bugs.python.org/issue13042 closed by ezio.melotti #13043: Unexpected behavior of imp.find_module(".") with a package pre http://bugs.python.org/issue13043 closed by Arfrever #13046: 
imp.find_module() should not find unimportable modules http://bugs.python.org/issue13046 closed by brett.cannon #13058: Fix file descriptor leak on error http://bugs.python.org/issue13058 closed by neologix #13065: test http://bugs.python.org/issue13065 closed by vsemionov #13066: test http://bugs.python.org/issue13066 closed by vsemionov #13067: test http://bugs.python.org/issue13067 closed by vsemionov #13068: test http://bugs.python.org/issue13068 closed by vsemionov #13069: test http://bugs.python.org/issue13069 closed by ezio.melotti From chris at simplistix.co.uk Fri Sep 30 19:46:08 2011 From: chris at simplistix.co.uk (Chris Withers) Date: Fri, 30 Sep 2011 18:46:08 +0100 Subject: [Python-Dev] Inconsistent script/console behaviour In-Reply-To: References: Message-ID: <4E860060.1040505@simplistix.co.uk> On 24/09/2011 00:32, Guido van Rossum wrote: > The interactive console is optimized for people entering code by > typing, not by copying and pasting large gobs of text. > > If you think you can have it both, show us the code. Anatoly wants ipython's new qtconsole. This "does the right thing" because it's a GUI app and so can manipulate the content on paste... Not sure if you can do that in a console app... cheers, Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk
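One way to get "paste a large gob of code" behaviour without a GUI console is to compile the pasted text as a single block, the way a script is executed, instead of feeding it to the REPL line by line. A minimal sketch (the function name is made up for illustration):

```python
def run_pasted(text):
    """Execute a multi-line snippet as one block, as a script would,
    instead of line-by-line like the interactive console."""
    namespace = {}
    exec(compile(text, "<pasted>", "exec"), namespace)
    return namespace

# A snippet like this trips up the plain REPL when pasted, because
# the blank line inside the if-block ends the statement early; in
# exec mode the whole block compiles fine.
snippet = "if True:\n    x = 1\n\n    y = x + 1\n"
ns = run_pasted(snippet)
print(ns["y"])  # -> 2
```

This is essentially what GUI frontends like the qtconsole do on paste: they see the whole clipboard content at once and can hand it to the compiler as a unit, which a character-stream console cannot easily do.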