From tjreedy at udel.edu Thu Sep 1 00:02:53 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 31 Aug 2011 18:02:53 -0400 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 8/31/2011 1:10 PM, Guido van Rossum wrote: > This is why I find the issue of Python, the language (and stdlib), as > a whole "conforming to the Unicode standard" such a troublesome > concept -- I think it is something that an application may claim, but > the language should make much more modest claims, such as "the regular > expression syntax supports features X, Y and Z from the Unicode > recommendation XXX", or "the UTF-8 codec will never emit a sequence of > bytes that is invalid according to Unicode specification YYY". (As long > as the Unicode references are also versioned or dated.) This will be a great improvement. It was both embarrassing and frustrating to have to respond to Tom C.'s (and others') issue with "Our unicode type is too vaguely documented to tell whether you are reporting a bug or making a feature request." > But if you can observe (valid) surrogate pairs it is still UTF-16. ... > Ok, I dig this, to some extent. However saying it is UCS-2 is equally > bad. As I said on the tracker, our narrow builds are in-between (while moving closer to UTF-16), and both terms are deceptive, at least to some.
> At the same time I think it would be useful if certain string > operations like .lower() worked in such a way that *if* the input were > valid UTF-16, *then* the output would also be, while *if* the input > contained an invalid surrogate, the result would simply be something > that is no worse (in particular, those are all mapped to themselves). > We could even go further and have .lower() and friends look at > graphemes (multi-code-point characters) if the Unicode std has a > useful definition of e.g. lowercasing graphemes that differed from > lowercasing code points. > > An analogy is actually found in .lower() on 8-bit strings in Python 2: > it assumes the string contains ASCII, and non-ASCII characters are > mapped to themselves. If your string contains Latin-1 or EBCDIC or > UTF-8 it will not do the right thing. But that doesn't mean strings > cannot contain those encodings, it just means that the .lower() method > is not useful if they do. (Why ASCII? Because that is the system > encoding in Python 2.) Good analogy. > Let's call those things graphemes (Tom C's term, I quite like leaving > "character" ambiguous) -- they are sequences of multiple code points > that represent a single "visual squiggle" (the kind of thing that > you'd want to be swappable in vim with "xp" :-). I agree that APIs are > needed to manipulate (match, generate, validate, mutilate, etc.) > things at the grapheme level. I don't agree that this means a separate > data type is required. I presume by 'separate data type' you mean a base level builtin class like int or str and that you would allow for wrapper classes built on top of str, as such are not really 'separate'. For grapheme level and higher, we should certainly start with wrappers and probably with alternate versions based on different strategies. > There are ever-larger units of information > encoded in text strings, with ever farther-reaching (and more vague) > requirements on valid sequences. 
Do you want to have a data type that > can represent (only valid) words in a language? Sentences? Novels? ... > I think that at this point in time the best we can do is claim that > Python (the language standard) uses either 16-bit code units or 21-bit > code points in its string datatype, and that, thanks to PEP 393, > CPython 3.3 and further will always use 21-bit code points (but Jython > and IronPython may forever use their platform's native 16-bit code > unit representing string type). And then we add APIs that can be used > everywhere to look for code points (even if the string contains code > points), graphemes, or larger constructs. I'd like those APIs to be > designed using a garbage-in-garbage-out principle, where if the input > conforms to some Unicode requirement, the output does too, but if the > input doesn't, the output does what makes most sense. Validation is > then limited to codecs, and optional calls. > > If you index or slice a string, or create a string from chr() of a > surrogate or from some other value that the Unicode standard considers > an illegal code point, you better know what you are doing. I want > chr(i) to be valid for all values of i in range(2**21), Actually, it is range(0x110000) == range(1114112) so that UTF-8 uses at most 4 bytes per codepoint. 21 bits is 20.1 bits rounded up. > so it can be > used to create a lone surrogate, or (on systems with 16-bit > "characters") a surrogate pair. And also ord(chr(i)) == i for all i in > range(2**21).

for i in range(0x110000):  # 1114112
    if ord(chr(i)) != i:
        print(i)
# prints nothing (on Windows)

> I'm not sure about ord() on a 2-character string > containing a surrogate pair on systems where strings contain 21-bit > code points; I think it should be an error there, just as ord() on > other strings of length != 1. But on systems with 16-bit "characters", > ord() of strings of length 2 containing a valid surrogate pair should > work. 
And now does, thanks to whoever fixed this (within the last year, I think). -- Terry Jan Reedy From ncoghlan at gmail.com Thu Sep 1 00:44:59 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 1 Sep 2011 08:44:59 +1000 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Thu, Sep 1, 2011 at 8:02 AM, Terry Reedy wrote: > On 8/31/2011 1:10 PM, Guido van Rossum wrote: >> Ok, I dig this, to some extent. However saying it is UCS-2 is equally >> bad. > > As I said on the tracker, our narrow builds are in-between (while moving > closer to UTF-16), and both terms are deceptive, at least to some. We should probably just explicitly document that the internal representation in narrow builds is a UCS-2/UTF-16 hybrid - like UTF-16, it can handle the full code point space, but, like UCS-2, it allows code unit sequences (such as lone surrogates) that strict UTF-16 would reject. Perhaps we should also finally split strings out to a dedicated section on the same tier as Sequence types in the library reference. Yes, they're sequences, but they're also so much more than that (try as you might, you're unlikely to be successful in ducktyping strings the way you can sequences, mappings, files, numbers and other interfaces. Needing a "real string" is even more common than needing a "real dict", especially after the efforts to make most parts of the interpreter that previously cared about the latter distinction accept arbitrary mapping objects). 
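[Editor's note: the UCS-2/UTF-16 hybrid behaviour described above is easy to check in a modern CPython 3. A minimal sketch, assuming only stdlib behaviour: the str type happily stores a lone surrogate, while the strict UTF codecs reject it.]

```python
# A lone surrogate is a legal element of a str (a code point sequence),
# but the strict UTF-8 codec rejects it, as valid UTF data must.
s = "\ud800"  # lone high surrogate
assert len(s) == 1 and ord(s) == 0xD800

try:
    s.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
assert not strict_ok  # the strict codec refuses what the type allows

# The "surrogatepass" error handler deliberately smuggles it through:
assert s.encode("utf-16-le", "surrogatepass") == b"\x00\xd8"
```

So validation lives in the codecs, not in the string type itself.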
I've created http://bugs.python.org/issue12874, suggesting that the "Sequence Types" and "memoryview type" sections could be usefully rearranged as:
Sequence Types - list, tuple, range
Text Data - str
Binary Data - bytes, bytearray, memoryview
Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Thu Sep 1 01:49:18 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 1 Sep 2011 09:49:18 +1000 Subject: [Python-Dev] Python 3 optimizations continued... In-Reply-To: References: <20110829231420.20c3516a@pitrou.net> <20110830025510.638b41d9@pitrou.net> Message-ID: On Thu, Sep 1, 2011 at 3:28 AM, Guido van Rossum wrote: > On Tue, Aug 30, 2011 at 10:04 PM, Cesare Di Mauro > Cesare, I'm really sorry that you became so disillusioned that you > abandoned wordcode. I agree that we were too optimistic about Unladen > Swallow. Also that the existence of PyPy and its PR machine (:-) > should not stop us from improving CPython. Yep, and I'll try to do a better job of discouraging creeping complexity (without adequate payoffs) without the harmful side effect of discouraging experimentation with CPython performance improvements in general. It's massive "rewrite the world" changes, that don't adequately account for all the ways CPython gets used or the fact that core devs need to be able to effectively *review* the changes, that are unlikely to ever get anywhere. More localised changes, or those that are relatively easy to explain have a much better chance. So I'll switch my tone to just trying to make sure that portability and maintainability concerns are given due weight :) Cheers, Nick. P.S. 
I suspect a big part of my attitude stems from the fact that we're still trying to untangle some of the consequences of committing the PEP 3118 new buffer API implementation with inadequate review (it turns out the implementation didn't reflect the PEP and the PEP had deficiencies of its own), and I was one of the ones advocating in favour of that patch. Once bitten, twice shy, etc. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From nyamatongwe at gmail.com Thu Sep 1 02:58:57 2011 From: nyamatongwe at gmail.com (Neil Hodgson) Date: Thu, 1 Sep 2011 10:58:57 +1000 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: <4E5E8811.90600@g.nevcal.com> References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <4E5DEC35.4010404@g.nevcal.com> <4E5E82B0.4020302@g.nevcal.com> <4E5E8811.90600@g.nevcal.com> Message-ID: Glenn Linderman: > That said, regexp, or some sort of cursor on a string, might be a workable > solution. Will it have adequate performance? Perhaps, at least for some > applications. Will it be as conceptually simple as indexing an array of > graphemes? No. Will it ever reach the efficiency of indexing an array of > graphemes? No. Does that matter? Depends on the application. Using an iterator for cluster access is a common technique currently. For example, with the Pango text layout and drawing library, you may create a PangoLayoutIter over a text layout object (which contains a UTF-8 string along with formatting information) and iterate by clusters by calling pango_layout_iter_next_cluster. 
Direct access to clusters by index is not as useful in this domain as access by pixel positions - for example to examine the portion of a layout visible in a window. http://developer.gnome.org/pango/stable/pango-Layout-Objects.html#pango-layout-get-iter In this API, 'index' is used to refer to a byte index into UTF-8, not a character or cluster index. Rather than discuss functionality in the abstract, we need some use cases involving different levels of character and cluster access to see whether providing indexed access is worthwhile. I'll start with an example: some text drawing engines draw decomposed characters ("o" followed by " ̈" -> "ö") differently compared to their composite equivalents ("ö") and this may be perceived as better or worse. I'd like to offer an option to replace some decomposed characters with their composite equivalent before drawing but since other characters may look worse, I don't want to do a full normalization. The API style that appears most useful for this example is an iterator over the input string that yields composed and decomposed character strings (that is, it will yield both "ö" and "ö", the latter being "o" followed by a combining diaeresis), each character string is then converted if in a substitution dictionary and written to an output string. This is similar to an iterator over grapheme clusters although, since it is only aimed at composing sequences, the iterator could be simpler than a full grapheme cluster iterator. One of the benefits of iterator access to text is that many different iterators can be built without burdening the implementation object with extra memory costs as would be likely with techniques that build indexes into the representation. 
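[Editor's note: the substitution-dictionary iterator Neil describes can be sketched in a few lines of stdlib-only Python. This is a rough approximation - scanning base characters plus trailing combining marks is simpler than true grapheme cluster segmentation (UAX #29) - and the substitution table here is hypothetical.]

```python
import unicodedata

# Hypothetical table: only the sequences listed here are replaced with
# their precomposed forms; everything else passes through untouched
# (unlike a full NFC normalization, which would compose everything).
SUBST = {"o\u0308": "\u00f6", "a\u0308": "\u00e4"}

def selectively_compose(s):
    out = []
    i = 0
    while i < len(s):
        # greedily take one base character plus any following combining marks
        j = i + 1
        while j < len(s) and unicodedata.combining(s[j]):
            j += 1
        seq = s[i:j]
        out.append(SUBST.get(seq, seq))
        i = j
    return "".join(out)

# "o" + U+0308 is composed, but "u" + U+0308 (not in the table) is kept
assert selectively_compose("o\u0308u\u0308") == "\u00f6u\u0308"
```

The iterator-style scan means no index structures need to be built on the string, matching the memory point made above.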
Neil From guido at python.org Thu Sep 1 03:11:28 2011 From: guido at python.org (Guido van Rossum) Date: Wed, 31 Aug 2011 18:11:28 -0700 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <4E5DEC35.4010404@g.nevcal.com> <4E5E82B0.4020302@g.nevcal.com> <4E5E8811.90600@g.nevcal.com> Message-ID: On Wed, Aug 31, 2011 at 5:58 PM, Neil Hodgson wrote: > [...] some text drawing engines draw decomposed characters ("o" > followed by " ̈" -> "ö") differently compared to their composite > equivalents ("ö") and this may be perceived as better or worse. I'd > like to offer an option to replace some decomposed characters with > their composite equivalent before drawing but since other characters > may look worse, I don't want to do a full normalization. Isn't this an issue properly solved by various normal forms? 
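[Editor's note: for reference, the normal forms Guido mentions are exposed in the stdlib via unicodedata.normalize. NFC composes and NFD decomposes, but each applies to the whole string, which is exactly the all-or-nothing behaviour Neil wants to avoid.]

```python
import unicodedata

decomposed = "o\u0308"  # "o" + COMBINING DIAERESIS
composed = "\u00f6"     # precomposed LATIN SMALL LETTER O WITH DIAERESIS

# NFC composes, NFD decomposes -- both transform every character in the
# string, i.e. a "full normalization" with no per-character opt-out.
assert unicodedata.normalize("NFC", decomposed) == composed
assert unicodedata.normalize("NFD", composed) == decomposed
```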
-- --Guido van Rossum (python.org/~guido) From hagen at zhuliguan.net Thu Sep 1 03:27:28 2011 From: hagen at zhuliguan.net (Hagen Fürstenau) Date: Wed, 31 Aug 2011 21:27:28 -0400 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <4E5DEC35.4010404@g.nevcal.com> <4E5E82B0.4020302@g.nevcal.com> <4E5E8811.90600@g.nevcal.com> Message-ID: >> [...] some text drawing engines draw decomposed characters ("o" >> followed by " ̈" -> "ö") differently compared to their composite >> equivalents ("ö") and this may be perceived as better or worse. I'd >> like to offer an option to replace some decomposed characters with >> their composite equivalent before drawing but since other characters >> may look worse, I don't want to do a full normalization. > > Isn't this an issue properly solved by various normal forms? I think he's rather describing the need for custom "abnormal forms". 
- Hagen From nyamatongwe at gmail.com Thu Sep 1 03:29:39 2011 From: nyamatongwe at gmail.com (Neil Hodgson) Date: Thu, 1 Sep 2011 11:29:39 +1000 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <4E5DEC35.4010404@g.nevcal.com> <4E5E82B0.4020302@g.nevcal.com> <4E5E8811.90600@g.nevcal.com> Message-ID: Guido van Rossum: > On Wed, Aug 31, 2011 at 5:58 PM, Neil Hodgson wrote: >> [...] some text drawing engines draw decomposed characters ("o" >> followed by " ̈" -> "ö") differently compared to their composite >> equivalents ("ö") and this may be perceived as better or worse. I'd >> like to offer an option to replace some decomposed characters with >> their composite equivalent before drawing but since other characters >> may look worse, I don't want to do a full normalization. > > Isn't this an issue properly solved by various normal forms? No, since normalization of all cases may actually lead to worse visuals in some situations. A potential reason for drawing decomposed characters differently is that more room may be allocated for the generic condition where a character may be combined with a wide variety of accents compared with combining it with a specific accent. Here is an example on Windows drawing composite and decomposed forms to show the types of difference often encountered. 
http://scintilla.org/Composite.png Now, this particular example displays both forms quite reasonably so would not justify special processing but I have seen on other platforms and earlier versions of Windows where the umlaut in the decomposed form is displaced to the right even to the extent of disappearing under the next character. In the example, the decomposed 'o' is shorter and lighter and the umlauts are round instead of square. Neil From guido at python.org Thu Sep 1 04:51:35 2011 From: guido at python.org (Guido van Rossum) Date: Wed, 31 Aug 2011 19:51:35 -0700 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <4E5DEC35.4010404@g.nevcal.com> <4E5E82B0.4020302@g.nevcal.com> <4E5E8811.90600@g.nevcal.com> Message-ID: On Wed, Aug 31, 2011 at 6:29 PM, Neil Hodgson wrote: > Guido van Rossum: > >> On Wed, Aug 31, 2011 at 5:58 PM, Neil Hodgson wrote: >>> [...] some text drawing engines draw decomposed characters ("o" >>> followed by " ̈" -> "ö") differently compared to their composite >>> equivalents ("ö") and this may be perceived as better or worse. I'd >>> like to offer an option to replace some decomposed characters with >>> their composite equivalent before drawing but since other characters >>> may look worse, I don't want to do a full normalization. >> >> Isn't this an issue properly solved by various normal forms? > > No, since normalization of all cases may actually lead to worse > visuals in some situations. 
A potential reason for drawing decomposed > characters differently is that more room may be allocated for the > generic condition where a character may be combined with a wide > variety of accents compared with combining it with a specific accent. Ok, I thought there was also a form normalized (denormalized?) to decomposed form. But I'll take your word. > Here is an example on Windows drawing composite and decomposed > forms to show the types of difference often encountered. > http://scintilla.org/Composite.png > Now, this particular example displays both forms quite reasonably > so would not justify special processing but I have seen on other > platforms and earlier versions of Windows where the umlaut in the > decomposed form is displaced to the right even to the extent of > disappearing under the next character. In the example, the decomposed > 'o' is shorter and lighter and the umlauts are round instead of > square. I'm not sure it's a good idea to try and improve on the font using such a hack. But I won't deny you have the right. 
:-) -- --Guido van Rossum (python.org/~guido) From v+python at g.nevcal.com Thu Sep 1 06:40:55 2011 From: v+python at g.nevcal.com (Glenn Linderman) Date: Wed, 31 Aug 2011 21:40:55 -0700 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <4E5DEC35.4010404@g.nevcal.com> <4E5E82B0.4020302@g.nevcal.com> <4E5E8811.90600@g.nevcal.com> Message-ID: <4E5F0CD7.3030509@g.nevcal.com> On 8/31/2011 5:58 PM, Neil Hodgson wrote: > Glenn Linderman: > >> That said, regexp, or some sort of cursor on a string, might be a workable >> solution. Will it have adequate performance? Perhaps, at least for some >> applications. Will it be as conceptually simple as indexing an array of >> graphemes? No. Will it ever reach the efficiency of indexing an array of >> graphemes? No. Does that matter? Depends on the application. > Using an iterator for cluster access is a common technique > currently. For example, with the Pango text layout and drawing > library, you may create a PangoLayoutIter over a text layout object > (which contains a UTF-8 string along with formatting information) and > iterate by clusters by calling pango_layout_iter_next_cluster. Direct > access to clusters by index is not as useful in this domain as access > by pixel positions - for example to examine the portion of a layout > visible in a window. > > http://developer.gnome.org/pango/stable/pango-Layout-Objects.html#pango-layout-get-iter > In this API, 'index' is used to refer to a byte index into UTF-8, > not a character or cluster index. 
I agree that different applications may have different needs for different types of indexes to various starting points in a large string. Where a custom index is required, a standard index may not be needed. > One of the benefits of iterator access to text is that many > different iterators can be built without burdening the implementation > object with extra memory costs as would be likely with techniques that > build indexes into the representation. How many different iterators into the same text would be concurrently needed by an application? And why? Seems like if it is dealing with text at the level of grapheme clusters, it needs that type of iterator. Of course, if it does I/O it needs codec access, but that is by nature sequential from the starting point to the end point. From stephen at xemacs.org Thu Sep 1 09:13:03 2011 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 01 Sep 2011 16:13:03 +0900 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: References: <4E537EEC.1070602@v.loewis.de> <1314099542.3485.10.camel@localhost.localdomain> <4E53945E.1050102@v.loewis.de> <1314101745.3485.18.camel@localhost.localdomain> <4E53A5D1.2040808@v.loewis.de> <4E53A950.30005@haypocalc.com> <87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp> <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87r540ka68.fsf@uwakimon.sk.tsukuba.ac.jp> Where I cut your words, we are in 100% agreement. (FWIW :-) Guido van Rossum writes: > On Tue, Aug 30, 2011 at 11:03 PM, Stephen J. 
Turnbull > wrote: > > Well, that's why I wrote "intended to be suggestive". The Unicode > > Standard does not specify at all what the internal representation of > > characters may be, it only specifies what their external behavior must > > be when two processes communicate. (For "process" as used in the > > standard, think "Python modules" here, since we are concerned with the > > problems of folks who develop in Python.) When observing the behavior > > of a Unicode process, there are no UTF-16 arrays or UTF-8 arrays or > > even UTF-32 arrays; only arrays of characters. > > Hm, that's not how I would read "process". IMO that is an > intentionally vague term, I agree. I'm sorry that I didn't make myself clear. The reason I read "process" as "module" is that some modules of Python, and therefore Python as a whole, cannot conform to the Unicode standard. Eg, anything that inputs or outputs bytes. Therefore only "modules" and "types" can be asked to conform. (I don't think it makes sense to ask anything lower level to conform. See below where I comment on your .lower() example.) What I am advocating (for the long term) is provision of *one* module (or type) such that if the text processing done by the application is done entirely in terms of this module (type), it will conform (to some specified degree, chosen to balance user wants with implementation and support costs). It may be desirable to provide others for sufficiently important particular use cases, but at present I see a clear need for *one*. Unicode conformance is going to be a common requirement for apps used by global enterprises. I oppose trying to make str into that type. We need str, just as it is, for many reasons. > and we are free to decide how to interpret it. I don't think it > will work very well to define a process as a Python module; what > about Python modules that agree about passing along array of code > units (or streams of UTF-8, for that matter)? 
Certainly a group of cooperating modules could form a conforming process, just as you describe it for one example. The "one module" mentioned above need not implement everything internally, but it would take responsibility for providing guarantees (eg, unit tests) of whatever conformance claims it makes. > > Thus, according to the rules of handling a UTF-16 stream, it is an > > error to observe a lone surrogate or a surrogate pair that isn't a > > high-low pair (Unicode 6.0, Ch. 3 "Conformance", requirements C1 and > > C8-C10). That's what I mean by "can't tell it's UTF-16". > > But if you can observe (valid) surrogate pairs it is still UTF-16. In the concrete implementation I have in mind, surrogate pairs are represented by a str containing 2 code units. But in that case s[i][1] is an error, and s[i][0] == s[i]. print(s[i][0]) and print(s[i]) will print the same character to the screen. If you decode it to bytes, well, it's not a str any more so what have you proved? Ie, what you will see is *code points* not in the BMP. You don't have to agree that such "surrogate containment" behavior is so valuable as I think it is, but that's what I have in mind as one requirement for a "conforming implementation of UTF-16". > At the same time I think it would be useful if certain string > operations like .lower() worked in such a way that *if* the input were > valid UTF-16, *then* the output would also be, while *if* the input > contained an invalid surrogate, the result would simply be something > that is no worse (in particular, those are all mapped to > themselves). I don't think that it's a good idea to go for conformance at the method level. It would be a feature for apps that don't claim full conformance because they nevertheless give good results in more cases. The downside will be Python apps using str that will pass conformance tests written for, say Western Europe, but end users in Kuwait and Kuala Lumpur will report bugs. 
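[Editor's note: the method-level garbage-in-garbage-out behaviour under discussion can be observed directly in a current CPython 3: code points with no lowercase mapping, including lone surrogates, are mapped to themselves by str.lower().]

```python
# Cased characters get their Unicode lowercase mapping; a lone
# surrogate has no case mapping and simply passes through unchanged.
assert "A\u00c4".lower() == "a\u00e4"  # "AÄ" -> "aä"
assert "\ud800A".lower() == "\ud800a"  # surrogate maps to itself
```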
> An analogy is actually found in .lower() on 8-bit strings in Python 2: > it assumes the string contains ASCII, and non-ASCII characters are > mapped to themselves. If your string contains Latin-1 or EBCDIC or > UTF-8 it will not do the right thing. But that doesn't mean strings > cannot contain those encodings, it just means that the .lower() method > is not useful if they do. (Why ASCII? Because that is the system > encoding in Python 2.) Sure. I think that approach is fine for str, too, except that I would hope it looks up BMP base characters in the case-mapping database. The fact is that with very few exceptions non-BMP characters are going to be symbols (mathematical operators and emoticons, for example). This is good enough, except when it's not---but when it's not, only 100% conformance is really a reasonable target. IMO, of course. > I think we should just document how it behaves and not get hung up on > what it is called. Mentioning UTF-16 If you also say, "this type can represent all characters in Unicode, as well as certain non-characters", why mention UTF-16 at all? > Let's call those things graphemes (Tom C's term, I quite like leaving > "character" ambiguous) OK, but those definitions need to be made clear, as "grapheme cluster" and "combined character" are defined in the Unicode standard, and in fact mean slightly different things from each other. > -- they are sequences of multiple code points that represent a > single "visual squiggle" (the kind of thing that you'd want to be > swappable in vim with "xp" :-). I agree that APIs are needed to > manipulate (match, generate, validate, mutilate, etc.) things at > the grapheme level. I don't agree that this means a separate data > type is required. Clear enough. > There are ever-larger units of information encoded in text strings, > with ever farther-reaching (and more vague) requirements on valid > sequences. Do you want to have a data type that can represent (only > valid) words in a language? 
Sentences? Novels? No, and I can tell you why! The difference between characters and words is much more important than that between code point and grapheme cluster for most users and the developers who serve them. Even small children recognize typographical ligatures as being composite objects, while at least this Spanish-as-a-second-language learner was taught that `ñ' is an atomic character represented by a discontiguous glyph, like `i', and it is no more related to `n' than `m' is. Users really believe that characters are atomic. Even in the cases of Han characters and Hangul, users think of the characters as being "atomic," but in the sense of Bohr rather than that of Democritus. I think the situation for text processing is analogous to chemistry where the atom, with a few fairly gross properties (the outer electron orbitals) is the fundamental unit, not the elementary particles like electrons and protons and structures like inner orbitals. Sure, there are higher order structures like molecules, phases, and crystals, but it is elements that have the most regular and simply described behavior for the chemist, and it does not become any simpler for the chemist if you decompose the atom. The composed character or grapheme cluster is the analogue of the atom for most processing at the level of "text". The only real exceptions I can imagine are in the domain of linguistics. > I think that at this point in time the best we can do is claim that > Python (the language standard) uses either 16-bit code units or 21-bit > code points in its string datatype, and that, thanks to PEP 393, > CPython 3.3 and further will always use 21-bit code points (but Jython > and IronPython may forever use their platform's native 16-bit code > unit representing string type). And then we add APIs that can be used > everywhere to look for code points (even if the string contains code > points), graphemes, or larger constructs. 
I'd like those APIs to be > designed using a garbage-in-garbage-out principle, where if the input > conforms to some Unicode requirement, the output does too, but if the > input doesn't, the output does what makes most sense. Validation is > then limited to codecs, and optional calls. Clear enough. I disagree that that will be enough for constructing large-scale Unicode-conformant applications. Somebody is going to have to produce batteries for those applications, and I think they should be included in Python. I agree that it's proper that I and those who think the same way take responsibility for writing and implementing a PEP. > If you index or slice a string, or create a string from chr() of a > surrogate or from some other value that the Unicode standard considers > an illegal code point, you better know what you are doing. I think that's like asking a toddler to know that the stove is hot. The consequences for the toddler of her ignorance are much greater, but the informational requirement is equally stringent. Of course application writers are adults who could be asked to learn, but economically I think it makes a lot more sense to include those batteries. IMHO YMMV, obviously. > I want chr(i) to be valid for all values of i in range(2**21), I quite agree (ie, for str). Thus I perceive a need for another type. From stephen at xemacs.org Thu Sep 1 09:59:19 2011 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Thu, 01 Sep 2011 16:59:19 +0900 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: <4E5E882C.1050006@g.nevcal.com> References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <4E5DEC35.4010404@g.nevcal.com> <4E5E882C.1050006@g.nevcal.com> Message-ID: <87pqjkk814.fsf@uwakimon.sk.tsukuba.ac.jp> Glenn Linderman writes: > We can either artificially constrain ourselves to minor tweaks of > the legal conforming bytestreams, It's not artificial. Having the internal representation be the same as a standard encoding is very useful for a large number of minor usages (urgently saving buffers in a text editor that knows its internal state is inconsistent, viewing strings in the debugger, PEP 393-style space optimization is simpler if text properties are out-of-band, etc). > or we can invent a representation (whether called str or something > else) that is useful and efficient in practice. Bring on the practice, then. You say that a bit to identify lone surrogates might be useful or efficient. In what application? How much time or space does it save? You say that a bit to cache a property might be useful or efficient. In what application? Which properties? Are those properties a set fixed by the language, or would some bits be available for application-specific property caching? How much time or space does that save? What are the costs to applications that don't want the cache? How is the bit-cache affected by PEP 393? I know of no answers (none!) to those questions that favor introduction of a bit-cache representation now. 
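For concreteness, Python 3's str already stores lone surrogates without any flag bit; only the codecs police them. A sketch (the surrogatepass handler exists since Python 3.1):

```python
lone = chr(0xD800)               # a lone high surrogate: legal in str
assert len(lone) == 1

# The strict UTF-8 codec refuses to emit it...
try:
    lone.encode('utf-8')
except UnicodeEncodeError:
    pass
else:
    raise AssertionError('expected UnicodeEncodeError')

# ...but the surrogatepass error handler round-trips it when needed:
blob = lone.encode('utf-8', 'surrogatepass')
assert blob == b'\xed\xa0\x80'
assert blob.decode('utf-8', 'surrogatepass') == lone
```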
And those bits aren't going anywhere; it will always be possible to use a "wide" build and change the representation later, if the optimization is valuable enough. Now, I'm aware that my experience is limited to the implementations of one general-purpose language (Emacs Lisp) of restricted applicability. But its primary use *is* in text processing, so I'm moderately expert. *Moderately*. Always interested in learning more, though. If you know of relevant use cases, I'm listening! Even if Guido doesn't find them convincing for Python, we might find them interesting at XEmacs. From nyamatongwe at gmail.com Thu Sep 1 10:05:41 2011 From: nyamatongwe at gmail.com (Neil Hodgson) Date: Thu, 1 Sep 2011 18:05:41 +1000 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: <4E5F0CD7.3030509@g.nevcal.com> References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <4E5DEC35.4010404@g.nevcal.com> <4E5E82B0.4020302@g.nevcal.com> <4E5E8811.90600@g.nevcal.com> <4E5F0CD7.3030509@g.nevcal.com> Message-ID: Glenn Linderman: > How many different iterators into the same text would be concurrently needed > by an application? And why? Seems like if it is dealing with text at the > level of grapheme clusters, it needs that type of iterator. Of course, if > it does I/O it needs codec access, but that is by nature sequential from the > starting point to the end point. I would expect that there would mostly be a single iterator into a string but can imagine scenarios in which multiple iterators may be concurrently active and that these could be of different types. 
For example, say we wanted to search for each code point in a text that fails some test (such as being a member of a set of unwanted vowel diacritics) and then display that failure in context with its surrounding text of up to 30 graphemes either side. Neil From turnbull at sk.tsukuba.ac.jp Thu Sep 1 10:33:50 2011 From: turnbull at sk.tsukuba.ac.jp (Stephen J. Turnbull) Date: Thu, 01 Sep 2011 17:33:50 +0900 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: <4E5E8840.4080600@g.nevcal.com> References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <4E5DEC35.4010404@g.nevcal.com> <87vctdkbzh.fsf@uwakimon.sk.tsukuba.ac.jp> <4E5E8840.4080600@g.nevcal.com> Message-ID: <87obz4k6fl.fsf@uwakimon.sk.tsukuba.ac.jp> Glenn Linderman writes: > I found your discussion of streams versus arrays, as separate concepts > related to Unicode, along with Terry's bisect indexing implementation, > to rather inspiring. Just because Unicode defines streams of codeunits > of various sizes (UTF-8, UTF-16, UTF-32) to represent characters when > processes communicate and for storage (which is one way processes > communicate), that doesn't imply that the internal representation of > character strings in a programming language must use exactly that > representation. That is true, and Unicode is *very* careful to define its requirements so that is true. That doesn't mean using an alternative representation is an improvement, though. 
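Neil's search-and-report scenario can be sketched in a few lines, with the simplification that context is measured in code points rather than the graphemes his version calls for, and with `report_failures` being a made-up name:

```python
def report_failures(text, bad, width=30):
    """Collect (index, context) pairs for each code point in `bad`.

    Context is a slice of up to `width` code points on either side --
    a stand-in for the 30 graphemes of context Neil describes."""
    hits = []
    for i, ch in enumerate(text):
        if ch in bad:
            hits.append((i, text[max(0, i - width):i + width + 1]))
    return hits

# Flag an unwanted combining acute accent, with one code point of context:
assert report_failures('ab\u0301cd', {'\u0301'}, width=1) == [(2, 'b\u0301c')]
```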
> I'm unaware of any current Python implementation that has chosen to > use UTF-8 as the internal representation of character strings (I'm > also aware Perl has made that choice), yet UTF-8 is one of the > commonly recommended character representations on the Linux platform, > from what I read. There are two reasons for that. First, widechar representations are right out for anything related to the file system or OS, unless you are prepared to translate before passing to the OS. If you use UTF-8, then asking the user to use a UTF-8 locale to communicate with your app is a plausible way to eliminate any translation in your app. (The original moniker for UTF-8 was UTF-FSS, where FSS stands for "file system safe.") Second, much text processing is stream-oriented and one-pass. In those cases, the variable-width nature of UTF-8 doesn't cost you anything. Eg, this is why the common GUIs for Unix (X.org, GTK+, and Qt) either provide or require UTF-8 coding for their text. It costs *them* nothing and is file-system-safe. > So in that sense, Python has rejected the idea of using the > "native" or "OS configured" representation as its internal > representation. I can't agree with that characterization. POSIX defines the concept of *locale* precisely because the "native" representation of text in Unix is ASCII. Obviously that won't fly, so they solved the problem in the worst possible way: they made the representation variable! It is the *variability* of text representation that Python rejects, just as Emacs and Perl do. They happen to have chosen six different representations.[1] > So why, then, must one choose from a repertoire of Unicode-defined > stream representations if they don't meet the goal of efficient > length, indexing, or slicing operations on actual characters? One need not. But why do anything else? It's not like the authors of that standard paid no attention to various concerns about efficiency and backward compatibility! 
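The variable-width property of UTF-8 mentioned above is easy to demonstrate (a sketch):

```python
# UTF-8 spends one to four bytes per code point, so byte offsets and
# code-point indices diverge: cheap for one-pass streams, costly for
# random access by index.
samples = ['a', '\u00e9', '\u2113', '\U0001D11E']  # a, e-acute, script l, G clef
assert [len(c.encode('utf-8')) for c in samples] == [1, 2, 3, 4]
```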
That's the question that you have not answered, and I am presently lacking in any data that suggests I'll ever need the facilities you propose. Footnotes: [1] Emacs recently changed its mind. Originally it used the so-called MULE encoding, and now a different extension of UTF-8 from Perl. Of course, Python beats that, with narrow, wide, and now PEP-393 representations! From nyamatongwe at gmail.com Thu Sep 1 10:53:28 2011 From: nyamatongwe at gmail.com (Neil Hodgson) Date: Thu, 1 Sep 2011 18:53:28 +1000 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: <87obz4k6fl.fsf@uwakimon.sk.tsukuba.ac.jp> References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <4E5DEC35.4010404@g.nevcal.com> <87vctdkbzh.fsf@uwakimon.sk.tsukuba.ac.jp> <4E5E8840.4080600@g.nevcal.com> <87obz4k6fl.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: Stephen J. Turnbull: > ... Eg, this is why the common GUIs for Unix (X.org, GTK+, and > Qt) either provide or require UTF-8 coding for their text. Qt uses UTF-16 for its basic QString type. While QString is mostly treated as a black box which you can create from input buffers in any encoding, the only encoding allowed for a contents-by-reference QString (QString::fromRawData) is UTF-16. http://doc.qt.nokia.com/latest/qstring.html#fromRawData Neil From stephen at xemacs.org Thu Sep 1 11:15:50 2011 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Thu, 01 Sep 2011 18:15:50 +0900 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: <4E5F0CD7.3030509@g.nevcal.com> References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <4E5DEC35.4010404@g.nevcal.com> <4E5E82B0.4020302@g.nevcal.com> <4E5E8811.90600@g.nevcal.com> <4E5F0CD7.3030509@g.nevcal.com> Message-ID: <87k49sk4hl.fsf@uwakimon.sk.tsukuba.ac.jp> Glenn Linderman writes: > How many different iterators into the same text would be concurrently > needed by an application? And why? A WYSIWYG editor for structured text (TeX, HTML) might want two (at least), one for the "source" window and one for the "rendered" window. One might want to save the state of the iterators (if that's possible) and cache it as one moves the "window" forward to make short backward motion fast, giving you two (or four, etc) more. > Seems like if it is dealing with text at the level of grapheme > clusters, it needs that type of iterator. Of course, if it does > I/O it needs codec access, but that is by nature sequential from > the starting point to the end point. `save-region' ? `save-text-remove-markup' ? 
From v+python at g.nevcal.com Thu Sep 1 11:20:59 2011 From: v+python at g.nevcal.com (Glenn Linderman) Date: Thu, 01 Sep 2011 02:20:59 -0700 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: <87k49sk4hl.fsf@uwakimon.sk.tsukuba.ac.jp> References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <4E5DEC35.4010404@g.nevcal.com> <4E5E82B0.4020302@g.nevcal.com> <4E5E8811.90600@g.nevcal.com> <4E5F0CD7.3030509@g.nevcal.com> <87k49sk4hl.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4E5F4E7B.4070200@g.nevcal.com> On 9/1/2011 2:15 AM, Stephen J. Turnbull wrote: > Glenn Linderman writes: > > > How many different iterators into the same text would be concurrently > > needed by an application? And why? > > A WYSIWYG editor for structured text (TeX, HTML) might want two (at > least), one for the "source" window and one for the "rendered" window. > One might want to save the state of the iterators (if that's possible) > and cache it as one moves the "window" forward to make short backward > motion fast, giving you two (or four, etc) more. Sure. But those are probably all the same type of iterators -- probably (since they are WYSIWYG) dealing with multi-codepoint characters (Guido's recent definition of grapheme, which seems to subsume both grapheme clusters and composed characters). Hence all of them would be using/requiring the same sort of representation, index, analysis, or some combination of those. > > Seems like if it is dealing with text at the level of grapheme > > clusters, it needs that type of iterator. 
Of course, if it does > > I/O it needs codec access, but that is by nature sequential from > > the starting point to the end point. > > `save-region' ? `save-text-remove-markup' ? Yes, save-region sounds like exactly what I was speaking of. save-text-remove-markup I would infer needs to process the text to remove the markup characters... since you used TeX and HTML as examples, markup is text, not binary (which would be a different problem). Since the TeX and HTML markup is mostly ASCII, markup removal (or more likely, text extraction) could be performed via either a grapheme iterator, or a codepoint iterator, or even a code unit iterator. -------------- next part -------------- An HTML attachment was scrubbed... URL: From v+python at g.nevcal.com Thu Sep 1 11:55:22 2011 From: v+python at g.nevcal.com (Glenn Linderman) Date: Thu, 01 Sep 2011 02:55:22 -0700 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: <87pqjkk814.fsf@uwakimon.sk.tsukuba.ac.jp> References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <4E5DEC35.4010404@g.nevcal.com> <4E5E882C.1050006@g.nevcal.com> <87pqjkk814.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4E5F568A.4020301@g.nevcal.com> On 9/1/2011 12:59 AM, Stephen J. Turnbull wrote: > Glenn Linderman writes: > > > We can either artificially constrain ourselves to minor tweaks of > > the legal conforming bytestreams, > > It's not artificial. 
Having the internal representation be the same > as a standard encoding is very useful for a large number of minor > usages (urgently saving buffers in a text editor that knows its > internal state is inconsistent, viewing strings in the debugger, PEP > 393-style space optimization is simpler if text properties are > out-of-band, etc). saving buffers urgently when the internal state is inconsistent sounds like carefully preserving a bug. Windows 7 64-bit on one of my computers happily crashes several times a day when it detects inconsistent internal state... under the theory, I guess, that losing work is better than saving bad work. You sound the opposite. I'm actually very grateful that Firefox and emacs recover gracefully from Windows crashes, and I lose very little data from the crashes, but cannot recommend Windows 7 (this machine being my only experience with it) for stability. In any case, the operations you mention still require the data to be processed, if ever so slightly, and I'll admit that a more complex representation would require a bit more processing. Not clear that it would be huge or problematical for these cases. Except, I'm not sure how PEP 393 space optimization fits with the other operations. It may even be that an application-wide complex-grapheme cache would save significant space, although if it uses high-bits in a string representation to reference the cache, PEP 393 would jump immediately to something > 16 bits per grapheme... but likely would anyway, if complex-graphemes are in the data stream. > > or we can invent a representation (whether called str or something > > else) that is useful and efficient in practice. > > Bring on the practice, then. You say that a bit to identify lone > surrogates might be useful or efficient. In what application? How > much time or space does it save? I didn't attribute any efficiency to flagging lone surrogates (BI-5). 
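For comparison, the precedent that already exists in Python for round-tripping invalid byte input -- PEP 383's surrogateescape error handler -- can be sketched as:

```python
raw = b'ab\xff\xfecd'                     # not valid UTF-8
s = raw.decode('utf-8', 'surrogateescape')

# Each undecodable byte became a lone low surrogate in U+DC80..U+DCFF:
assert s == 'ab\udcff\udcfecd'
# ...and encoding with the same handler restores the bytes exactly:
assert s.encode('utf-8', 'surrogateescape') == raw
```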
Since Windows uses a non-validated UCS-2 or UTF-16 character type, any Python program that obtains data from Windows APIs may be confronted with lone surrogates or inappropriate combining characters at any time. Round-tripping that data seems useful, even though the data itself may not be as useful as validated Unicode characters would be. Accidentally combining the characters due to slicing and dicing the data, and doing normalizations, or what not, would not likely be appropriate. However, returning modified forms of it to Windows as UCS-2 or UTF-16 data may still cause other applications to later accidentally combine the characters, if the modifications juxtaposed things to make them look reasonable, even if accidentally. If intentionally, of course, the bit could be turned off. This exact sort of problem with non-validated UTF-8 bytes was addressed already in Python, mostly for Linux, allowing round-tripping of the byte stream, even though it is not valid. BI-6 suggests a different scheme for that, without introducing lone surrogates (which might accidentally get combined with other lone surrogates). > You say that a bit to cache a > property might be useful or efficient. In what application? Which > properties? Are those properties a set fixed by the language, or > would some bits be available for application-specific property > caching? How much time or space does that save? The brainstorming ideas I presented were just that... ideas. And they were independent. And the use of many high-order bits for properties was one of the independent ones. When I wrote that one, I was assuming a UTF-32 representation (which wastes 11 bits of each 32). One thing I did have in mind, with the high-order bits, for that representation, was to flag the start or end or middle of the codes that are included in a grapheme. That would be redundant with some of the Unicode codepoint property databases, if I understand them properly... 
whether it would make iterators enough more efficient to be worth the complexity would have to be benchmarked. After writing all those ideas down, I actually preferred some of the others, that achieved O(1) real grapheme indexing, rather than caching character properties. > What are the costs to applications that don't want the cache? How is > the bit-cache affected by PEP 393? If it is a separate type from str, then it costs nothing except the extra code space to implement the cache for those applications that do want it... most of which wouldn't be loaded for applications that don't, if done as a module or C extension. > I know of no answers (none!) to those questions that favor > introduction of a bit-cache representation now. And those bits aren't > going anywhere; it will always be possible to use a "wide" build and > change the representation later, if the optimization is valuable > enough. Now, I'm aware that my experience is limited to the > implementations of one general-purpose language (Emacs Lisp) of > restricted applicability. But its primary use *is* in text processing, > so I'm moderately expert. > > *Moderately*. Always interested in learning more, though. If you > know of relevant use cases, I'm listening! Even if Guido doesn't find > them convincing for Python, we might find them interesting at XEmacs. OK... ignore the bit-cache idea (BI-1), and reread the others without having your mind clogged with that one, and see if any of them make sense to you then. But you may be too biased by the "minor" needs of keeping the internal representation similar to the stream representation to see any value in them. I rather like BI-2, since it allows O(1) indexing of graphemes. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ned at nedbatchelder.com Thu Sep 1 13:29:45 2011 From: ned at nedbatchelder.com (Ned Batchelder) Date: Thu, 01 Sep 2011 07:29:45 -0400 Subject: [Python-Dev] Python 3 optimizations continued... 
In-Reply-To: References: <20110829231420.20c3516a@pitrou.net> <20110830025510.638b41d9@pitrou.net> <4E5CA1F0.2070005@v.loewis.de> <20110830193806.0d718a56@pitrou.net> Message-ID: <4E5F6CA9.1080501@nedbatchelder.com> On 8/30/2011 4:41 PM, stefan brunthaler wrote: >> Ok, there there's something else you haven't told us. Are you saying >> that the original (old) bytecode is still used (and hence written to >> and read from .pyc files)? >> > Short answer: yes. > Long answer: I added an invocation counter to the code object and keep > interpreting in the usual Python interpreter until this counter > reaches a configurable threshold. When it reaches this threshold, I > create the new instruction format and interpret with this optimized > representation. All the macros look exactly the same in the source > code, they are just redefined to use the different instruction format. > I am at no point serializing this representation or the runtime > information gathered by me, as any subsequent invocation might have > different characteristics. When the switchover to the new instruction format happens, what happens to sys.settrace() tracing? Will it report the same sequence of line numbers? For a small but important class of program executions, this is more important than speed. --Ned. > Best, > --stefan From cesare.di.mauro at gmail.com Thu Sep 1 14:23:04 2011 From: cesare.di.mauro at gmail.com (Cesare Di Mauro) Date: Thu, 1 Sep 2011 14:23:04 +0200 Subject: [Python-Dev] Python 3 optimizations continued... In-Reply-To: <4E5F6CA9.1080501@nedbatchelder.com> References: <20110829231420.20c3516a@pitrou.net> <20110830025510.638b41d9@pitrou.net> <4E5CA1F0.2070005@v.loewis.de> <20110830193806.0d718a56@pitrou.net> <4E5F6CA9.1080501@nedbatchelder.com> Message-ID: 2011/9/1 Ned Batchelder > When the switchover to the new instruction format happens, what happens to > sys.settrace() tracing? Will it report the same sequence of line numbers? 
> For a small but important class of program executions, this is more > important than speed. > > --Ned > A simple solution: when tracing is enabled, the new instruction format will never be executed (and information tracking disabled as well). Regards, Cesare -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark at hotpy.org Thu Sep 1 14:31:12 2011 From: mark at hotpy.org (Mark Shannon) Date: Thu, 01 Sep 2011 13:31:12 +0100 Subject: [Python-Dev] Python 3 optimizations continued... In-Reply-To: References: <20110829231420.20c3516a@pitrou.net> <20110830025510.638b41d9@pitrou.net> <4E5CA1F0.2070005@v.loewis.de> <20110830193806.0d718a56@pitrou.net> <4E5F6CA9.1080501@nedbatchelder.com> Message-ID: <4E5F7B10.3010001@hotpy.org> Cesare Di Mauro wrote: > 2011/9/1 Ned Batchelder > > > When the switchover to the new instruction format happens, what > happens to sys.settrace() tracing? Will it report the same sequence > of line numbers? For a small but important class of program > executions, this is more important than speed. > > --Ned > > > A simple solution: when tracing is enabled, the new instruction format > will never be executed (and information tracking disabled as well). > What happens if tracing is enabled *during* the execution of the new instruction format? Some sort of deoptimisation will be required in order to recover the correct VM state. Cheers, Mark. > Regards, > Cesare > > > ------------------------------------------------------------------------ > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/mark%40hotpy.org From cesare.di.mauro at gmail.com Thu Sep 1 14:38:19 2011 From: cesare.di.mauro at gmail.com (Cesare Di Mauro) Date: Thu, 1 Sep 2011 14:38:19 +0200 Subject: [Python-Dev] Python 3 optimizations continued... 
In-Reply-To: <4E5F7B10.3010001@hotpy.org> References: <20110829231420.20c3516a@pitrou.net> <20110830025510.638b41d9@pitrou.net> <4E5CA1F0.2070005@v.loewis.de> <20110830193806.0d718a56@pitrou.net> <4E5F6CA9.1080501@nedbatchelder.com> <4E5F7B10.3010001@hotpy.org> Message-ID: 2011/9/1 Mark Shannon > Cesare Di Mauro wrote: > >> 2011/9/1 Ned Batchelder > ned at nedbatchelder.com>> >> >> >> When the switchover to the new instruction format happens, what >> happens to sys.settrace() tracing? Will it report the same sequence >> of line numbers? For a small but important class of program >> executions, this is more important than speed. >> >> --Ned >> >> >> A simple solution: when tracing is enabled, the new instruction format >> will never be executed (and information tracking disabled as well). >> >> What happens if tracing is enabled *during* the execution of the new > instruction format? > Some sort of deoptimisation will be required in order to recover the > correct VM state. > > Cheers, > Mark. > Sure. I don't think that the regular ceval.c loop will be "dropped" when executing the new instruction format, so we can "intercept" a change like this using the "why" variable, for example, or something similar that is normally used to break the regular loop execution. Anyway, we need to take a look at the code. Cheers, Cesare -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From hagen at zhuliguan.net Thu Sep 1 17:30:10 2011 From: hagen at zhuliguan.net (=?UTF-8?B?SGFnZW4gRsO8cnN0ZW5hdQ==?=) Date: Thu, 01 Sep 2011 11:30:10 -0400 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: References: <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <4E5DEC35.4010404@g.nevcal.com> <4E5E82B0.4020302@g.nevcal.com> <4E5E8811.90600@g.nevcal.com> Message-ID: > Ok, I thought there was also a form normalized (denormalized?) to > decomposed form. But I'll take your word. If I understood the example correctly, he needs a mixed form, with some characters decomposed and some composed (depending on which one looks better in the given font). I agree that this sounds more like a font problem, but it's a widespread font problem and it may be necessary to address it in an application. But this is only one example of why an application-specific concept of graphemes different from the Unicode-defined normalized forms can be useful. I think the very concept of a grapheme is context, language, and culture specific. For example, in Chinese Pinyin it would be very natural to write tone marks with composing diacritics (i.e. in decomposed form). But then you have the vowel "ü" and it would be strange to decompose it into an "u" and combining diaeresis. So conceptually the most sensible representation of "lǜ" would be neither the composed nor the decomposed normal form, and depending on its needs an application might want to represent it in the mixed form (composing the diaeresis with the "u", but leaving the grave accent separate). 
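The three candidate spellings of "lǜ" described above can be checked with the stdlib's unicodedata (a sketch): the mixed form is a distinct code-point sequence, but either normalization erases the distinction.

```python
import unicodedata as ud

decomposed = 'lu\u0308\u0300'    # l, u, combining diaeresis, combining grave
composed   = 'l\u01DC'           # l, precomposed u-with-diaeresis-and-grave
mixed      = 'l\u00FC\u0300'     # l, precomposed u-with-diaeresis, grave kept separate

# All three render as the same grapheme and are canonically equivalent:
assert ud.normalize('NFC', decomposed) == composed
assert ud.normalize('NFD', composed) == decomposed

# But the mixed form is neither normal form -- normalizing destroys it:
assert mixed not in (composed, decomposed)
assert ud.normalize('NFC', mixed) == composed
assert ud.normalize('NFD', mixed) == decomposed
```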
There must be many more examples where the conceptual context determines the right composition, like for "ñ", which in Spanish is certainly a grapheme, but in mathematics might be better represented as n-tilde. The bottom line is that, while an array of Unicode code points is certainly a generally useful data type (and PEP 393 is a great improvement in this regard), an array of graphemes carries many subtleties and may not be nearly as universal. Support in the spirit of unicodedata's normalization function etc. is certainly a good thing, but we shouldn't assume that everyone will want Python to do their graphemes for them. - Hagen From guido at python.org Thu Sep 1 17:45:14 2011 From: guido at python.org (Guido van Rossum) Date: Thu, 1 Sep 2011 08:45:14 -0700 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: <87r540ka68.fsf@uwakimon.sk.tsukuba.ac.jp> References: <4E537EEC.1070602@v.loewis.de> <1314099542.3485.10.camel@localhost.localdomain> <4E53945E.1050102@v.loewis.de> <1314101745.3485.18.camel@localhost.localdomain> <4E53A5D1.2040808@v.loewis.de> <4E53A950.30005@haypocalc.com> <87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp> <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <87r540ka68.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Thu, Sep 1, 2011 at 12:13 AM, Stephen J. Turnbull wrote: > Where I cut your words, we are in 100% agreement. (FWIW :-) Not quite the same here, but I don't feel the need to have the last word. Most of what you say makes sense, in some cases we'll quibble later, but there are a few points where I have something to add: > No, and I can tell you why! The difference between characters and
?The difference between characters and > words is much more important than that between code point and grapheme > cluster for most users and the developers who serve them. ?Even small > children recognize typographical ligatures as being composite objects, True -- in fact I didn't know that ff and ffl ligatures *existed* until I learned about Unix troff. > while at least this Spanish-as-a-second-language learner was taught > that `?' is an atomic character represented by a discontiguous glyph, > like `i', and it is no more related to `n' than `m' is. ?Users really > believe that characters are atomic. ?Even in the cases of Han > characters and Hangul, users think of the characters as being > "atomic," but in the sense of Bohr rather than that of Democritus. Ah, I think this may very well be culture-dependent. In Holland there are no Dutch words that use accented letters, but the accents are known because there are a lot of words borrowed from French or German. We (the Dutch) think of these as letters with accents and in fact we think of the accents as modifiers that can be added to any letter (at least I know that's how I thought about it -- perhaps I was also influenced by the way one had to type those on a mechanical typewriter). Dutch does have one native use of the umlaut (though it has a different name, I forget which, maybe trema :-), when there are two consecutive vowels that would normally be read as a special sound (diphthong?). E.g. in "koe" (cow) the oe is two letters (not a single letter formed of two distict shapes!) that mean a special sound (roughly KOO). But in a word like "co?xistentie" (coexistence) the o and e do not form the oe-sound, and to emphasize this to Dutch readers (who believe their spelling is very logical :-), the official spelling puts the umlaut on the e. This is definitely thought of as a separate mark added to the e; ? is not a new letter. I have a feeling it's the same way for the French and Germans, but I really don't know. 
(Antoine? Georg?) Finally, my guess is that the Spanish emphasis on ñ as a separate letter has to do with teaching how it has a separate position in the localized collation sequence, doesn't it? I'm also curious if ñ occurs as a separate character on Spanish keyboards. -- --Guido van Rossum (python.org/~guido) From solipsis at pitrou.net Thu Sep 1 18:03:47 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 01 Sep 2011 18:03:47 +0200 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: References: <4E537EEC.1070602@v.loewis.de> <1314099542.3485.10.camel@localhost.localdomain> <4E53945E.1050102@v.loewis.de> <1314101745.3485.18.camel@localhost.localdomain> <4E53A5D1.2040808@v.loewis.de> <4E53A950.30005@haypocalc.com> <87r54bb4mq.fsf@uwakimon.sk.tsukuba.ac.jp> <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <87r540ka68.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <1314893027.3617.12.camel@localhost.localdomain> Le jeudi 01 septembre 2011 à 08:45 -0700, Guido van Rossum a écrit : > This is definitely thought of as a separate > mark added to the e; ë is not a new letter. I have a feeling it's the > same way for the French and Germans, but I really don't know. > (Antoine? Georg?) Indeed, they are not separate "letters" (they are considered the same in lexicographic order, and the French alphabet has 26 letters). But I'm not sure how it's relevant, because you can't remove an accent without most likely making a spelling error, or at least changing the meaning. Accents are very much part of the language (while ligatures like "ff" are not, they are a rendering detail). So I would consider "é", "à", "ç", etc. 
atomic characters for the purpose of processing French text. And I don't see how a decomposed form could help an application. Regards Antoine. From guido at python.org Thu Sep 1 18:31:53 2011 From: guido at python.org (Guido van Rossum) Date: Thu, 1 Sep 2011 09:31:53 -0700 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: <1314893027.3617.12.camel@localhost.localdomain> Message-ID: On Thu, Sep 1, 2011 at 9:03 AM, Antoine Pitrou wrote: > Le jeudi 01 septembre 2011 à 08:45 -0700, Guido van Rossum a écrit : >> This is definitely thought of as a separate >> mark added to the e; ë is not a new letter. I have a feeling it's the >> same way for the French and Germans, but I really don't know. >> (Antoine? Georg?) > > Indeed, they are not separate "letters" (they are considered the same in > lexicographic order, and the French alphabet has 26 letters). > > But I'm not sure how it's relevant, because you can't remove an accent > without most likely making a spelling error, or at least changing the > meaning. Accents are very much part of the language (while ligatures > like "ff" are not, they are a rendering detail). So I would consider > "é", "è", "à", etc.
atomic characters for the purpose of processing > French text. And I don't see how a decomposed form could help an > application. The example given was someone who didn't agree with how a particular font rendered those accented characters. I agree that's obscure though. I recall long ago that when the French wrote words in all caps they would drop the accents, e.g. ECOLE. I even recall (through the mists of time) observing this in Paris on public signs. Is this still the convention? -- --Guido van Rossum (python.org/~guido) From solipsis at pitrou.net Thu Sep 1 18:46:02 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 01 Sep 2011 18:46:02 +0200 Subject: [Python-Dev] PEP 393 Summer of Code Project Message-ID: <1314895562.3617.19.camel@localhost.localdomain> > The example given was someone who didn't agree with how a particular > font rendered those accented characters. I agree that's obscure > though. > > I recall long ago that when the French wrote words in all caps they > would drop the accents, e.g. ECOLE. I even recall (through the mists > of time) observing this in Paris on public signs. Is this still the > convention?
Maybe it only was a compromise in the time of Morse code? I think it is tolerated, partly because typing support (on computers and typewriters) has been weak. On a French keyboard, you have an "é" key, but shifting it gives you "2", not "É". The latter can be obtained using the Caps Lock key under Linux, but not under Windows. (so you could also write Éric's name "Eric", for example) That said, most typographies nowadays seem careful to keep the accents on uppercase letters (e.g. on book covers; AFAIR, road signs also keep the accents, but I'm no driver). Regards Antoine. From stefan_ml at behnel.de Thu Sep 1 19:04:34 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 01 Sep 2011 19:04:34 +0200 Subject: [Python-Dev] PEP 393 Summer of Code Project Message-ID: Guido van Rossum, 01.09.2011 18:31: > On Thu, Sep 1, 2011 at 9:03 AM, Antoine Pitrou wrote: >> Le jeudi 01 septembre 2011 à 08:45 -0700, Guido van Rossum a écrit : >>> This is definitely thought of as a separate >>> mark added to the e; ë is not a new letter. I have a feeling it's the >>> same way for the French and Germans, but I really don't know. >>> (Antoine? Georg?) >> >> Indeed, they are not separate "letters" (they are considered the same in >> lexicographic order, and the French alphabet has 26 letters).
So does the German alphabet, even though that does not include "ß", which basically descended from a ligature of the old German way of writing "sz", where "s" looked similar to an "f" and "z" had a low hanging tail. IIRC, German Umlaut letters are lexicographically sorted according to their emergency replacement spelling ("ä" -> "ae"), which is also sometimes used in all upper case words ("Glück" -> "GLUECK"). I guess that's because Umlaut dots are harder to see on top of upper case letters. So, Latin-1 byte value sorting always yields totally wrong results. That aside, Umlaut letters are commonly considered separate letters, different from the undotted letters and also different from the replacement spellings. I, for one, always found the replacements rather weird and never got used to using them in upper case words. In any case, it's wrong to always use them, and it makes text harder to read. >> But I'm not sure how it's relevant, because you can't remove an accent >> without most likely making a spelling error, or at least changing the >> meaning. Accents are very much part of the language (while ligatures >> like "ff" are not, they are a rendering detail). So I would consider >> "é", "è", "à", etc. atomic characters for the purpose of processing >> French text. And I don't see how a decomposed form could help an >> application. > > I recall long ago that when the French wrote words in all caps they > would drop the accents, e.g. ECOLE. I even recall (through the mists > of time) observing this in Paris on public signs. Is this still the > convention? Yes, and it's a huge problem when trying to pronounce last names. In French, you'd commonly write LASTNAME, Firstname and if LASTNAME happens to have accented letters, you'd miss them when reading that. I know a couple of French people who severely suffer from this, because the pronunciation of their name gets a totally different meaning without accents.
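The replacement-spelling collation described above ("ä" sorted as if it were "ae", roughly the German phone-book convention) can be sketched as a simple sort key; the word list below is purely illustrative:

```python
def replacement_key(word):
    """Sort key treating umlauts and ß as their replacement spellings."""
    subs = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss",
            "Ä": "Ae", "Ö": "Oe", "Ü": "Ue"}
    return "".join(subs.get(ch, ch) for ch in word).lower()

words = ["Glück", "Glatt", "Gluecksburg", "Glas"]

# Raw code-point sorting puts "Glück" last, since ü (U+00FC) is above ASCII.
print(sorted(words))
# With the replacement key, "Glück" sorts exactly like "Glueck".
print(sorted(words, key=replacement_key))
```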
Stefan From glyph at twistedmatrix.com Thu Sep 1 19:15:45 2011 From: glyph at twistedmatrix.com (Glyph Lefkowitz) Date: Thu, 1 Sep 2011 10:15:45 -0700 Subject: [Python-Dev] Python 3 optimizations continued... In-Reply-To: References: <20110829231420.20c3516a@pitrou.net> <20110830025510.638b41d9@pitrou.net> <4E5CA1F0.2070005@v.loewis.de> <20110830193806.0d718a56@pitrou.net> <4E5F6CA9.1080501@nedbatchelder.com> Message-ID: <112D04D2-DC83-4383-9C06-96208811507F@twistedmatrix.com> On Sep 1, 2011, at 5:23 AM, Cesare Di Mauro wrote: > A simple solution: when tracing is enabled, the new instruction format will never be executed (and information tracking disabled as well). Correct me if I'm wrong: doesn't this mean that no profiler will accurately be able to measure the performance impact of the new instruction format, and therefore one may get incorrect data when one is trying to make a CPU optimization for real-world performance? From stefan_ml at behnel.de Thu Sep 1 19:31:52 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 01 Sep 2011 19:31:52 +0200 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: <1314895562.3617.19.camel@localhost.localdomain> Message-ID: Antoine Pitrou, 01.09.2011 18:46: > AFAIR, road signs also keep the accents, but I'm no driver Right, I noticed that, too. That's certainly not uncommon.
I think it's mostly because of local pride (after all, the road signs are all that many drivers ever see of a city), but sometimes also because it can't be helped when the name gets a different meaning without accents. People just cause too many accidents when they burst out laughing while entering a city by car. Stefan From guido at python.org Thu Sep 1 19:40:00 2011 From: guido at python.org (Guido van Rossum) Date: Thu, 1 Sep 2011 10:40:00 -0700 Subject: [Python-Dev] Python 3 optimizations continued... In-Reply-To: <112D04D2-DC83-4383-9C06-96208811507F@twistedmatrix.com> References: <20110829231420.20c3516a@pitrou.net> <20110830025510.638b41d9@pitrou.net> <4E5CA1F0.2070005@v.loewis.de> <20110830193806.0d718a56@pitrou.net> <112D04D2-DC83-4383-9C06-96208811507F@twistedmatrix.com> Message-ID: On Thu, Sep 1, 2011 at 10:15 AM, Glyph Lefkowitz wrote: > > On Sep 1, 2011, at 5:23 AM, Cesare Di Mauro wrote: > > A simple solution: when tracing is enabled, the new instruction format will > never be executed (and information tracking disabled as well). > > Correct me if I'm wrong: doesn't this mean that no profiler will accurately > be able to measure the performance impact of the new instruction format, and > therefore one may get incorrect data when one is trying to make a CPU > optimization for real-world performance? Well, profilers already skew results by adding call overhead. But tracing for debugging and profiling don't do exactly the same thing: debug tracing stops at every line, but profiling only executes hooks at the start and end of a function(*). So I think the function body could still be executed using the new format (assuming this is turned on/off per code object anyway). (*) And whenever a generator yields or is resumed. I consider that an annoying bug though, just as the debugger doesn't do the right thing with yield -- there's no way to continue until the yielding generator is resumed short of setting a manual breakpoint on the next line.
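The line-versus-call granularity of the two hook types is directly observable with sys.settrace and sys.setprofile; a minimal sketch:

```python
import sys

events_trace, events_profile = [], []

def work():
    x = 1
    y = 2
    return x + y

def tracer(frame, event, arg):
    # Returning the tracer enables per-line tracing inside the frame.
    if frame.f_code.co_name == "work":
        events_trace.append(event)
    return tracer

def profiler(frame, event, arg):
    if frame.f_code.co_name == "work":
        events_profile.append(event)

sys.settrace(tracer)
work()
sys.settrace(None)

sys.setprofile(profiler)
work()
sys.setprofile(None)

# The trace hook fires for every line; the profile hook only at call/return.
print(events_trace)    # ['call', 'line', 'line', 'line', 'return']
print(events_profile)  # ['call', 'return']
```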
-- --Guido van Rossum (python.org/~guido) From drsalists at gmail.com Thu Sep 1 19:56:32 2011 From: drsalists at gmail.com (Dan Stromberg) Date: Thu, 1 Sep 2011 10:56:32 -0700 Subject: [Python-Dev] Ctypes and the stdlib (was Re: LZMA compression support in 3.3) In-Reply-To: References: <4E5951D5.5020200@v.loewis.de> <20110828002642.4765fc89@pitrou.net> <20110828012705.523e51d4@pitrou.net> <4E5C01E4.2050106@canterbury.ac.nz> <4E5C7B48.5080402@canterbury.ac.nz> <4E5CA35E.8000509@v.loewis.de> <4E5D148B.1060606@v.loewis.de> Message-ID: On Tue, Aug 30, 2011 at 10:05 AM, Guido van Rossum wrote: > On Tue, Aug 30, 2011 at 9:49 AM, "Martin v. Löwis" > wrote: > The problem lies with the PyPy backend -- there it generates ctypes > code, which means that the signature you declare to Cython/Pyrex must > match the *linker* level API, not the C compiler level API. Thus, if > in a system header a certain function is really a macro that invokes > another function with a permuted or augmented argument list, you'd > have to know what that macro does. I also don't see how this would > work for #defined constants: where does Cython/Pyrex get their value? > ctypes doesn't have their values. > > So, for PyPy, a solution based on Cython/Pyrex has many of the same > downsides as one based on ctypes where it comes to complying with an > API defined by a .h file. > It's certainly a harder problem. For most simple constants, Cython/Pyrex might be able to generate a series of tiny C programs with which to find CPP symbol values: #include "file1.h" ... #include "filen.h" main() { printf("%d", POSSIBLE_CPP_SYMBOL1); } ...and again with %f, %s, etc. The typing is quite a mess, and code fragments would probably be impractical. But since the C Preprocessor is supposedly Turing complete, maybe there's a pleasant surprise waiting there. But hopefully clang has something that'd make this easier.
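A rough sketch of that tiny-probe idea follows; the header and symbol names are just examples, and the compile step assumes a "cc" on PATH:

```python
import os
import shutil
import subprocess
import tempfile

def probe_source(headers, symbol, fmt="%d"):
    """Build a throwaway C program that prints the value of one CPP symbol."""
    lines = ['#include <%s>' % h for h in headers]
    lines.append('#include <stdio.h>')
    lines.append('int main(void) { printf("%s", %s); return 0; }' % (fmt, symbol))
    return "\n".join(lines) + "\n"

def probe_value(headers, symbol, fmt="%d"):
    """Compile and run the probe; returns the printed text, or None without a cc."""
    cc = shutil.which("cc")
    if cc is None:
        return None
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "probe.c")
        exe = os.path.join(tmp, "probe")
        with open(src, "w") as f:
            f.write(probe_source(headers, symbol, fmt))
        subprocess.check_call([cc, src, "-o", exe])
        return subprocess.check_output([exe]).decode()

print(probe_source(["limits.h"], "INT_MAX"))
```

As noted, the messy part is choosing the right printf format per symbol; this sketch just parameterizes it.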
SIP's approach of using something close to, but not identical to, the .h's sounds like it might be pretty productive - especially if the derivative of the .h's could be automatically derived using a Python script, with minor tweaks to the inputs on .h upgrades. But sip itself is apparently C++-only. From tjreedy at udel.edu Thu Sep 1 20:05:20 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 01 Sep 2011 14:05:20 -0400 Subject: [Python-Dev] PEP 393 Summer of Code Project Message-ID: On 9/1/2011 11:45 AM, Guido van Rossum wrote: > typewriter). Dutch does have one native use of the umlaut (though it > has a different name, I forget which, maybe trema :-), You remember correctly. According to https://secure.wikimedia.org/wikipedia/en/wiki/Trema_%28diacritic%29 'trema' (Greek 'hole') is the generic name of the double-dot vowel diacritic. It was originally used for 'diaeresis' (Greek, 'taking apart') when it shows "that a vowel letter is not part of a digraph or diphthong". (Note that 'ae' in diaeresis *is* a digraph ;-). Germans later used it to indicate umlaut, 'changed sound'. > when there are > two consecutive vowels that would normally be read as a special sound > (diphthong?). E.g. in "koe" (cow) the oe is two letters (not a single > letter formed of two distinct shapes!) that mean a special sound > (roughly KOO).
But in a word like "coëxistentie" (coexistence) the o > and e do not form the oe-sound, and to emphasize this to Dutch readers > (who believe their spelling is very logical :-), the official spelling > puts the umlaut on the e. This is definitely thought of as a separate > mark added to the e; ë is not a new letter. So the above is trema-diaeresis. "Dutch, French, and Spanish make regular use of the diaeresis." English uses such as 'coöperate' have become rare or archaic, perhaps because we cannot type them. Too bad, since people sometimes use '-' to serve the same purpose. -- Terry Jan Reedy From stefan_ml at behnel.de Thu Sep 1 20:11:33 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 01 Sep 2011 20:11:33 +0200 Subject: [Python-Dev] Cython, Ctypes and the stdlib Message-ID: Dan Stromberg, 01.09.2011 19:56: > On Tue, Aug 30, 2011 at 10:05 AM, Guido van Rossum wrote: >> The problem lies with the PyPy backend -- there it generates ctypes >> code, which means that the signature you declare to Cython/Pyrex must >> match the *linker* level API, not the C compiler level API. Thus, if >> in a system header a certain function is really a macro that invokes >> another function with a permuted or augmented argument list, you'd >> have to know what that macro does. I also don't see how this would >> work for #defined constants: where does Cython/Pyrex get their value? >> ctypes doesn't have their values. >> >> So, for PyPy, a solution based on Cython/Pyrex has many of the same >> downsides as one based on ctypes where it comes to complying with an >> API defined by a .h file. > > It's certainly a harder problem.
> > For most simple constants, Cython/Pyrex might be able to generate a series > of tiny C programs with which to find CPP symbol values: > > #include "file1.h" > ... > #include "filen.h" > > main() > { > printf("%d", POSSIBLE_CPP_SYMBOL1); > } > > ...and again with %f, %s, etc. The typing is quite a mess The user will commonly declare #defined values as typed external variables and callable macros as functions in .pxd files. These manually typed "macro" functions allow users to tell Cython what it should know about how the macros will be used. And that would allow it to generate C/C++ glue code for them that uses the declared types as a real function signature and calls the macro underneath. > and code fragments would probably be impractical. Not necessarily at the C level but certainly for a ctypes backend, yes. > But hopefully clang has something that'd make this easier. For figuring these things out, maybe. Not so much for solving the problems they introduce. Stefan From stephen at xemacs.org Thu Sep 1 20:28:06 2011 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 02 Sep 2011 03:28:06 +0900 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: <4E5F568A.4020301@g.nevcal.com> Message-ID: <87fwkgjex5.fsf@uwakimon.sk.tsukuba.ac.jp> Glenn Linderman writes: > Windows 7 64-bit on one of my computers happily crashes several > times a day when it detects inconsistent internal state...
under > the theory, I guess, that losing work is better than saving bad > work. You sound the opposite. Definitely. Windows apps habitually overwrite existing work; saving when inconsistent would be a bad idea. The apps I work on dump their unsaved buffers to new files, and give you a chance to look at them before instating them as the current version when you restart. > Except, I'm not sure how PEP 393 space optimization fits with the other > operations. It may even be that an application-wide complex-grapheme > cache would save significant space, although if it uses high-bits in a > string representation to reference the cache, PEP 393 would jump > immediately to something > 16 bits per grapheme... but likely would > anyway, if complex-graphemes are in the data stream. The only language I know of that uses thousands of complex graphemes is Korean ... and the precomposed forms are already in the BMP. I don't know how many accented forms you're likely to see in Vietnamese, but I suspect it's less than 6400 (the number of characters in private space in the BMP). So for most applications, I believe that mapping both non-BMP code points and grapheme clusters into that private space should be feasible. The only potential counterexample I can think of is display of Arabic, which I have heard has thousands of glyphs in good fonts because of the various ways ligatures form in that script. However AFAIK no apps encode these as characters; I'm just admitting that it *might* be useful. This will require some care in registering such characters and clusters because input text may already use private space according to some convention, which would need to be respected. Still, 6400 characters is a lot, even for the Japanese (IIRC the combined repertoire of "corporate characters" that for some reason never made it into the JIS sets is about 600, but almost all of them are already in the BMP). 
I believe the total number of Japanese emoticons is about 200, but I doubt that any given text is likely to use more than a few. So I think there's plenty of space there. This has a few advantages: (1) since these are real characters, all Unicode algorithms will apply as long as the appropriate properties are applied to the character in the database, and (2) it works with a narrow code unit (specifically, UCS-2, but it could also be used with UTF-8). If you really need more than 6400 grapheme clusters, promote to UTF-32, and get two more whole planes full (about 130,000 code points). > I didn't attribute any efficiency to flagging lone surrogates (BI-5). > Since Windows uses a non-validated UCS-2 or UTF-16 character type, any > Python program that obtains data from Windows APIs may be confronted > with lone surrogates or inappropriate combining characters at any > time. I don't think so. AFAIK all that data must pass through a codec, which will validate it unless you specifically tell it not to. > Round-tripping that data seems useful, The standard doesn't forbid that. (ISTR it did so in the past, but what is required in 6.0 is a specific algorithm for identifying well-formed portions of the text, basically "if you're currently in an invalid region, read individual code units and attempt to assemble a valid sequence -- as soon as you do, that is a valid code point, and you switch into valid state and return to the normal algorithm".) Specifically, since surrogates are not characters, leaving them in the data does not constitute "interpreting them as characters." I don't recall if any of the error handlers allow this, though. > However, returning modified forms of it to Windows as UCS-2 or > UTF-16 data may still cause other applications to later > accidentally combine the characters, if the modifications > juxtaposed things to make them look reasonably, even if > accidentally. 
In CPython AFAIK (I don't do Windows) this can only happen if you use a non-default error setting in the output codec. > After writing all those ideas down, I actually preferred some of > the others, that achieved O(1) real grapheme indexing, rather than > caching character properties. If you need O(1) grapheme indexing, use of private space seems a winner to me. It's just defining private precombined characters, and they won't bother any Unicode application, even if they leak out. > > What are the costs to applications that don't want the cache? > > How is the bit-cache affected by PEP 393? > > If it is a separate type from str, then it costs nothing except the > extra code space to implement the cache for those applications that > do want it... most of which wouldn't be loaded for applications > that don't, if done as a module or C extension. I'm talking about the bit-cache (which all of your BI-N referred to, at least indirectly). Many applications will want to work with fully composed characters, whether they're represented in a single code point or not. But they may not care about any of the bit-cache ideas. > OK... ignore the bit-cache idea (BI-1), and reread the others without > having your mind clogged with that one, and see if any of them make > sense to you then. But you may be too biased by the "minor" needs of > keeping the internal representation similar to the stream representation > to see any value in them. No, I'm biased by the fact that I already have good ways to do them without leaving the set of representations provided by Unicode (often ways which provide additional advantages), and by the fact that I myself don't know any use cases for the bit-cache yet. > I rather like BI-2, since it allows O(1) indexing of graphemes. I do too (without suggesting a non-standard representation, ie, using private space), but I'm sure that wheel has been reinvented quite frequently.
It's a very common trick in text processing, although I don't know of other applications where it's specifically used to turn data that "fails to be an array just a little bit" into a true array (although I suppose you could view fixed-width EUC encodings that way). From stephen at xemacs.org Thu Sep 1 20:54:56 2011 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 02 Sep 2011 03:54:56 +0900 Subject: [Python-Dev] PEP 393 Summer of Code Project Message-ID: <87ei00jdof.fsf@uwakimon.sk.tsukuba.ac.jp> Guido van Rossum writes: > On Thu, Sep 1, 2011 at 12:13 AM, Stephen J. Turnbull wrote: > > while at least this Spanish-as-a-second-language learner was taught > > that `ñ' is an atomic character represented by a discontiguous glyph, > > like `i', and it is no more related to `n' than `m' is. Users really > > believe that characters are atomic. Even in the cases of Han > > characters and Hangul, users think of the characters as being > > "atomic," but in the sense of Bohr rather than that of Democritus. > > Ah, I think this may very well be culture-dependent. I'm not an expert, but I'm fairly sure it is.
Specifically, I heard from a TeX-ie friend that the same accented letter is typeset (and collated) differently in different European languages because in some of them the accent is considered part of the letter (making a different character), while in others accents modify a single underlying character. The ones that consider the letter and accent to constitute a single character also prefer to leave less space, he said. > But in a word like "coëxistentie" (coexistence) the o and e do not > form the oe-sound, and to emphasize this to Dutch readers (who > believe their spelling is very logical :-), the official spelling > puts the umlaut on the e. American English has the same usage, but it's optional (in particular, you'll see naive, naif, and words like coordinate typeset that way occasionally, for the same reason I suppose). As Hagen Fürstenau points out, with multiple combining characters, there are even more complex possibilities than "the accent is part of the character" and "it's really not", and they may be application-dependent. > Finally, my guess is that the Spanish emphasis on ñ as a separate > letter has to do with teaching how it has a separate position in the > localized collation sequence, doesn't it? You'd have to ask Mr. Gonzalez. I suspect he may have taught that way less because of his Castellano upbringing, and more because of the infamous lack of sympathy of American high school students for the fine points of usage in foreign languages. > I'm also curious if ñ occurs as a separate character on Spanish > keyboards. If I'm reading /usr/share/X11/xkb/symbols/es correctly, it does in X.org: the key that for English users would map to ASCII tilde.
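The composed-versus-decomposed distinction under discussion is directly visible with the unicodedata module: NFC treats ñ as one code point, NFD as n plus a combining tilde, and the two forms compare equal only after normalization:

```python
import unicodedata

composed = "\u00f1"      # ñ as a single precomposed code point
decomposed = "n\u0303"   # 'n' followed by COMBINING TILDE

print(len(composed), len(decomposed))   # 1 2
print(composed == decomposed)           # False: plain code-point comparison
print(unicodedata.normalize("NFD", composed) == decomposed)  # True
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```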
From solipsis at pitrou.net Thu Sep 1 20:54:45 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 01 Sep 2011 20:54:45 +0200 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: <87ei00jdof.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <1314903285.3617.31.camel@localhost.localdomain> > > Finally, my guess is that the Spanish emphasis on ñ as a separate > > letter has to do with teaching how it has a separate position in the > > localized collation sequence, doesn't it? > > You'd have to ask Mr. Gonzalez. I suspect he may have taught that way > less because of his Castellano upbringing, and more because of the > infamous lack of sympathy of American high school students for the > fine points of usage in foreign languages. If you look at Wikipedia, it says: «El alfabeto español consta de 27 letras» ("the Spanish alphabet consists of 27 letters"). The Ñ is separate from the N (and so is it in my French-Spanish dictionary). The accented letters, however, are not considered separately. http://es.wikipedia.org/wiki/Alfabeto_espa%C3%B1ol (I can't tell you how annoying to type "ñ" is when the tilde is accessed using AltGr + 2 and you have to combine that with the Compose key and N to obtain the full character.
I'm sure Spanish keyboards have a better way than that :-)) Regards Antoine. From tseaver at palladion.com Thu Sep 1 21:13:08 2011 From: tseaver at palladion.com (Tres Seaver) Date: Thu, 01 Sep 2011 15:13:08 -0400 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: <1314903285.3617.31.camel@localhost.localdomain> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 09/01/2011 02:54 PM, Antoine Pitrou wrote: > > If you look at Wikipedia, it says: «El alfabeto español consta de 27 > letras». The Ñ is separate from the N (and so is it in my > French-Spanish dictionary). The accented letters, however, are not > considered separately. > http://es.wikipedia.org/wiki/Alfabeto_espa%C3%B1ol > > (I can't tell you how annoying to type "ñ" is when the tilde is > accessed using AltGr + 2 and you have to combine that with the > Compose key and N to obtain the full character. I'm sure Spanish > keyboards have a better way than that :-)) FWIW, I was taught that Spanish had 30 letters in the alfabeto: the 'ñ', plus 'ch', 'll', and 'rr' were all considered distinct characters. Kids-these-days'ly, Tres.
- -- =================================================================== Tres Seaver +1 540-429-0999 tseaver at palladion.com Palladion Software "Excellence by Design" http://palladion.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk5f2UQACgkQ+gerLs4ltQ4URACePSZzpoPAg2IIYZewsjbuplkK 0MgAoM7VfdQHzjBiU6Vr/MYPJ9U2qC3M =pvKn -----END PGP SIGNATURE----- From ethan at stoneleaf.us Thu Sep 1 21:38:07 2011 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 01 Sep 2011 12:38:07 -0700 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <87r540ka68.fsf@uwakimon.sk.tsukuba.ac.jp> <87ei00jdof.fsf@uwakimon.sk.tsukuba.ac.jp> <1314903285.3617.31.camel@localhost.localdomain> Message-ID: <4E5FDF1F.9010308@stoneleaf.us> Tres Seaver wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On 09/01/2011 02:54 PM, Antoine Pitrou wrote: >> If you look at Wikipedia, it says: «El alfabeto español consta de 27 >> letras». The Ñ is separate from the N (and so is it in my >> French-Spanish dictionary). The accented letters, however, are not >> considered separately. >> http://es.wikipedia.org/wiki/Alfabeto_espa%C3%B1ol >> >> (I can't tell you how annoying to type "ñ" is when the tilde is >> accessed using AltGr + 2 and you have to combine that with the >> Compose key and N to obtain the full character.
I'm sure Spanish >> keyboards have a better way than that :-)) > > FWIW, I was taught that Spanish had 30 letters in the alfabeto: the > 'ñ', plus 'ch', 'll', and 'rr' were all considered distinct characters. > > Kids-these-days'ly, Not sure what's going on, but according to the article Antoine linked to those aren't letters anymore... so much for the cultural awareness portion of UNESCO. ~Ethan~ From solipsis at pitrou.net Thu Sep 1 21:34:56 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 1 Sep 2011 21:34:56 +0200 Subject: [Python-Dev] PEP 393 Summer of Code Project References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <87r540ka68.fsf@uwakimon.sk.tsukuba.ac.jp> <87ei00jdof.fsf@uwakimon.sk.tsukuba.ac.jp> <1314903285.3617.31.camel@localhost.localdomain> <4E5FDF1F.9010308@stoneleaf.us> Message-ID: <20110901213456.4d38a240@pitrou.net> On Thu, 01 Sep 2011 12:38:07 -0700 Ethan Furman wrote: > > > > FWIW, I was taught that Spanish had 30 letters in the alfabeto: the > > 'ñ', plus 'ch', 'll', and 'rr' were all considered distinct characters. > > > > Kids-these-days'ly, > > Not sure what's going on, but according to the article Antoine linked to > those aren't letters anymore... so much for the cultural awareness > portion of UNESCO. That Wikipedia article also says: «Los dígrafos Ch y Ll tienen valores fonéticos específicos, y durante los siglos XIX y XX se ordenaron separadamente de C y L, aunque la práctica se abandonó en 1994 para homogeneizar el sistema con otras lenguas.»
-> roughly: «the "Ch" and "Ll" digraphs have specific phonetic values, and during the 19th and 20th centuries they were ordered separately from C and L, but this practice was abandoned in 1994 in order to make the system consistent with other languages.» And about "rr": «El dígrafo rr (llamado erre, /'ere/, y pronunciado /r/) nunca se consideró por separado, probablemente por no aparecer nunca en posición inicial.» -> «the "rr" digraph was never considered separate, probably because it never appears at the very beginning of a word.» Regards Antoine. From greg.ewing at canterbury.ac.nz Fri Sep 2 02:30:12 2011 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 02 Sep 2011 12:30:12 +1200 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <87r540ka68.fsf@uwakimon.sk.tsukuba.ac.jp> <1314893027.3617.12.camel@localhost.localdomain> Message-ID: <4E602394.70707@canterbury.ac.nz> Guido van Rossum wrote: > I recall long ago that when the french wrote words in all caps they > would drop the accents, e.g. ECOLE. I even recall (through the mists > of time) observing this in Paris on public signs. Is this still the > convention?
This page features a number of French street signs in all-caps, and some of them have accents: http://www.happymall.com/france/paris_street_signs.htm -- Greg From greg.ewing at canterbury.ac.nz Fri Sep 2 02:36:20 2011 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 02 Sep 2011 12:36:20 +1200 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <87r540ka68.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4E602504.9060505@canterbury.ac.nz> Guido van Rossum wrote: > But in a word like "coëxistentie" (coexistence) the o > and e do not form the oe-sound, and to emphasize this to Dutch readers > (who believe their spelling is very logical :-), the official spelling > puts the umlaut on the e. Sometimes this is done in English too -- occasionally you see words like "cooperation" spelled with a diaeresis over the second "o". But these days it's more common to use a hyphen, or not bother at all. Everyone knows how it's pronounced.
-- Greg From solipsis at pitrou.net Fri Sep 2 02:42:46 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 2 Sep 2011 02:42:46 +0200 Subject: [Python-Dev] PEP 393 Summer of Code Project References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <87r540ka68.fsf@uwakimon.sk.tsukuba.ac.jp> <1314893027.3617.12.camel@localhost.localdomain> <4E602394.70707@canterbury.ac.nz> Message-ID: <20110902024246.58217e77@pitrou.net> On Fri, 02 Sep 2011 12:30:12 +1200 Greg Ewing wrote: > Guido van Rossum wrote: > > > I recall long ago that when the french wrote words in all caps they > > would drop the accents, e.g. ECOLE. I even recall (through the mists > > of time) observing this in Paris on public signs. Is this still the > > convention? > > This page features a number of French street signs > in all-caps, and some of them have accents: > > http://www.happymall.com/france/paris_street_signs.htm I don't think some American souvenir shop is a good reference, though :) (for example, there's no Paris street named "château de Versailles") Regards Antoine.
From greg.ewing at canterbury.ac.nz Fri Sep 2 02:52:31 2011 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 02 Sep 2011 12:52:31 +1200 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <87r540ka68.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4E6028CF.9030801@canterbury.ac.nz> Terry Reedy wrote: > Too bad, since people sometimes use '-' to serve the same purpose. Which actually seems more logical to me -- a separating symbol is better placed between the things being separated, rather than over the top of one of them! Maybe we could compromise by turning the diaeresis on its side: co:operate -- Greg From steve at pearwood.info Fri Sep 2 03:30:44 2011 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 02 Sep 2011 11:30:44 +1000 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: <1314893027.3617.12.camel@localhost.localdomain> References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <87r540ka68.fsf@uwakimon.sk.tsukuba.ac.jp> <1314893027.3617.12.camel@localhost.localdomain> Message-ID: <4E6031C4.7030809@pearwood.info> Antoine Pitrou wrote: > Le jeudi 01 septembre 2011 à 08:45 -0700, Guido van Rossum a écrit : >> This is definitely thought of as a separate >> mark added to the e; ë
is not a new letter. I have a feeling it's the >> same way for the French and Germans, but I really don't know. >> (Antoine? Georg?) > > Indeed, they are not separate "letters" (they are considered the same in > lexicographic order, and the French alphabet has 26 letters). On the other hand, the same doesn't necessarily apply to other languages. (At least according to Wikipedia.) http://en.wikipedia.org/wiki/Diacritic#Languages_with_letters_containing_diacritics -- Steven From stephen at xemacs.org Fri Sep 2 05:59:01 2011 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 02 Sep 2011 12:59:01 +0900 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <87r540ka68.fsf@uwakimon.sk.tsukuba.ac.jp> <87ei00jdof.fsf@uwakimon.sk.tsukuba.ac.jp> <1314903285.3617.31.camel@localhost.localdomain> Message-ID: <87bov3k322.fsf@uwakimon.sk.tsukuba.ac.jp> Tres Seaver writes: > FWIW, I was taught that Spanish had 30 letters in the alfabeto: the > 'ñ', plus 'ch', 'll', and 'rr' were all considered distinct characters. That was always a Castellano vs. Americano issue, IIRC. As I wrote, Mr. Gonzalez was Castellano. I believe that the deprecation of the digraphs as separate letters occurred as the telephone became widely used in Spain, and the telephone company demanded an official proclamation from whatever Ministry is responsible for culture that it was OK to treat the digraphs as two letters (specifically, to collate them that way), so that they could use the programs that came with the OS.
So this stuff is not merely variant by culture, but also by economics and politics. :-/ From s.brunthaler at uci.edu Fri Sep 2 06:37:28 2011 From: s.brunthaler at uci.edu (stefan brunthaler) Date: Thu, 1 Sep 2011 21:37:28 -0700 Subject: [Python-Dev] Python 3 optimizations continued... In-Reply-To: References: <20110829231420.20c3516a@pitrou.net> <20110830025510.638b41d9@pitrou.net> <4E5CA1F0.2070005@v.loewis.de> <20110830193806.0d718a56@pitrou.net> <112D04D2-DC83-4383-9C06-96208811507F@twistedmatrix.com> Message-ID: Hi, as promised, I created a publicly available preview of an implementation with my optimizations, which is available under the following location: https://bitbucket.org/py3_pio/preview/wiki/Home I followed Nick's advice and added some valuable overview/introduction material at the wiki page the link points to; I am positive that spending 10mins reading this will provide you with valuable information regarding what's happening. In addition, as Guido already mentioned, this is more or less a direct copy of my research-branch without some of my private comments and *no* additional refactorings because of software-engineering issues (which I am very much aware of.) I hope this clarifies a *lot* and makes it easier to see what parts are involved and how all the pieces fit together.
I hope you'll like it, have fun, --stefan From greg.ewing at canterbury.ac.nz Fri Sep 2 07:45:04 2011 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 02 Sep 2011 17:45:04 +1200 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: <20110902024246.58217e77@pitrou.net> References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <87r540ka68.fsf@uwakimon.sk.tsukuba.ac.jp> <1314893027.3617.12.camel@localhost.localdomain> <4E602394.70707@canterbury.ac.nz> <20110902024246.58217e77@pitrou.net> Message-ID: <4E606D60.2040605@canterbury.ac.nz> Antoine Pitrou wrote: > I don't think some American souvenir shop is a good reference, though :) > (for example, there's no Paris street named "château de Versailles") Hmmm, I'd assumed they were reproductions of actual street signs found in Paris, but maybe not. :-( -- Greg From ncoghlan at gmail.com Fri Sep 2 07:55:01 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 2 Sep 2011 15:55:01 +1000 Subject: [Python-Dev] Python 3 optimizations continued... In-Reply-To: References: <20110829231420.20c3516a@pitrou.net> <20110830025510.638b41d9@pitrou.net> <4E5CA1F0.2070005@v.loewis.de> <20110830193806.0d718a56@pitrou.net> <112D04D2-DC83-4383-9C06-96208811507F@twistedmatrix.com> Message-ID: On Fri, Sep 2, 2011 at 2:37 PM, stefan brunthaler wrote: > I hope this clarifies a *lot* and makes it easier to see what parts > are involved and how all the pieces fit together. It does, thanks.
There are likely to be some fun corner cases relating to trace functions and use of the "locals()" builtin, but now the code has been published hopefully those interested will be able to dig in and provide some more detailed feedback. (Not me, though - I've already dropped some things from my original personal to-do list for 3.3, so I'm not keen to start adding any more). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From stefan_ml at behnel.de Fri Sep 2 08:01:09 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 02 Sep 2011 08:01:09 +0200 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: <4E602504.9060505@canterbury.ac.nz> References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <87r540ka68.fsf@uwakimon.sk.tsukuba.ac.jp> <4E602504.9060505@canterbury.ac.nz> Message-ID: Greg Ewing, 02.09.2011 02:36: > Guido van Rossum wrote: >> But in a word like "coëxistentie" (coexistence) the o >> and e do not form the oe-sound, and to emphasize this to Dutch readers >> (who believe their spelling is very logical :-), the official spelling >> puts the umlaut on the e. > > Sometimes this is done in English too -- occasionally > you see words like "cooperation" spelled with a diaeresis > over the second "o". But these days it's more common to > use a hyphen, or not bother at all. Everyone knows how > it's pronounced. Right. There are so many words in the English language that you can't pronounce without knowing them, that the few words that fall into the above category really don't matter.
Stefan From tjreedy at udel.edu Fri Sep 2 08:34:01 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 02 Sep 2011 02:34:01 -0400 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: <87bov3k322.fsf@uwakimon.sk.tsukuba.ac.jp> References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <87r540ka68.fsf@uwakimon.sk.tsukuba.ac.jp> <87ei00jdof.fsf@uwakimon.sk.tsukuba.ac.jp> <1314903285.3617.31.camel@localhost.localdomain> <87bov3k322.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 9/1/2011 11:59 PM, Stephen J. Turnbull wrote: > > I believe that the deprecation of the digraphs as separate letters > occurred as the telephone became widely used in Spain, and the > telephone company demanded an official proclamation from whatever > Ministry is responsible for culture that it was OK to treat the > digraphs as two letters (specifically, to collate them that way), so > that they could use the programs that came with the OS. The main 'standards body' for Spanish is the Real Academia Española in Madrid, which works with the 21 other members of the Asociación de Academias de la Lengua Española. wikimedia.org/wikipedia/en/wiki/Real_Academia_Española .wikimedia.org/wikipedia/en/wiki/Association_of_Spanish_Language_Academies While it has apparently been criticized as 'conservative' (which it well ought to be), it has been rather progressive in promoting changes such as 'ph' to 'f' (fisica, fone) and dropping silent 'p' in leading 'psi' (sicologia) and silent 's' in leading 'sci' (ciencia).
-- Terry Jan Reedy From jeremy at jeremysanders.net Fri Sep 2 10:55:32 2011 From: jeremy at jeremysanders.net (Jeremy Sanders) Date: Fri, 02 Sep 2011 09:55:32 +0100 Subject: [Python-Dev] Ctypes and the stdlib (was Re: LZMA compression support in 3.3) References: <4E5951D5.5020200@v.loewis.de> <20110828002642.4765fc89@pitrou.net> <20110828012705.523e51d4@pitrou.net> <4E5C01E4.2050106@canterbury.ac.nz> <4E5C7B48.5080402@canterbury.ac.nz> <4E5CA35E.8000509@v.loewis.de> <4E5D148B.1060606@v.loewis.de> Message-ID: Dan Stromberg wrote: > SIP's approach of using something close to, but not identical to, the .h's > sounds like it might be pretty productive - especially if the derivative > of the .h's could be automatically derived using a python script, with > minor > tweaks to the inputs on .h upgrades. But sip itself is apparently > C++-only. http://www.riverbankcomputing.co.uk/software/sip/intro "What is SIP? One of the features of Python that makes it so powerful is the ability to take existing libraries, written in C or C++, and make them available as Python extension modules. Such extension modules are often called bindings for the library. SIP is a tool that makes it very easy to create Python bindings for C and C++ libraries. It was originally developed to create PyQt, the Python bindings for the Qt toolkit, but can be used to create bindings for any C or C++ library. " It's not C++ only. The code for SIP is also in C. Jeremy From stefan_ml at behnel.de Fri Sep 2 11:13:19 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 02 Sep 2011 11:13:19 +0200 Subject: [Python-Dev] Python 3 optimizations continued... 
In-Reply-To: References: <4E5CA1F0.2070005@v.loewis.de> <20110830193806.0d718a56@pitrou.net> <112D04D2-DC83-4383-9C06-96208811507F@twistedmatrix.com> Message-ID: stefan brunthaler, 02.09.2011 06:37: > as promised, I created a publicly available preview of an > implementation with my optimizations, which is available under the > following location: > https://bitbucket.org/py3_pio/preview/wiki/Home > > I followed Nick's advice and added some valuable advice and > overview/introduction at the wiki page the link points to, I am > positive that spending 10mins reading this will provide you with a > valuable information regarding what's happening. It does, thanks. A couple of remarks: 1) The SFC optimisation is purely based on static code analysis, right? I assume it takes loops into account (and just multiplies scores for inner loops)? Is that what you mean with "nesting level"? Obviously, static analysis can sometimes be misleading, e.g. when there's a rare special case with lots of loops that needs to adapt input data in some way, but in general, I'd expect that this heuristic would tend to hit the important cases, especially for well structured code with short functions. 2) The RC elimination is tricky to get right and thus somewhat dangerous, but sounds worthwhile and should work particularly well on a stack based byte code interpreter like CPython. 3) Inline caching also sounds worthwhile, although I wonder how large the savings will be here. You'd save a couple of indirect jumps at the C-API level, sure, but apart from that, my guess is that it would highly depend on the type of instruction. Certain (repeated) calls to C implemented functions would likely benefit quite a bit, for example, which would be a nice optimisation by itself, e.g. for builtins. I would expect that the same applies to iterators, even a couple of percent faster iteration can make a great deal of a difference, and a substantial set of iterators are implemented in C, e.g. 
itertools, range, zip and friends. I'm not so sure about arithmetic operations. In Cython, we (currently?) do not optimistically replace these with more specific code (unless we know the types at compile time), because it complicates the generated C code and indirect jumps aren't all that slow that the benefit would be important. Savings are *much* higher when data can be unboxed, so much that the slight improvement for optimistic type guesses is totally dwarfed in Cython. I would expect that the return of investment is better when the types are actually known at runtime, as in your case. 4) Regarding inlined object references, I would expect that it's much more worthwhile to speed up LOAD_GLOBAL and LOAD_NAME than LOAD_CONST. I guess that this would be best helped by watching the module dict and the builtin dict internally and invalidating the interpreter state after changes (e.g. by providing a change counter in those dicts and checking that in the instructions that access them), and otherwise keeping the objects cached. Simply watching the dedicated instructions that change that state isn't enough as Python allows code to change these dicts directly through their dict interface. All in all, your list does sound like an interesting set of changes that are both understandable and worthwhile. Stefan From s.brunthaler at uci.edu Fri Sep 2 17:20:28 2011 From: s.brunthaler at uci.edu (stefan brunthaler) Date: Fri, 2 Sep 2011 08:20:28 -0700 Subject: [Python-Dev] Python 3 optimizations continued... 
In-Reply-To: References: <20110829231420.20c3516a@pitrou.net> <20110830025510.638b41d9@pitrou.net> <4E5CA1F0.2070005@v.loewis.de> <20110830193806.0d718a56@pitrou.net> <112D04D2-DC83-4383-9C06-96208811507F@twistedmatrix.com> Message-ID: > as promised, I created a publicly available preview of an > implementation with my optimizations, which is available under the > following location: > https://bitbucket.org/py3_pio/preview/wiki/Home > One very important thing that I forgot was to indicate that you have to use computed gotos (i.e., "configure --with-computed-gotos"), otherwise it won't work (though I think that most people can figure this out easily, knowing this a priori isn't too bad.) Regards, --stefan From s.brunthaler at uci.edu Fri Sep 2 17:55:03 2011 From: s.brunthaler at uci.edu (stefan brunthaler) Date: Fri, 2 Sep 2011 08:55:03 -0700 Subject: [Python-Dev] Python 3 optimizations continued... In-Reply-To: References: <4E5CA1F0.2070005@v.loewis.de> <20110830193806.0d718a56@pitrou.net> <112D04D2-DC83-4383-9C06-96208811507F@twistedmatrix.com> Message-ID: > 1) The SFC optimisation is purely based on static code analysis, right? I > assume it takes loops into account (and just multiplies scores for inner > loops)? Is that what you mean with "nesting level"? Obviously, static > analysis can sometimes be misleading, e.g. when there's a rare special case > with lots of loops that needs to adapt input data in some way, but in > general, I'd expect that this heuristic would tend to hit the important > cases, especially for well structured code with short functions. > Yes, currently I only use the heuristic to statically estimate utility of assigning an optimized slot to a local variable. And, another yes, nested blocks (like for-statements) is what I have in mind when using "nesting level". I was told that the algorithm itself is very similar to linear scan register allocation, modulo the ability to spill values, of course. 
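For illustration, the static scoring idea described here can be sketched in a few lines of Python. The weight of 100 per nesting level and the limit of 4 optimized slots are taken from this discussion; everything else (function and variable names, the input representation) is invented for the sketch and is not the actual implementation:

```python
# Toy model of the static slot-allocation heuristic: every occurrence of a
# local variable scores WEIGHT**nesting_level, and the highest-scoring
# variables win one of the NUM_SLOTS optimized stack slots.
WEIGHT = 100
NUM_SLOTS = 4

def score_locals(uses):
    """uses: iterable of (variable_name, nesting_level) pairs, one per occurrence."""
    scores = {}
    for name, level in uses:
        scores[name] = scores.get(name, 0) + WEIGHT ** level
    return scores

def assign_slots(uses, num_slots=NUM_SLOTS):
    """Rank variables by score and map the top ones to slot indices."""
    scores = score_locals(uses)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return {name: slot for slot, name in enumerate(ranked[:num_slots])}

# A variable used once inside a loop body (nesting level 1) outranks a
# variable used fifty times at the top level -- which is exactly the
# bm_django-style pitfall mentioned above:
uses = [("i", 1)] + [("x", 0)] * 50 + [("y", 0), ("z", 0), ("w", 0), ("q", 0)]
slots = assign_slots(uses)
```

The rarely-executed-loop problem follows directly: a single static occurrence at a deep nesting level dominates the score even if that loop almost never runs, which is what the proposed back-branch counting would correct.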
From my benchmarks and in-depth analysis of several programs, I found this to work very well. In fact, the only situation I found is (unfortunately) one of the top-most executed functions in US' bm_django.py: There is one loop that gets almost never executed but this loop gives precedence to local variables used inside. Because of this, I have already an idea for a better approach: first, use the static heuristic to compute stack slot score, then count back-branches (I would need this anyways, as the _Py_CheckInterval has gone and OSR/hot-swapping is in general a good idea) and record their frequency. Next, just replace the current static weight of 100 by the dynamically recorded weight. Consequently, you should get better allocations. (Please note that I did some quantitative analysis of Python functions to determine that using 4 SFC-slots covers a substantial amount of functions [IIRC >95%] with the trivial scenario when there are at most 4 local variables.) > 2) The RC elimination is tricky to get right and thus somewhat dangerous, > but sounds worthwhile and should work particularly well on a stack based > byte code interpreter like CPython. >
It would be interesting (research-wise, too) to be able to measure whether the reduction in memory operations makes Python programs use less energy, and if so, how much the difference is. > 3) Inline caching also sounds worthwhile, although I wonder how large the > savings will be here. You'd save a couple of indirect jumps at the C-API > level, sure, but apart from that, my guess is that it would highly depend on > the type of instruction. Certain (repeated) calls to C implemented functions > would likely benefit quite a bit, for example, which would be a nice > optimisation by itself, e.g. for builtins. I would expect that the same > applies to iterators, even a couple of percent faster iteration can make a > great deal of a difference, and a substantial set of iterators are > implemented in C, e.g. itertools, range, zip and friends. > > I'm not so sure about arithmetic operations. In Cython, we (currently?) do > not optimistically replace these with more specific code (unless we know the > types at compile time), because it complicates the generated C code and > indirect jumps aren't all that slow that the benefit would be important. > Savings are *much* higher when data can be unboxed, so much that the slight > improvement for optimistic type guesses is totally dwarfed in Cython. I > would expect that the return of investment is better when the types are > actually known at runtime, as in your case. > Well, in my thesis I already hint at another improvement of the existing design that can work on unboxed data as well (while still being an interpreter.) I am eager to try this, but don't know how much time I can spend on this (because there are several other research projects I am actively involved in.) 
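The inline caching discussed in this point can be illustrated with a toy "quickening" interpreter: a generic add instruction rewrites itself into a type-specialized variant after observing its operands, and the specialized variant carries a type guard that de-specializes on a miss. The instruction names and the dispatch structure here are invented for illustration and do not reflect CPython's actual evaluation loop:

```python
def run(code, stack):
    """Toy dispatch loop with one quickened instruction family."""
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op == "BINARY_ADD":
            b = stack.pop(); a = stack.pop()
            stack.append(a + b)               # generic, dynamically-typed path
            if type(a) is int and type(b) is int:
                # quicken: rewrite this instruction for the observed types
                code[pc] = "INT_ADD"
        elif op == "INT_ADD":
            b = stack.pop(); a = stack.pop()
            if type(a) is int and type(b) is int:  # inline-cache guard
                stack.append(a + b)                # specialized fast path
            else:
                stack.append(a + b)                # guard miss: generic path
                code[pc] = "BINARY_ADD"            # de-specialize
        pc += 1
    return stack

code = ["BINARY_ADD"]
run(code, [1, 2])
# code is now ["INT_ADD"]; subsequent integer adds take the guarded path
```

In a real interpreter the payoff comes from the specialized instruction skipping the generic C-API dispatch (and, with derivative instruction copies, from better indirect-branch prediction), not from this trivial Python model.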
In my experience, this works very well and you cannot actually report good speedups without inline-caching arithmetic operations, simply because that's where all JITs shine and most benchmarks don't reflect real world scenarios but mathematics-inclined microbenchmarks. Also, if in the future compilers (gcc and clang) are able to inline the invoked functions, higher speedups will be possible. > 4) Regarding inlined object references, I would expect that it's much more > worthwhile to speed up LOAD_GLOBAL and LOAD_NAME than LOAD_CONST. I guess > that this would be best helped by watching the module dict and the builtin > dict internally and invalidating the interpreter state after changes (e.g. > by providing a change counter in those dicts and checking that in the > instructions that access them), and otherwise keeping the objects cached. > Simply watching the dedicated instructions that change that state isn't > enough as Python allows code to change these dicts directly through their > dict interface. > Ok, I thought about something along these lines, too, but in the end, decided to go with the current design, as it is easy and language neutral (for my research I primarily chose Python as a demonstration vehicle and none of these techniques is specific to Python.) LOAD_GLOBAL pays off handsomely, and I think that I could easily make it correct for all cases, if I knew the places that need to call "invalidate_cache". Most of the LOAD_CONST instructions can be replaced with the inlined-version (INCA_LOAD_CONST), and while I did not do any benchmarks on this alone, simply because they are very frequently executed, even small optimizations pay off nicely. Another point is that you can slim down the activation record of PyEval_EvalFrameEx, because you don't need to use the "consts" field anymore (similarly, you could probably eliminate the "names" and "fastlocals" fields, if you find that most of the frequent and fast cases are covered by the optimized instructions.)
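The dict-watching scheme from point 4 (a change counter in the namespace dict, checked by the caching instruction before trusting its cached object) can be modeled in pure Python. This is an illustrative sketch, not the proposed C implementation; in this toy only `__setitem__`/`__delitem__` bump the counter, whereas a real version would have to instrument every mutating path of the dict:

```python
class VersionedDict(dict):
    """Dict that bumps a change counter on every (instrumented) mutation."""
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.version = 0
    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.version += 1
    def __delitem__(self, key):
        super().__delitem__(key)
        self.version += 1

class CachedGlobal:
    """Inline cache for one LOAD_GLOBAL-style lookup."""
    def __init__(self, namespace, name):
        self.namespace = namespace
        self.name = name
        self.cached_version = -1   # forces a lookup on first use
        self.cached_value = None
    def load(self):
        if self.cached_version != self.namespace.version:   # guard
            self.cached_value = self.namespace[self.name]    # slow path: real lookup
            self.cached_version = self.namespace.version
        return self.cached_value                             # fast path: cached object

ns = VersionedDict()
ns["answer"] = 42
cache = CachedGlobal(ns, "answer")
assert cache.load() == 42
ns["answer"] = 43           # a direct dict mutation bumps the counter...
assert cache.load() == 43   # ...so the stale cache entry is refilled
```

This addresses exactly the concern quoted above: because the guard sits on the dict itself rather than on the store instructions, mutations made through the plain dict interface also invalidate the cache.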
> All in all, your list does sound like an interesting set of changes that are > both understandable and worthwhile. > Thanks, I think so, too, which is why I wanted to integrate the optimizations with CPython in the first place. Thanks for the pointers to the dict stuff, I will take a look (IIRC, Antoine pointed me in the same direction last year, but I think the design was slightly different then), --stefan From jcea at jcea.es Fri Sep 2 17:57:10 2011 From: jcea at jcea.es (Jesus Cea) Date: Fri, 02 Sep 2011 17:57:10 +0200 Subject: [Python-Dev] Multigigabyte memory usage in the OpenIndiana Buildbot Message-ID: <4E60FCD6.3090005@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 A single instance of buildbot in the OpenIndiana buildbot is eating 1.4GB of RAM and 3.8GB of SWAP and growing. The build hangs or die with a "out of memory" error, eventually. This is 100% reproducible. Everytime I force a build thru the buildbot control page, I see this: takes huge memory and dies with an "out of memory" or hangs. I am allocating 4GB to the buildbots. I think this is not normal. I am the only one seen such a memory usage?. I haven't changed anything in my buildbots for months... - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ . 
_/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQCVAwUBTmD81plgi5GaxT1NAQJIfQP+LvxG8jGDcfdsKB3omkM8fE/pA3q3yVQL qVtSPQomCNB3hhhctEXnSFmDDekOTroCTpU9lYp6c9ZLmSCEGJx7bVW/53hk9ZJv oMNwSHvQbrZy/eWuJAlSUqIl2oAmMP75RiDhL2eqBu/alhavK8oXCeDV7iG9EvZq 0RH9Weqr788= =3jyf -----END PGP SIGNATURE----- From status at bugs.python.org Fri Sep 2 18:07:27 2011 From: status at bugs.python.org (Python tracker) Date: Fri, 2 Sep 2011 18:07:27 +0200 (CEST) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20110902160727.B2C881CFD5@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2011-08-26 - 2011-09-02) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message. 
Issues counts and deltas: open 2967 ( +4) closed 21701 (+36) total 24668 (+40) Open issues with patches: 1283 Issues opened (32) ================== #12837: Patch for issue #12810 removed a valid check on socket ancilla http://bugs.python.org/issue12837 reopened by brett.cannon #12848: pickle.py treats 32bit lengths as signed, but _pickle.c as uns http://bugs.python.org/issue12848 opened by pitrou #12849: urllib2 headers issue http://bugs.python.org/issue12849 opened by shubhojeet.ghosh #12850: [PATCH] stm.atomic http://bugs.python.org/issue12850 opened by arigo #12851: ctypes: getbuffer() never provides strides http://bugs.python.org/issue12851 opened by skrah #12852: POSIX level issues in posixmodule.c on OpenBSD 5.0 http://bugs.python.org/issue12852 opened by rpointel #12853: global name 'r' is not defined in upload.py http://bugs.python.org/issue12853 opened by reowen #12854: PyOS_Readline usage in tokenizer ignores sys.stdin/sys.stdout http://bugs.python.org/issue12854 opened by Albert.Zeyer #12855: linebreak sequences should be better documented http://bugs.python.org/issue12855 opened by Matthew.Boehm #12856: tempfile PRNG reuse between parent and child process http://bugs.python.org/issue12856 opened by ferringb #12857: Expose called function on frame object http://bugs.python.org/issue12857 opened by eric.snow #12858: crypt.mksalt: use ssl.RAND_pseudo_bytes() if available http://bugs.python.org/issue12858 opened by haypo #12860: http client attempts to send a readable object twice http://bugs.python.org/issue12860 opened by langmartin #12861: PyOS_Readline uses single lock http://bugs.python.org/issue12861 opened by Albert.Zeyer #12862: ConfigParser does not implement "comments need to be preceded http://bugs.python.org/issue12862 opened by DanielFortunov #12863: py32 > Lib > xml.minidom > usage feedback > overrides http://bugs.python.org/issue12863 opened by GPU.Group #12864: 2to3 creates illegal code on import a.b inside a package 
http://bugs.python.org/issue12864 opened by simohe #12866: Want to submit our Audioop.c patch for 24bit audio http://bugs.python.org/issue12866 opened by Peder.Jørgensen #12869: PyOS_StdioReadline is printing the prompt on stderr http://bugs.python.org/issue12869 opened by Albert.Zeyer #12870: Regex object should have introspection methods http://bugs.python.org/issue12870 opened by mattchaput #12871: Disable sched_get_priority_min/max if Python is compiled witho http://bugs.python.org/issue12871 opened by haypo #12872: --with-tsc crashes on ppc64 http://bugs.python.org/issue12872 opened by dmalcolm #12873: 2to3 incorrectly handles multi-line imports from __future__ http://bugs.python.org/issue12873 opened by Arfrever #12875: backport re.compile flags default value documentation http://bugs.python.org/issue12875 opened by eli.bendersky #12876: Make Test Error : ImportError: No module named _sha256 http://bugs.python.org/issue12876 opened by wah meng #12878: io.StringIO doesn't provide a __dict__ field http://bugs.python.org/issue12878 opened by ericp #12880: ctypes: clearly document how structure bit fields are allocate http://bugs.python.org/issue12880 opened by meadori #12881: ctypes: segfault with large structure field names http://bugs.python.org/issue12881 opened by meadori #12882: mmap crash on Windows http://bugs.python.org/issue12882 opened by itabhijitb #12883: xml.sax.xmlreader.AttributesImpl allows empty string as attrib http://bugs.python.org/issue12883 opened by Michael.Sulyaev #12885: distutils.filelist.findall() fails on broken symlink in Py2.x http://bugs.python.org/issue12885 opened by Alexander.Dutton #12886: datetime.strptime parses input wrong http://bugs.python.org/issue12886 opened by heidar.rafn Most recent 15 issues with no replies (15) ========================================== #12885: distutils.filelist.findall() fails on broken symlink in Py2.x http://bugs.python.org/issue12885 #12883: xml.sax.xmlreader.AttributesImpl allows empty string 
as attrib http://bugs.python.org/issue12883 #12881: ctypes: segfault with large structure field names http://bugs.python.org/issue12881 #12880: ctypes: clearly document how structure bit fields are allocate http://bugs.python.org/issue12880 #12873: 2to3 incorrectly handles multi-line imports from __future__ http://bugs.python.org/issue12873 #12872: --with-tsc crashes on ppc64 http://bugs.python.org/issue12872 #12869: PyOS_StdioReadline is printing the prompt on stderr http://bugs.python.org/issue12869 #12866: Want to submit our Audioop.c patch for 24bit audio http://bugs.python.org/issue12866 #12864: 2to3 creates illegal code on import a.b inside a package http://bugs.python.org/issue12864 #12863: py32 > Lib > xml.minidom > usage feedback > overrides http://bugs.python.org/issue12863 #12862: ConfigParser does not implement "comments need to be preceded http://bugs.python.org/issue12862 #12860: http client attempts to send a readable object twice http://bugs.python.org/issue12860 #12858: crypt.mksalt: use ssl.RAND_pseudo_bytes() if available http://bugs.python.org/issue12858 #12854: PyOS_Readline usage in tokenizer ignores sys.stdin/sys.stdout http://bugs.python.org/issue12854 #12851: ctypes: getbuffer() never provides strides http://bugs.python.org/issue12851 Most recent 15 issues waiting for review (15) ============================================= #12872: --with-tsc crashes on ppc64 http://bugs.python.org/issue12872 #12857: Expose called function on frame object http://bugs.python.org/issue12857 #12856: tempfile PRNG reuse between parent and child process http://bugs.python.org/issue12856 #12855: linebreak sequences should be better documented http://bugs.python.org/issue12855 #12852: POSIX level issues in posixmodule.c on OpenBSD 5.0 http://bugs.python.org/issue12852 #12850: [PATCH] stm.atomic http://bugs.python.org/issue12850 #12842: Docs: first parameter of tp_richcompare() always has the corre http://bugs.python.org/issue12842 #12841: Incorrect tarfile.py 
extraction http://bugs.python.org/issue12841 #12837: Patch for issue #12810 removed a valid check on socket ancilla http://bugs.python.org/issue12837 #12832: The documentation for the print function should explain/point http://bugs.python.org/issue12832 #12822: NewGIL should use CLOCK_MONOTONIC if possible. http://bugs.python.org/issue12822 #12820: Tests for Lib/xml/dom/minicompat.py http://bugs.python.org/issue12820 #12819: PEP 393 - Flexible Unicode String Representation http://bugs.python.org/issue12819 #12818: email.utils.formataddr incorrectly quotes parens inside quoted http://bugs.python.org/issue12818 #12817: test_multiprocessing: io.BytesIO() requires bytearray buffers http://bugs.python.org/issue12817 Top 10 most discussed issues (10) ================================= #12852: POSIX level issues in posixmodule.c on OpenBSD 5.0 http://bugs.python.org/issue12852 15 msgs #12736: Request for python casemapping functions to use full not simpl http://bugs.python.org/issue12736 15 msgs #2636: Adding a new regex module (compatible with re) http://bugs.python.org/issue2636 14 msgs #12855: linebreak sequences should be better documented http://bugs.python.org/issue12855 10 msgs #12729: Python lib re cannot handle Unicode properly due to narrow/wid http://bugs.python.org/issue12729 9 msgs #12850: [PATCH] stm.atomic http://bugs.python.org/issue12850 9 msgs #12735: request full Unicode collation support in std python library http://bugs.python.org/issue12735 7 msgs #12837: Patch for issue #12810 removed a valid check on socket ancilla http://bugs.python.org/issue12837 7 msgs #12841: Incorrect tarfile.py extraction http://bugs.python.org/issue12841 6 msgs #12876: Make Test Error : ImportError: No module named _sha256 http://bugs.python.org/issue12876 6 msgs Issues closed (36) ================== #6069: casting error from ctypes array to structure http://bugs.python.org/issue6069 closed by meadori #6980: fix ctypes build failure on armel-linux-gnueabi with -mfloat-a 
http://bugs.python.org/issue6980 closed by meadori #8296: multiprocessing.Pool hangs when issuing KeyboardInterrupt http://bugs.python.org/issue8296 closed by vinay.sajip #8409: gettext should honor $LOCPATH environment variable http://bugs.python.org/issue8409 closed by barry #9651: ctypes crash when writing zerolength string buffer to file http://bugs.python.org/issue9651 closed by amaury.forgeotdarc #9923: mailcap module may not work on non-POSIX platforms if MAILCAPS http://bugs.python.org/issue9923 closed by ncoghlan #10086: test_sysconfig failure when prefix matches /site http://bugs.python.org/issue10086 closed by eric.araujo #11241: ctypes: subclassing an already subclassed ArrayType generates http://bugs.python.org/issue11241 closed by amaury.forgeotdarc #11564: pickle not 64-bit ready http://bugs.python.org/issue11564 closed by pitrou #11879: TarFile.chown: should use TarInfo.uid if user lookup fails http://bugs.python.org/issue11879 closed by lars.gustaebel #11920: ctypes: Strange bitfield structure sizing issue http://bugs.python.org/issue11920 closed by meadori #12195: Little documentation of annotations http://bugs.python.org/issue12195 closed by rhettinger #12287: ossaudiodev: stack corruption with FD >= FD_SETSIZE http://bugs.python.org/issue12287 closed by neologix #12472: Build failure on IRIX http://bugs.python.org/issue12472 closed by neologix #12494: subprocess: check_output() doesn't close pipes on error http://bugs.python.org/issue12494 closed by haypo #12636: IDLE ignores -*- coding -*- with -r option http://bugs.python.org/issue12636 closed by haypo #12720: Expose linux extended filesystem attributes http://bugs.python.org/issue12720 closed by python-dev #12742: Add support for CESU-8 encoding http://bugs.python.org/issue12742 closed by ezio.melotti #12793: allow filters in os.walk http://bugs.python.org/issue12793 closed by rhettinger #12802: Windows error code 267 should be mapped to ENOTDIR, not EINVAL http://bugs.python.org/issue12802 
closed by pitrou #12829: pyexpat segmentation fault caused by multiple calls to Parse() http://bugs.python.org/issue12829 closed by ned.deily #12835: Missing SSLSocket.sendmsg() wrapper allows programs to send un http://bugs.python.org/issue12835 closed by ncoghlan #12839: zlibmodule cannot handle Z_VERSION_ERROR zlib error http://bugs.python.org/issue12839 closed by nadeem.vawda #12843: file object read* methods in append mode overflows http://bugs.python.org/issue12843 closed by amaury.forgeotdarc #12846: unicodedata.normalize turkish letter problem http://bugs.python.org/issue12846 closed by terry.reedy #12847: crash with negative PUT in pickle http://bugs.python.org/issue12847 closed by pitrou #12859: readline implementation doesn't release the GIL http://bugs.python.org/issue12859 closed by Albert.Zeyer #12865: import SimpleHTTPServer http://bugs.python.org/issue12865 closed by amaury.forgeotdarc #12867: linecache.getline() Returning Error http://bugs.python.org/issue12867 closed by ned.deily #12868: test_faulthandler.test_stack_overflow() failed on OpenBSD http://bugs.python.org/issue12868 closed by neologix #12874: Rearrange descriptions of builtin types in the Library referen http://bugs.python.org/issue12874 closed by ezio.melotti #12877: Popen(...).stdout.seek(...) 
throws "Illegal seek" http://bugs.python.org/issue12877 closed by haypo #12879: "method-wrapper" objects are difficult to inspect http://bugs.python.org/issue12879 closed by benjamin.peterson #12884: Re http://bugs.python.org/issue12884 closed by ezio.melotti #1462440: socket and threading: udp multicast setsockopt fails http://bugs.python.org/issue1462440 closed by neologix #10946: bdist doesn’t pass --skip-build on to subcommands http://bugs.python.org/issue10946 closed by eric.araujo From zvezdan at zope.com Fri Sep 2 18:01:56 2011 From: zvezdan at zope.com (Zvezdan Petkovic) Date: Fri, 2 Sep 2011 12:01:56 -0400 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: <4E6031C4.7030809@pearwood.info> References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <87r540ka68.fsf@uwakimon.sk.tsukuba.ac.jp> <1314893027.3617.12.camel@localhost.localdomain> <4E6031C4.7030809@pearwood.info> Message-ID: <42D586D2-A78B-4A84-98E0-5F43172A61D7@zope.com> On Sep 1, 2011, at 9:30 PM, Steven D'Aprano wrote: > Antoine Pitrou wrote: >> Le jeudi 01 septembre 2011 à 08:45 -0700, Guido van Rossum a écrit : >>> This is definitely thought of as a separate >>> mark added to the e; é is not a new letter. I have a feeling it's the same way for the French and Germans, but I really don't know. >>> (Antoine? Georg?) >> Indeed, they are not separate "letters" (they are considered the same in lexicographic order, and the French alphabet has 26 letters). > > > On the other hand, the same doesn't necessarily apply to other languages. (At least according to Wikipedia.) 
> > http://en.wikipedia.org/wiki/Diacritic#Languages_with_letters_containing_diacritics For example, in Serbo-Croatian (Serbian, Croatian, Bosnian, Montenegrin, if you want), each of the following letters represents one distinct sound of the language. In the Serbian Cyrillic alphabet, they are distinct symbols. In the Latin alphabet, the corresponding letters are formed with diacritics because the alphabet is shorter.

Letter  Approximate pronunciation  Cyrillic
------  -------------------------  --------
č       tch in butcher             ч
ć       ch in chapter, but softer  ћ
dž      j in jump                  џ
đ       j in juice                 ђ
š       sh in ship                 ш
ž       s in pleasure, measure     ж

The language has 30 sounds and the corresponding 30 letters. See the count of the letters in these tables: - http://hr.wikipedia.org/wiki/Hrvatska_abeceda - http://sr.wikipedia.org/wiki/?????? Diacritics are used in grammar books and in print (occasionally) to distinguish between four different accents of the language: - long rising: á, - short rising: à, - long falling: ȃ (inverted breve, *not* a circumflex â), and - short falling: ȁ, especially when the words that use the same sounds -- thus, spelled with the same letters -- are next to each other. The accents are used to change the intonation of the whole word, not to change the sound of the letter. For example: "Ja sam sȃm." -- "I am alone." Both words "sam" contain the "a" sound, but the first one is pronounced short. As a form of the verb "to be" it's an enclitic that takes the accent of the preceding word "I". The second one is pronounced with a long falling accent. The macron can be used to indicate the length of a *non-stressed* vowel, e.g. ā, but is usually unnecessary in standard print. Many languages use alphabets that are not suitable to their sound system. The speakers of these languages adapted alphabets to their sounds either by using letters with distinct shapes (Cyrillic letters above), or adding diacritics to an existing shape (Latin letters above). 
The new combined form is a distinct letter. These letters have separate sections in dictionaries and a sorting order. The diacritics that indicate an accent or length are used only above vowels and do *not* represent distinct letters. Best regards, Zvezdan Petković P.S. Since I live in the USA, the last letter of my surname is *wrongly* spelled (ć -> c) and pronounced (ch -> k) most of the time. :-) From stefan_ml at behnel.de Fri Sep 2 19:12:21 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 02 Sep 2011 19:12:21 +0200 Subject: [Python-Dev] Python 3 optimizations continued... In-Reply-To: References: <20110830193806.0d718a56@pitrou.net> <112D04D2-DC83-4383-9C06-96208811507F@twistedmatrix.com> Message-ID: stefan brunthaler, 02.09.2011 17:55: >> 4) Regarding inlined object references, I would expect that it's much more >> worthwhile to speed up LOAD_GLOBAL and LOAD_NAME than LOAD_CONST. I guess >> that this would be best helped by watching the module dict and the builtin >> dict internally and invalidating the interpreter state after changes (e.g. >> by providing a change counter in those dicts and checking that in the >> instructions that access them), and otherwise keeping the objects cached. >> Simply watching the dedicated instructions that change that state isn't >> enough as Python allows code to change these dicts directly through their >> dict interface. > [...] > Thanks for the pointers to the dict stuff, I will take a look (IIRC, > Antoine pointed me in the same direction last year, but I think the > design was slightly different then), Not unlikely, Antoine tends to know the internals pretty well. The Cython project has been (hand wavingly) thinking about this also: implement our own module type with its own __setattr__ (and dict proxy) in order to speed up access to the globals in the *very* likely case that they rarely or never change after module initialisation time and that most critical code accesses them read-only from within functions. 
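The module-level variant Stefan describes could look roughly like this in today's Python (hypothetical class name, not Cython's actual implementation): a module subclass whose __setattr__ bumps a change counter, so cached attribute lookups can be revalidated with one integer comparison.

```python
import types


class WatchedModule(types.ModuleType):
    """Module whose attribute writes bump a change counter, so
    cached references to its globals can be revalidated cheaply."""

    def __init__(self, name):
        super().__init__(name)
        super().__setattr__("_changes", 0)

    def __setattr__(self, name, value):
        # Every rebinding invalidates previously cached lookups.
        super().__setattr__("_changes", self._changes + 1)
        super().__setattr__(name, value)


mod = WatchedModule("demo")
mod.answer = 42
snapshot = (mod._changes, mod.answer)   # (version, cached value)
assert snapshot[0] == mod._changes      # still valid: skip the dict lookup
mod.answer = 43                         # a write bumps the counter
assert snapshot[0] != mod._changes      # cache must be refreshed
```

Since the common case is that module globals never change after import, the counter comparison almost always succeeds and the cached object can be used directly.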
If it turns out that this makes sense for CPython in general, it wouldn't be a bad idea to join forces at some point in order to make this readily usable for both sides. Stefan From jcea at jcea.es Fri Sep 2 19:53:37 2011 From: jcea at jcea.es (Jesus Cea) Date: Fri, 02 Sep 2011 19:53:37 +0200 Subject: [Python-Dev] Multigigabyte memory usage in the OpenIndiana Buildbot In-Reply-To: <4E60FCD6.3090005@jcea.es> References: <4E60FCD6.3090005@jcea.es> Message-ID: <4E611821.9050108@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 02/09/11 17:57, Jesus Cea wrote: > The build hangs or die with a "out of memory" error, eventually. A simple "make test" with python not compiled with "pydebug" and skipping all the optional tests (like zip64) is taking up to 300MB of RAM. Python 2.7 branch, current tip. - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ . _/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQCVAwUBTmEYIZlgi5GaxT1NAQK79gP/aRyMqgEE7uScYtrZzPqs0ZSpGnVM8sBi RbNEN3cB/s6Oe/UVIo4vinaDnXXYSOM5qtqghUl5Cnx+wiiK2cL8iIv/YzZbjT9s U8QELEkol8lpjAVPEO/rSylZ5kvsmdjkM2mU6NOwiLGw+mmbbgqpmdAU14p+sqSO 2xFJElgOHuM= =YA0J -----END PGP SIGNATURE----- From solipsis at pitrou.net Fri Sep 2 20:14:15 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 2 Sep 2011 20:14:15 +0200 Subject: [Python-Dev] Multigigabyte memory usage in the OpenIndiana Buildbot References: <4E60FCD6.3090005@jcea.es> <4E611821.9050108@jcea.es> Message-ID: <20110902201415.773da7d6@pitrou.net> On Fri, 02 Sep 2011 19:53:37 +0200 Jesus Cea wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On 02/09/11 17:57, Jesus 
Cea wrote: > > The build hangs or die with a "out of memory" error, eventually. > > A simple "make test" with python not compiled with "pydebug" and > skipping all the optional tests (like zip64) is taking up to 300MB of > RAM. Python 2.7 branch, current tip. Can you tell if it's something recent or it has always been like that? Regards Antoine. From tseaver at palladion.com Fri Sep 2 20:22:04 2011 From: tseaver at palladion.com (Tres Seaver) Date: Fri, 02 Sep 2011 14:22:04 -0400 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: <87bov3k322.fsf@uwakimon.sk.tsukuba.ac.jp> References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <87r540ka68.fsf@uwakimon.sk.tsukuba.ac.jp> <87ei00jdof.fsf@uwakimon.sk.tsukuba.ac.jp> <1314903285.3617.31.camel@localhost.localdomain> <87bov3k322.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 09/01/2011 11:59 PM, Stephen J. Turnbull wrote: > Tres Seaver writes: > >> FWIW, I was taught that Spanish had 30 letters in the alfabeto: >> the 'ñ', plus 'ch', 'll', and 'rr' were all considered distinct >> characters. > > That was always a Castellano vs. Americano issue, IIRC. As I wrote, > Mr. Gonzalez was Castellano. - From a casual web search, it looks as though the RAE didn't legislate "letterness" away from the digraphs (as I learned them) until 1994 (about 25 years after I learned the 30-letter alfabeto). 
> I believe that the deprecation of the digraphs as separate letters > occurred as the telephone became widely used in Spain, and the > telephone company demanded an official proclamation from whatever > Ministry is responsible for culture that it was OK to treat the > digraphs as two letters (specifically, to collate them that way), so > that they could use the programs that came with the OS. > > So this stuff is not merely variant by culture, but also by > economics and politics. :-/ Lovely. :) Tres. - -- =================================================================== Tres Seaver +1 540-429-0999 tseaver at palladion.com Palladion Software "Excellence by Design" http://palladion.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk5hHswACgkQ+gerLs4ltQ7m9ACeOJZRgjcm9pd0Rnry26zP0I3t 53cAoLv78VD5eIdbjvboLaysoeREIp1t =0PuR -----END PGP SIGNATURE----- From fijall at gmail.com Fri Sep 2 20:42:07 2011 From: fijall at gmail.com (Maciej Fijalkowski) Date: Fri, 2 Sep 2011 20:42:07 +0200 Subject: [Python-Dev] Python 3 optimizations continued... In-Reply-To: References: <201108292357.36628.victor.stinner@haypocalc.com> Message-ID: > > For a comparative real world benchmark I tested Martin von Loewis' > django port (there are not that many meaningful Python 3 real world > benchmarks) and got a speedup of 1.3 (without IIS). This is reasonably > well, US got a speedup of 1.35 on this benchmark. I just checked that > pypy-c-latest on 64 bit reports 1.5 (the pypy-c-jit-latest figures > seem to be not working currently or *really* fast...), but I cannot > tell directly how that relates to speedups (it just says "less is > better" and I did not quickly find an explanation). 
> Since I did this benchmark last year, I have spent more time > investigating this benchmark and found that I could do better, but I > would have to guess as to how much (An interesting aside though: on > this benchmark, the executable never grew on more than 5 megs of > memory usage, exactly like the vanilla Python 3 interpreter.) > PyPy is ~12x faster on the django benchmark FYI From stefan_ml at behnel.de Fri Sep 2 21:20:57 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 02 Sep 2011 21:20:57 +0200 Subject: [Python-Dev] Python 3 optimizations continued... In-Reply-To: References: <201108292357.36628.victor.stinner@haypocalc.com> Message-ID: Maciej Fijalkowski, 02.09.2011 20:42: >> For a comparative real world benchmark I tested Martin von Loewis' >> django port (there are not that many meaningful Python 3 real world >> benchmarks) and got a speedup of 1.3 (without IIS). This is reasonably >> well, US got a speedup of 1.35 on this benchmark. I just checked that >> pypy-c-latest on 64 bit reports 1.5 (the pypy-c-jit-latest figures >> seem to be not working currently or *really* fast...), but I cannot >> tell directly how that relates to speedups (it just says "less is >> better" and I did not quickly find an explanation). > > PyPy is ~12x faster on the django benchmark FYI FYI, there's a recent thread up on the pypy ML where someone is complaining about PyPy being substantially slower than CPython when running Django on top of SQLite. Also note that PyPy doesn't implement Py3 yet, so the benchmark results are not comparable anyway. As usual, benchmark results depend on what you do in your benchmarks. Stefan From fijall at gmail.com Fri Sep 2 21:59:21 2011 From: fijall at gmail.com (Maciej Fijalkowski) Date: Fri, 2 Sep 2011 21:59:21 +0200 Subject: [Python-Dev] Python 3 optimizations continued... 
In-Reply-To: References: <201108292357.36628.victor.stinner@haypocalc.com> Message-ID: On Fri, Sep 2, 2011 at 9:20 PM, Stefan Behnel wrote: > Maciej Fijalkowski, 02.09.2011 20:42: >>> >>> For a comparative real world benchmark I tested Martin von Loewis' >>> django port (there are not that many meaningful Python 3 real world >>> benchmarks) and got a speedup of 1.3 (without IIS). This is reasonably >>> well, US got a speedup of 1.35 on this benchmark. I just checked that >>> pypy-c-latest on 64 bit reports 1.5 (the pypy-c-jit-latest figures >>> seem to be not working currently or *really* fast...), but I cannot >>> tell directly how that relates to speedups (it just says "less is >>> better" and I did not quickly find an explanation). >> >> PyPy is ~12x faster on the django benchmark FYI > > FYI, there's a recent thread up on the pypy ML where someone is complaining > about PyPy being substantially slower than CPython when running Django on > top of SQLite. Also note that PyPy doesn't implement Py3 yet, so the > benchmark results are not comparable anyway. Yes, sqlite is slow. It's also much faster in trunk than in 1.6 and there is an open ticket about it :) The "django" benchmark is just templating, so it does not involve a database. 
From greg.ewing at canterbury.ac.nz Sat Sep 3 01:46:58 2011 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 03 Sep 2011 11:46:58 +1200 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <87r540ka68.fsf@uwakimon.sk.tsukuba.ac.jp> <87ei00jdof.fsf@uwakimon.sk.tsukuba.ac.jp> <1314903285.3617.31.camel@localhost.localdomain> <87bov3k322.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4E616AF2.3070408@canterbury.ac.nz> Terry Reedy wrote: > While it has apparently been criticized as 'conservative' (which is well > ought to be), it has been rather progressive in promoting changes such > as 'ph' to 'f' (fisica, fone) and dropping silent 'p' in leading 'psi' > (sicologia) and silent 's' in leading 'sci' (ciencia). I find it curious that pronunciation always seems to take precedence over spelling in campaigns like this. Nowadays, especially with the internet increasingly taking over from personal interaction, we probably see words written a lot more often than we hear them spoken. Why shouldn't we change the pronunciation to match the spelling rather than the other way around? -- Greg From stephen at xemacs.org Sat Sep 3 06:17:24 2011 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Sat, 03 Sep 2011 13:17:24 +0900 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: <4E616AF2.3070408@canterbury.ac.nz> References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <87r540ka68.fsf@uwakimon.sk.tsukuba.ac.jp> <87ei00jdof.fsf@uwakimon.sk.tsukuba.ac.jp> <1314903285.3617.31.camel@localhost.localdomain> <87bov3k322.fsf@uwakimon.sk.tsukuba.ac.jp> <4E616AF2.3070408@canterbury.ac.nz> Message-ID: <87ehzyjm3v.fsf@uwakimon.sk.tsukuba.ac.jp> Greg Ewing writes: > I find it curious that pronunciation always seems to take > precedence over spelling in campaigns like this. Nowadays, > especially with the internet increasingly taking over from > personal interaction, we probably see words written a lot > more often than we hear them spoken. Why shouldn't we > change the pronunciation to match the spelling rather than > the other way around? Because 90% of all people move their lips when reading. :-) More seriously, because almost nobody learns to read before learning to understand spoken language. Aural language is more primitive than written language. From georg at python.org Sun Sep 4 22:21:50 2011 From: georg at python.org (Georg Brandl) Date: Sun, 04 Sep 2011 22:21:50 +0200 Subject: [Python-Dev] [RELEASED] Python 3.2.2 Message-ID: <4E63DDDE.2040108@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On behalf of the Python development team, I'm happy to announce the Python 3.2.2 maintenance release. Python 3.2.2 mainly fixes `a regression `_ in the ``urllib.request`` module that prevented opening many HTTP resources correctly with Python 3.2.1. 
Python 3.2 is a continuation of the efforts to improve and stabilize the Python 3.x line. Since the final release of Python 2.7, the 2.x line will only receive bugfixes, and new features are developed for 3.x only. Since PEP 3003, the Moratorium on Language Changes, is in effect, there are no changes in Python's syntax and built-in types in Python 3.2. Development efforts concentrated on the standard library and support for porting code to Python 3. Highlights are: * numerous improvements to the unittest module * PEP 3147, support for .pyc repository directories * PEP 3149, support for version tagged dynamic libraries * PEP 3148, a new futures library for concurrent programming * PEP 384, a stable ABI for extension modules * PEP 391, dictionary-based logging configuration * an overhauled GIL implementation that reduces contention * an extended email package that handles bytes messages * a much improved ssl module with support for SSL contexts and certificate hostname matching * a sysconfig module to access configuration information * additions to the shutil module, among them archive file support * many enhancements to configparser, among them mapping protocol support * improvements to pdb, the Python debugger * countless fixes regarding bytes/string issues; among them full support for a bytes environment (filenames, environment variables) * many consistency and behavior fixes for numeric operations For a more extensive list of changes in 3.2, see http://docs.python.org/3.2/whatsnew/3.2.html To download Python 3.2 visit: http://www.python.org/download/releases/3.2/ Please consider trying Python 3.2 with your code and reporting any bugs you may notice to: http://bugs.python.org/ Enjoy! 
- -- Georg Brandl, Release Manager georg at python.org (on behalf of the entire python-dev team and 3.2's contributors) -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (GNU/Linux) iEYEARECAAYFAk5j3d4ACgkQN9GcIYhpnLA2BACeLZ8nSdVOoxlJw4DnbM42neeA fwAAoKTHetXsVxrEfvCWSorUhoJ083kZ =5Wm1 -----END PGP SIGNATURE----- From hagen at zhuliguan.net Mon Sep 5 04:19:04 2011 From: hagen at zhuliguan.net (=?ISO-8859-1?Q?Hagen_F=FCrstenau?=) Date: Sun, 04 Sep 2011 22:19:04 -0400 Subject: [Python-Dev] [RELEASED] Python 3.2.2 In-Reply-To: <4E63DDDE.2040108@python.org> References: <4E63DDDE.2040108@python.org> Message-ID: > To download Python 3.2 visit: > > http://www.python.org/download/releases/3.2/ It's a bit confusing that the download link is to 3.2 and not 3.2.2. Cheers, Hagen From tjreedy at udel.edu Mon Sep 5 05:41:20 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 04 Sep 2011 23:41:20 -0400 Subject: [Python-Dev] [RELEASED] Python 3.2.2 In-Reply-To: <4E63DDDE.2040108@python.org> References: <4E63DDDE.2040108@python.org> Message-ID: On 9/4/2011 4:21 PM, Georg Brandl wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On behalf of the Python development team, I'm happy to announce the > Python 3.2.2 maintenance release. > To download Python 3.2 visit: > > http://www.python.org/download/releases/3.2/ To download 3.2.2 visit: http://www.python.org/download/releases/3.2.2/ -- Terry Jan Reedy From g.brandl at gmx.net Mon Sep 5 08:36:44 2011 From: g.brandl at gmx.net (Georg Brandl) Date: Mon, 05 Sep 2011 08:36:44 +0200 Subject: [Python-Dev] [RELEASED] Python 3.2.2 In-Reply-To: References: <4E63DDDE.2040108@python.org> Message-ID: Am 05.09.2011 04:19, schrieb Hagen Fürstenau: >> To download Python 3.2 visit: >> >> http://www.python.org/download/releases/3.2/ > > It's a bit confusing that the download link is to 3.2 and not 3.2.2. Indeed, sorry. 
Georg From fuzzyman at voidspace.org.uk Mon Sep 5 13:56:09 2011 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Mon, 5 Sep 2011 12:56:09 +0100 Subject: [Python-Dev] Maintenance burden of str.swapcase Message-ID: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> Hey all, A while ago there was a discussion of the value of apis like str.swapcase, and it was suggested that even though it was acknowledged to be useless the effort of deprecating and removing it was thought to be more than the value in removing it. Earlier this year I was at a pypy sprint helping to work on Python 2.7 compatibility. The bytearray type has much of the string interface, including swapcase… So there was effort to implement this method with the correct semantics for pypy. Doubtless the same has been true for IronPython, and will also be true for Jython. Whilst it is too late for Python 2.x, it *is* (in my opinion) worth removing unused and unneeded APIs. Even if the effort to remove them is more than any effort saved on the part of users it helps other implementations down the road that no longer need to provide these APIs. All the best, Michael Foord -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From merwok at netwok.org Mon Sep 5 16:54:04 2011 From: merwok at netwok.org (=?UTF-8?B?w4lyaWMgQXJhdWpv?=) Date: Mon, 05 Sep 2011 16:54:04 +0200 Subject: [Python-Dev] [Python-checkins] cpython (3.2): #5301: add image/vnd.microsoft.icon (.ico) MIME type In-Reply-To: References: <4E50BF15.8020502@netwok.org> Message-ID: <4E64E28C.7000908@netwok.org> Hi, Le 21/08/2011 11:09, Sandro Tosi a écrit : > On Sun, Aug 21, 2011 at 10:17, Éric Araujo wrote: >> However small the commit was, I think it still was a feature request, so >> I wonder if it was appropriate for the stable versions. > > I can see your point: the reason I committed it also on the stable > branches is that .ico are already out there (since a long time) and > they were currently not recognized. I can call it a bug. > > Anyhow, if it was not appropriate, just tell me and I'll revert on 2.7 > and 3.2 . It should be reverted, yes, at least in 2.7. Apparently Georg has accepted and released the fix for 3.2.2. Regards From merwok at netwok.org Mon Sep 5 19:01:16 2011 From: merwok at netwok.org (=?UTF-8?B?w4lyaWMgQXJhdWpv?=) Date: Mon, 05 Sep 2011 19:01:16 +0200 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: <87bov3k322.fsf@uwakimon.sk.tsukuba.ac.jp> References: <4E576793.2010203@v.loewis.de> <4E5824E1.9010101@udel.edu> <4E5869C2.2040008@udel.edu> <8420B962-0F4B-45D3-9B1A-0C5C3AD3676E@gmail.com> <87ippglw6b.fsf@uwakimon.sk.tsukuba.ac.jp> <20110829141440.2e2178c6@pitrou.net> <874o0ylsq6.fsf@uwakimon.sk.tsukuba.ac.jp> <1314724786.3554.1.camel@localhost.localdomain> <8739gil27m.fsf@uwakimon.sk.tsukuba.ac.jp> <87y5yajewn.fsf@uwakimon.sk.tsukuba.ac.jp> <87r540ka68.fsf@uwakimon.sk.tsukuba.ac.jp> <87ei00jdof.fsf@uwakimon.sk.tsukuba.ac.jp> <1314903285.3617.31.camel@localhost.localdomain> <87bov3k322.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4E65005C.5070104@netwok.org> Le 02/09/2011 05:59, Stephen J. 
Turnbull a écrit : > I believe that the deprecation of the digraphs as separate letters > occurred as the telephone became widely used in Spain, and the > telephone company demanded an official proclamation from whatever > Ministry is responsible for culture that it was OK to treat the > digraphs as two letters (specifically, to collate them that way), so > that they could use the programs that came with the OS. > > So this stuff is not merely variant by culture, but also by economics > and politics. :-/ That is a truth for language matters and linguistics, as well as in other domains and sciences. Cheers From ncoghlan at gmail.com Tue Sep 6 02:25:58 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 6 Sep 2011 10:25:58 +1000 Subject: [Python-Dev] [Python-checkins] cpython (3.2): Fix PyUnicode_AsWideCharString() doc: size doesn't contain the null character In-Reply-To: References: Message-ID: On Tue, Sep 6, 2011 at 10:01 AM, victor.stinner wrote: > Fix also spelling of the null character. While these cases are legitimately changed to 'null' (since they're lowercase descriptions of the character), I figure it's worth mentioning again that the ASCII name for '\0' actually *is* NUL (i.e. only one 'L'). Strange, but true [1]. Cheers, Nick. [1] https://secure.wikimedia.org/wikipedia/en/wiki/ASCII -- Nick Coghlan | ncoghlan at gmail.com | 
Brisbane, Australia From jcea at jcea.es Tue Sep 6 06:09:01 2011 From: jcea at jcea.es (Jesus Cea) Date: Tue, 06 Sep 2011 06:09:01 +0200 Subject: [Python-Dev] Multigigabyte memory usage in the OpenIndiana Buildbot In-Reply-To: <20110902201415.773da7d6@pitrou.net> References: <4E60FCD6.3090005@jcea.es> <4E611821.9050108@jcea.es> <20110902201415.773da7d6@pitrou.net> Message-ID: <4E659CDD.8090900@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 02/09/11 20:14, Antoine Pitrou wrote: > On Fri, 02 Sep 2011 19:53:37 +0200 Jesus Cea wrote: >> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 >> >> On 02/09/11 17:57, Jesus Cea wrote: >>> The build hangs or die with a "out of memory" error, >>> eventually. >> >> A simple "make test" with python not compiled with "pydebug" and >> skipping all the optional tests (like zip64) is taking up to >> 300MB of RAM. Python 2.7 branch, current tip. > > Can you tell if it's something recent or it has always been like > that? I can't tell. My host has restricted me recently to 4GB RAM max (no swap), and the buildbot is failing now, but I don't know if using so much memory is something recent or not. Previously I could use up to 32GB of RAM. - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ . 
_/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQCVAwUBTmWc3Zlgi5GaxT1NAQKhGgP/U8f/NEk2WeNdEngasEDFxX1xSEzJMddo qIv7XkGXc93LNdGpqaIzNgW2d5NX3i7es0U5NrDtJVa0BTDLorKFN+zV6RpInZUO eQR65ZYn6Ld1xioyrb74v5vZq7HXcONhyVPcmXufRHkzkZ+kTnybvyc60plZEN5n NyHJkl7gNcU= =iNH7 -----END PGP SIGNATURE----- From ncoghlan at gmail.com Tue Sep 6 06:19:27 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 6 Sep 2011 14:19:27 +1000 Subject: [Python-Dev] Multigigabyte memory usage in the OpenIndiana Buildbot In-Reply-To: <4E659CDD.8090900@jcea.es> References: <4E60FCD6.3090005@jcea.es> <4E611821.9050108@jcea.es> <20110902201415.773da7d6@pitrou.net> <4E659CDD.8090900@jcea.es> Message-ID: On Tue, Sep 6, 2011 at 2:09 PM, Jesus Cea wrote: >> Can you tell if it's something recent or it has always been like >> that? > > I can't tell. My host has restricted me recently to 4GB RAM max (no > swap), and the buildbot is failing now, but I don't know if using so > much memory is something recent or not. > > Previously I could use up to 32GB of RAM. Is it possible your buildbot is set up to run the bigmem tests? IIRC, those would work correctly with 32 GB, but die a horrible death with only 4 GB available. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | 
Brisbane, Australia From jcea at jcea.es Tue Sep 6 06:27:41 2011 From: jcea at jcea.es (Jesus Cea) Date: Tue, 06 Sep 2011 06:27:41 +0200 Subject: [Python-Dev] Multigigabyte memory usage in the OpenIndiana Buildbot In-Reply-To: References: <4E60FCD6.3090005@jcea.es> <4E611821.9050108@jcea.es> <20110902201415.773da7d6@pitrou.net> <4E659CDD.8090900@jcea.es> Message-ID: <4E65A13D.9010805@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 06/09/11 06:19, Nick Coghlan wrote: > Is it possible your buildbot is set up to run the bigmem tests? > IIRC, those would work correctly with 32 GB, but die a horrible > death with only 4 GB available. How can I check that?. I am seen multiple python processes, quite a few, each taking around 300MB of RAM. - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ . _/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQCVAwUBTmWhPZlgi5GaxT1NAQLkFAP/YBJ5owdNdl2yiJMc8kVi4Ndjt5WK5aRa DY24wZvQP/wY1gOjWKGceTm5Mkhds1Y3qWnP4nW8l1nQNxj+xAdqc5SUQcBHQRVo 5xtC+gQQ1HqDUS4FhAn+IgvlXtnoT0cTfgRO2G7k0ti89KN79aCR+q52TSOy0VCW 1Spv9ilP1Rk= =Ffmz -----END PGP SIGNATURE----- From ncoghlan at gmail.com Tue Sep 6 06:46:26 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 6 Sep 2011 14:46:26 +1000 Subject: [Python-Dev] Multigigabyte memory usage in the OpenIndiana Buildbot In-Reply-To: <4E65A13D.9010805@jcea.es> References: <4E60FCD6.3090005@jcea.es> <4E611821.9050108@jcea.es> <20110902201415.773da7d6@pitrou.net> <4E659CDD.8090900@jcea.es> <4E65A13D.9010805@jcea.es> Message-ID: On Tue, Sep 6, 2011 at 2:27 PM, Jesus Cea wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > 
> On 06/09/11 06:19, Nick Coghlan wrote: >> Is it possible your buildbot is set up to run the bigmem tests? >> IIRC, those would work correctly with 32 GB, but die a horrible >> death with only 4 GB available. > > How can I check that?. > > I am seen multiple python processes, quite a few, each taking around > 300MB of RAM. The test logs include the exact command that is executed: http://www.python.org/dev/buildbot/all/builders/AMD64%20OpenIndiana%203.x/builds/1731/steps/test/logs/stdio So it looks like you're just running the standard test resource (which makes sense, since the bigmem tests would saturate your system with a single process rather than multiple processes). The server actually looks it may be in a generally unhappy state, perhaps due to previous builds that failed without cleaning up after themselves properly. How many python processes do you see hanging around? Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From jcea at jcea.es Tue Sep 6 06:59:26 2011 From: jcea at jcea.es (Jesus Cea) Date: Tue, 06 Sep 2011 06:59:26 +0200 Subject: [Python-Dev] Multigigabyte memory usage in the OpenIndiana Buildbot In-Reply-To: References: <4E60FCD6.3090005@jcea.es> <4E611821.9050108@jcea.es> <20110902201415.773da7d6@pitrou.net> <4E659CDD.8090900@jcea.es> <4E65A13D.9010805@jcea.es> Message-ID: <4E65A8AE.10900@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 06/09/11 06:46, Nick Coghlan wrote: > The server actually looks it may be in a generally unhappy state, > perhaps due to previous builds that failed without cleaning up > after themselves properly. How many python processes do you see > hanging around? 
Just now: """ PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP 4340 buildbot 366M 344M sleep 1 0 0:11:23 0.0% python/2 10097 buildbot 366M 8096K sleep 25 0 0:00:00 0.0% python/1 10099 buildbot 366M 8100K sleep 25 0 0:00:00 0.0% python/1 10098 buildbot 366M 8108K sleep 26 0 0:00:00 0.0% python/1 27698 buildbot 251M 5244K sleep 1 0 0:00:00 0.0% python/1 27697 buildbot 251M 11M sleep 1 0 0:00:00 0.0% python/1 27695 buildbot 251M 5852K sleep 1 0 0:00:00 0.0% python/1 27694 buildbot 251M 5844K sleep 1 0 0:00:00 0.0% python/1 27696 buildbot 251M 5884K sleep 1 0 0:00:00 0.0% python/1 27693 buildbot 251M 5964K sleep 1 0 0:00:00 0.0% python/1 9893 buildbot 202M 198M sleep 1 1 0:09:32 0.0% python/2 14538 buildbot 200M 4700K sleep 1 1 0:00:00 0.0% python/1 25971 buildbot 194M 189M sleep 10 0 0:11:22 0.0% python/2 2616 buildbot 120M 114M sleep 1 0 0:06:38 0.0% python/47 11204 buildbot 118M 5612K sleep 1 0 0:00:00 0.0% python/2 ZONEID NPROC SWAP RSS MEMORY TIME CPU ZONE 23 56 4073M 1632M 40% 0:39:38 0.0% pythonbuildbot.uk.openindiana.org """ This particular build seems to have hang, usual result of running out of memory. Note the SWAP usage of 4073MB, when my limit seems to be 4096MB. The buildbot master process will kill this "hang" processes after the usual timeout. I have requested raising my memory limit to my host, with no effect so far. Anyway, eating >4GB of RAM seems quite overkill. Doing a "make test" manually I can see the python process doing the test to eat more than 200MB of RAM, but it only launch a python process, not a handful like the regular buildbot. I have verified that the memory use is atribuible to the buildbot, since if I kill the buildbot processes, my RAM+SWAP usage is negligible. Thanking for helping me with this. - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ . 
_/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQCVAwUBTmWorplgi5GaxT1NAQIb7QP/Y/Mr0RhhRTM1Rld7xKqNi77tcB0+p4CX EZ0fViNr/NF6NibKMzowi0pr42iZ3dXN4/yRQgNsvGhfzTrpi+J3Z1GCg5vnqox3 jOC+DQ5IrZylLV+zH46K9j2UJ+4hvU3PWBZcGAt6iB4EVK1h8mvBBW08VeDoN5Cj Nkqth694BcY= =KAwa -----END PGP SIGNATURE----- From jcea at jcea.es Tue Sep 6 07:02:13 2011 From: jcea at jcea.es (Jesus Cea) Date: Tue, 06 Sep 2011 07:02:13 +0200 Subject: [Python-Dev] Multigigabyte memory usage in the OpenIndiana Buildbot In-Reply-To: <4E65A8AE.10900@jcea.es> References: <4E60FCD6.3090005@jcea.es> <4E611821.9050108@jcea.es> <20110902201415.773da7d6@pitrou.net> <4E659CDD.8090900@jcea.es> <4E65A13D.9010805@jcea.es> <4E65A8AE.10900@jcea.es> Message-ID: <4E65A955.7000507@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 06/09/11 06:59, Jesus Cea wrote: > Thanking for helping me with this. BTW, it is 7AM in Spain now. I am going bed. I will check this thread again tomorrow. Thanks for your time and effort. This is very frustrating, moreover because it was working very well (with 32GB of RAM... :-). - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ . 
_/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQCVAwUBTmWpVZlgi5GaxT1NAQIz/wP5AYXGp6DYf0Fpl0tNHx8sLNJXR8XSQFjf YRoUvmo1Sh60eMU7yGsoyT2wvOTzU4rPgaWoFsaUELS/74rLMcmb567kKAJqpH7X 8BNmNSdRxYxMXixUrrwi25rYTEgz4ZenpV8tjkHR+wHhcCbBvKnDxcliJZkAxDAJ mzlhdQvdPgI= =9wQO -----END PGP SIGNATURE----- From jcea at jcea.es Tue Sep 6 07:19:20 2011 From: jcea at jcea.es (Jesus Cea) Date: Tue, 06 Sep 2011 07:19:20 +0200 Subject: [Python-Dev] Multigigabyte memory usage in the OpenIndiana Buildbot In-Reply-To: <4E65A955.7000507@jcea.es> References: <4E60FCD6.3090005@jcea.es> <4E611821.9050108@jcea.es> <20110902201415.773da7d6@pitrou.net> <4E659CDD.8090900@jcea.es> <4E65A13D.9010805@jcea.es> <4E65A8AE.10900@jcea.es> <4E65A955.7000507@jcea.es> Message-ID: <4E65AD58.6050106@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 06/09/11 07:02, Jesus Cea wrote: > On 06/09/11 06:59, Jesus Cea wrote: >> Thanking for helping me with this. > > BTW, it is 7AM in Spain now. I am going bed. I will check this > thread again tomorrow. Thanks for your time and effort. This is > very frustrating, moreover because it was working very well (with > 32GB of RAM... :-). I just deleted all the build directories and restarted the buildbots. Forcing a build now. Bedtime. Good night. At this moment, I have 3 Python processes, of sizes 230, 160 and 130 MB. And growing. Sleeping... Zzzzz... - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ . 
_/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQCVAwUBTmWtWJlgi5GaxT1NAQJ0uAQAhmOiXf6lxZeqiRldZcYvYXxnBDw4wNKJ ulADNvqJY7dxFPvuUZ8gv9zQcBjs+xTcY3IkDL4ZlSvubMZeR0O7mQ09zvBKXezd PI6vIK59PPeY+Znfw29TCDB8x5As2wqLVh388eLlYyJFsuUiZfOr4KuCwRughDns cJ7XJ4lb2+c= =oRzC -----END PGP SIGNATURE----- From ncoghlan at gmail.com Tue Sep 6 07:27:57 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 6 Sep 2011 15:27:57 +1000 Subject: [Python-Dev] Multigigabyte memory usage in the OpenIndiana Buildbot In-Reply-To: <4E65AD58.6050106@jcea.es> References: <4E60FCD6.3090005@jcea.es> <4E611821.9050108@jcea.es> <20110902201415.773da7d6@pitrou.net> <4E659CDD.8090900@jcea.es> <4E65A13D.9010805@jcea.es> <4E65A8AE.10900@jcea.es> <4E65A955.7000507@jcea.es> <4E65AD58.6050106@jcea.es> Message-ID: On Tue, Sep 6, 2011 at 3:19 PM, Jesus Cea wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On 06/09/11 07:02, Jesus Cea wrote: >> On 06/09/11 06:59, Jesus Cea wrote: >>> Thanking for helping me with this. >> >> BTW, it is 7AM in Spain now. I am going bed. I will check this >> thread again tomorrow. Thanks for your time and effort. This is >> very frustrating, moreover because it was working very well (with >> 32GB of RAM... :-). > > I just deleted all the build directories and restarted the buildbots. > Forcing a build now. Bedtime. Good night. > > At this moment, I have 3 Python processes, of sizes 230, 160 and 130 > MB. And growing. The memory usage per process seems reasonable to me, based on what I see on my own machine. That means it's the 15 processes that's problematic. It will be interesting to see how these current test runs go. 
It may be the case that with the reduced memory limit, your machine may not be able to run concurrent slaves for 2.7, 3.2 and 3.x as I believe it does now. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From solipsis at pitrou.net Tue Sep 6 07:50:19 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 6 Sep 2011 07:50:19 +0200 Subject: [Python-Dev] cpython: Issue #12567: Add curses.unget_wch() function References: Message-ID: <20110906075019.2d16f1b1@pitrou.net> On Tue, 06 Sep 2011 01:53:32 +0200 victor.stinner wrote: > http://hg.python.org/cpython/rev/b1e03d10391e > changeset: 72297:b1e03d10391e > user: Victor Stinner > date: Tue Sep 06 01:53:03 2011 +0200 > summary: > Issue #12567: Add curses.unget_wch() function > > Push a character so the next get_wch() will return it. Looks like you broke many buildbots. Regards Antoine. From solipsis at pitrou.net Tue Sep 6 09:33:59 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 6 Sep 2011 07:33:59 +0000 (UTC) Subject: [Python-Dev] Maintenance burden of str.swapcase References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> Message-ID: Michael Foord voidspace.org.uk> writes: > > Earlier this year I was at a pypy sprint helping to work on Python 2.7 compatibility. The bytearray type has much of the string interface, including swapcase… So there was effort to implement this method with the correct semantics for pypy. Doubtless the same has been true for IronPython, and will also be true for Jython. While I haven't used swapcase() a single time, I doubt there is much difficulty in implementing pure ASCII semantics, is there? Regards Antoine. 
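Antoine's point about pure ASCII semantics can be sketched in a few lines of Python. This is a hypothetical model for illustration only; CPython implements bytes/bytearray swapcase in C, and PyPy in RPython, not like this:

```python
# Hypothetical pure-Python model of ASCII-only swapcase semantics:
# swap the case of ASCII letters, map every other byte to itself.
def ascii_swapcase(data: bytes) -> bytes:
    out = bytearray()
    for b in data:
        if 0x61 <= b <= 0x7A:      # b'a'..b'z' -> b'A'..b'Z'
            out.append(b - 0x20)
        elif 0x41 <= b <= 0x5A:    # b'A'..b'Z' -> b'a'..b'z'
            out.append(b + 0x20)
        else:                      # non-letter bytes are unchanged
            out.append(b)
    return bytes(out)
```

On ASCII input this matches the built-in behaviour, e.g. `ascii_swapcase(b"Hello!")` gives `b"hELLO!"`, while bytes outside the ASCII letter ranges map to themselves.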
From victor.stinner at haypocalc.com Tue Sep 6 10:10:31 2011 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Tue, 06 Sep 2011 10:10:31 +0200 Subject: [Python-Dev] cpython: Issue #12567: Add curses.unget_wch() function In-Reply-To: <20110906075019.2d16f1b1@pitrou.net> References: <20110906075019.2d16f1b1@pitrou.net> Message-ID: <4E65D577.4060707@haypocalc.com> Le 06/09/2011 07:50, Antoine Pitrou a écrit : > On Tue, 06 Sep 2011 01:53:32 +0200 > victor.stinner wrote: >> http://hg.python.org/cpython/rev/b1e03d10391e >> changeset: 72297:b1e03d10391e >> user: Victor Stinner >> date: Tue Sep 06 01:53:03 2011 +0200 >> summary: >> Issue #12567: Add curses.unget_wch() function >> >> Push a character so the next get_wch() will return it. > > Looks like you broke many buildbots. Oh, thanks to notify me. I expected failures, but I also forgot the skip the test if the function is missing. I wrote an huge patch for this module to improve Unicode support, but I chose to split it into smaller patches. Because a single function broke most buildbots, it was a good idea :-) Victor From victor.stinner at haypocalc.com Tue Sep 6 10:04:57 2011 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Tue, 06 Sep 2011 10:04:57 +0200 Subject: [Python-Dev] [Python-checkins] cpython (3.2): Fix PyUnicode_AsWideCharString() doc: size doesn't contain the null character In-Reply-To: References: Message-ID: <4E65D429.3040908@haypocalc.com> Le 06/09/2011 02:25, Nick Coghlan a écrit : > On Tue, Sep 6, 2011 at 10:01 AM, victor.stinner > wrote: >> Fix also spelling of the null character. > > While these cases are legitimately changed to 'null' (since they're > lowercase descriptions of the character), I figure it's worth > mentioning again that the ASCII name for '\0' actually *is* NUL (i.e. > only one 'L'). Strange, but true [1]. > > Cheers, > Nick. 
> > [1] https://secure.wikimedia.org/wikipedia/en/wiki/ASCII "NUL" is an abbreviation used in tables when you don't have enough space to write the full name: "null character". Where do you want to mention this abbreviation? Victor From ncoghlan at gmail.com Tue Sep 6 11:16:38 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 6 Sep 2011 19:16:38 +1000 Subject: [Python-Dev] [Python-checkins] cpython (3.2): Fix PyUnicode_AsWideCharString() doc: size doesn't contain the null character In-Reply-To: <4E65D429.3040908@haypocalc.com> References: <4E65D429.3040908@haypocalc.com> Message-ID: On Tue, Sep 6, 2011 at 6:04 PM, Victor Stinner wrote: > "NUL" is an abbreviation used in tables when you don't have enough space to > write the full name: "null character". Yep, fair description. > Where do you want to mention this abbreviation? Sorry, I meant worth mentioning on the list, not anywhere particular in the docs - the topic came up recently when an instance of NUL was incorrectly changed to read 'NULL' instead and it took me a moment to figure out why the same reasoning *didn't* apply in this case. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From martin at v.loewis.de Tue Sep 6 15:03:32 2011 From: martin at v.loewis.de (martin at v.loewis.de) Date: Tue, 06 Sep 2011 15:03:32 +0200 Subject: [Python-Dev] bigmemtests for really big memory too slow Message-ID: <20110906150332.Horde.boB6BaGZi1VOZhok0Q6zPZA@webmail.df.eu> I benchmarked some of the bigmemtests when run with -M 80G. They run really slow, because they try to use all available memory, and then take a lot of time processing it. Here are some runtimes: test_capitalize (test.test_bigmem.StrTest) ... ok (420.490846s) test_center (test.test_bigmem.StrTest) ... ok (149.431523s) test_compare (test.test_bigmem.StrTest) ... ok (200.181986s) test_concat (test.test_bigmem.StrTest) ... ok (154.282903s) test_contains (test.test_bigmem.StrTest) ... 
ok (173.960073s) test_count (test.test_bigmem.StrTest) ... ok (186.799731s) test_encode (test.test_bigmem.StrTest) ... ok (53.752823s) test_encode_ascii (test.test_bigmem.StrTest) ... ok (8.421414s) test_encode_raw_unicode_escape (test.test_bigmem.StrTest) ... ok (3.752774s) test_encode_utf32 (test.test_bigmem.StrTest) ... ok (9.732829s) test_encode_utf7 (test.test_bigmem.StrTest) ... ok (4.998805s) test_endswith (test.test_bigmem.StrTest) ... ok (208.022452s) test_expandtabs (test.test_bigmem.StrTest) ... ok (614.490436s) test_find (test.test_bigmem.StrTest) ... ok (230.722848s) test_format (test.test_bigmem.StrTest) ... ok (407.471929s) test_hash (test.test_bigmem.StrTest) ... ok (325.906271s) In the test suite, we have the bigmemtest and precisionbigmemtest decorators. I think bigmemtest cases should all be changed to precisionbigmemtest, giving sizes of just above 2**31. With that change, the runtime for test_capitalize would go down to 42s. What do you think? Regards, Martin From solipsis at pitrou.net Tue Sep 6 15:27:38 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 6 Sep 2011 15:27:38 +0200 Subject: [Python-Dev] bigmemtests for really big memory too slow References: <20110906150332.Horde.boB6BaGZi1VOZhok0Q6zPZA@webmail.df.eu> Message-ID: <20110906152738.733f98cd@pitrou.net> Hello Martin, > In the test suite, we have the bigmemtest and precisionbigmemtest > decorators. I think bigmemtest cases should all be changed to > precisionbigmemtest, giving sizes of just above 2**31. With that > change, the runtime for test_capitalize would go down to 42s. I have started working on this and other things in http://hg.python.org/sandbox/antoine/, branch "bigmem". I was planning to propose the same thing, which indeed makes tests pass much more quickly, but I was waiting to try and solve some other crashes in test_bigmem. Regards Antoine. From jsbueno at python.org.br Tue Sep 6 16:32:13 2011 From: jsbueno at python.org.br (Joao S. O. 
Bueno) Date: Tue, 6 Sep 2011 11:32:13 -0300 Subject: [Python-Dev] Maintenance burden of str.swapcase In-Reply-To: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> Message-ID: On Mon, Sep 5, 2011 at 8:56 AM, Michael Foord wrote: > Hey all, > A while ago there was a discussion of the value of apis like str.swapcase, > and it was suggested that even though it was acknowledged to be useless the > effort of deprecating and removing it was thought to be more than the value > in removing it. > Earlier this year I was at a pypy sprint helping to work on Python 2.7 > compatibility. The bytearray type has much of the string interface, > including swapcase… So there was effort to implement this method with the > correct semantics for pypy. Doubtless the same has been true for IronPython, > and will also be true for Jython. > Whilst it is too late for Python 2.x, it *is* (in my opinion) worth removing > unused and unneeded APIs. Even if the effort to remove them is more than any > effort saved on the part of users it helps other implementations down the > road that no longer need to provide these APIs. > All the best, > Michael Foord > On the other hand, for any users wanting to use this in the future, if it is not there, they'd have to implement the logic for themselves. If it is a "burden" for someone in a sprint, looking at other implementations, and with all the unicode knowledge/documentation around, it would be pretty much undoable in the correct way by a casual user. Removing it would mean explicitly "batteries removal". If you get some traction on that, at least consider moving it to a pure python function on the string module. js -><- > -- > http://www.voidspace.org.uk/ > > May you do good and not evil > May you find forgiveness for yourself and forgive others > May you share freely, never taking more than you give. 
> -- the sqlite blessing http://www.sqlite.org/different.html > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/jsbueno%40python.org.br > > From solipsis at pitrou.net Tue Sep 6 16:46:37 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 6 Sep 2011 16:46:37 +0200 Subject: [Python-Dev] bigmemtests for really big memory too slow References: <20110906150332.Horde.boB6BaGZi1VOZhok0Q6zPZA@webmail.df.eu> Message-ID: <20110906164637.0aaa5e10@pitrou.net> For the record, I've disabled automatic builds on the bigmem buildbot until things get sorted out a bit (no need to eat huge amounts of RAM and eight hours of CPU each time a commit is pushed, only to have the process killed :-)). It's still possible to run custom builds, of course. Regards Antoine. On Tue, 06 Sep 2011 15:03:32 +0200 martin at v.loewis.de wrote: > I benchmarked some of the bigmemtests when run with -M 80G. They run really > slow, because they try to use all available memory, and then take a lot of > time processing it. Here are some runtimes: > > test_capitalize (test.test_bigmem.StrTest) ... ok (420.490846s) > test_center (test.test_bigmem.StrTest) ... ok (149.431523s) > test_compare (test.test_bigmem.StrTest) ... ok (200.181986s) > test_concat (test.test_bigmem.StrTest) ... ok (154.282903s) > test_contains (test.test_bigmem.StrTest) ... ok (173.960073s) > test_count (test.test_bigmem.StrTest) ... ok (186.799731s) > test_encode (test.test_bigmem.StrTest) ... ok (53.752823s) > test_encode_ascii (test.test_bigmem.StrTest) ... ok (8.421414s) > test_encode_raw_unicode_escape (test.test_bigmem.StrTest) ... ok (3.752774s) > test_encode_utf32 (test.test_bigmem.StrTest) ... ok (9.732829s) > test_encode_utf7 (test.test_bigmem.StrTest) ... ok (4.998805s) > test_endswith (test.test_bigmem.StrTest) ... 
ok (208.022452s) > test_expandtabs (test.test_bigmem.StrTest) ... ok (614.490436s) > test_find (test.test_bigmem.StrTest) ... ok (230.722848s) > test_format (test.test_bigmem.StrTest) ... ok (407.471929s) > test_hash (test.test_bigmem.StrTest) ... ok (325.906271s) > > In the test suite, we have the bigmemtest and precisionbigmemtest > decorators. I think bigmemtest cases should all be changed to > precisionbigmemtest, giving sizes of just above 2**31. With that > change, the runtime for test_capitalize would go down to 42s. > > What do you think? > > Regards, > Martin > > > From tseaver at palladion.com Tue Sep 6 17:11:51 2011 From: tseaver at palladion.com (Tres Seaver) Date: Tue, 06 Sep 2011 11:11:51 -0400 Subject: [Python-Dev] [Python-checkins] cpython (3.2): Fix PyUnicode_AsWideCharString() doc: size doesn't contain the null character In-Reply-To: <4E65D429.3040908@haypocalc.com> References: <4E65D429.3040908@haypocalc.com> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 09/06/2011 04:04 AM, Victor Stinner wrote: > Le 06/09/2011 02:25, Nick Coghlan a écrit : >> On Tue, Sep 6, 2011 at 10:01 AM, victor.stinner >> wrote: >>> Fix also spelling of the null character. >> >> While these cases are legitimately changed to 'null' (since >> they're lowercase descriptions of the character), I figure it's >> worth mentioning again that the ASCII name for '\0' actually *is* >> NUL (i.e. only one 'L'). Strange, but true [1]. >> >> Cheers, Nick. >> >> [1] https://secure.wikimedia.org/wikipedia/en/wiki/ASCII > > "NUL" is an abbreviation used in tables when you don't have enough > space to write the full name: "null character". > > Where do you want to mention this abbreviation? FWIW, the RFC 20 (the ASCII spec) really really defines 'NUL' as the *name* of the \0 character, not just an "abbreviation used in tables": http://tools.ietf.org/html/rfc20#section-5.2 Tres. 
-- =================================================================== Tres Seaver +1 540-429-0999 tseaver at palladion.com Palladion Software "Excellence by Design" http://palladion.com From merwok at netwok.org Tue Sep 6 17:17:47 2011 From: merwok at netwok.org (=?UTF-8?B?w4lyaWMgQXJhdWpv?=) Date: Tue, 06 Sep 2011 17:17:47 +0200 Subject: [Python-Dev] [Python-checkins] cpython: Issue #9561: packaging now writes egg-info files using UTF-8 In-Reply-To: References: Message-ID: <4E66399B.4030006@netwok.org> Le 06/09/2011 00:11, victor.stinner a écrit : > http://hg.python.org/cpython/rev/56ab3257ca13 > changeset: 72296:56ab3257ca13 > user: Victor Stinner > date: Tue Sep 06 00:11:13 2011 +0200 > summary: > Issue #9561: packaging now writes egg-info files using UTF-8 > > instead of the locale encoding > > def _distutils_pkg_info(self): > tmp = self._distutils_setup_py_pkg() > - self.write_file([tmp, 'PKG-INFO'], '') > + self.write_file([tmp, 'PKG-INFO'], '', encoding='UTF-8') This function is writing an empty string; isn't it the same bytes in UTF-8 or in the locale encoding? (Are there people that use encodings with BOMs as locale?
*shudders*) From victor.stinner at haypocalc.com Tue Sep 6 17:50:31 2011 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Tue, 06 Sep 2011 17:50:31 +0200 Subject: [Python-Dev] [Python-checkins] cpython: Issue #9561: packaging now writes egg-info files using UTF-8 In-Reply-To: <4E66399B.4030006@netwok.org> References: <4E66399B.4030006@netwok.org> Message-ID: <4E664147.8010407@haypocalc.com> Le 06/09/2011 17:17, Éric Araujo a écrit : > Le 06/09/2011 00:11, victor.stinner a écrit : >> http://hg.python.org/cpython/rev/56ab3257ca13 >> changeset: 72296:56ab3257ca13 >> user: Victor Stinner >> date: Tue Sep 06 00:11:13 2011 +0200 >> summary: >> Issue #9561: packaging now writes egg-info files using UTF-8 >> >> instead of the locale encoding > >> >> def _distutils_pkg_info(self): >> tmp = self._distutils_setup_py_pkg() >> - self.write_file([tmp, 'PKG-INFO'], '') >> + self.write_file([tmp, 'PKG-INFO'], '', encoding='UTF-8') > > This function is writing an empty string; isn't it the same bytes in > UTF-8 or in the locale encoding? This patch is just cosmetic: it doesn't change anything (except that TextIOWrapper doesn't have to temporarily change the locale to get the locale encoding). Victor From stephen at xemacs.org Tue Sep 6 18:59:56 2011 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 07 Sep 2011 01:59:56 +0900 Subject: [Python-Dev] Maintenance burden of str.swapcase In-Reply-To: References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> Message-ID: <87ehztip2r.fsf@uwakimon.sk.tsukuba.ac.jp> Joao S. O. Bueno writes: > Removing it would mean explicitly "batteries removal". That's what we usually do with a dead battery, no?
From tseaver at palladion.com Tue Sep 6 18:58:07 2011 From: tseaver at palladion.com (Tres Seaver) Date: Tue, 06 Sep 2011 12:58:07 -0400 Subject: [Python-Dev] Maintenance burden of str.swapcase In-Reply-To: <87ehztip2r.fsf@uwakimon.sk.tsukuba.ac.jp> References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> <87ehztip2r.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 09/06/2011 12:59 PM, Stephen J. Turnbull wrote: > Joao S. O. Bueno writes: > >> Removing it would mean explicitly "batteries removal". > > That's what we usually do with a dead battery, no? Normally one "replaces" dead batteries. :) Tres. -- =================================================================== Tres Seaver +1 540-429-0999 tseaver at palladion.com Palladion Software "Excellence by Design" http://palladion.com From tjreedy at udel.edu Tue Sep 6 19:55:24 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 06 Sep 2011 13:55:24 -0400 Subject: [Python-Dev] [Python-checkins] cpython (3.2): Fix PyUnicode_AsWideCharString() doc: size doesn't contain the null character In-Reply-To: References: <4E65D429.3040908@haypocalc.com> Message-ID: On 9/6/2011 11:11 AM, Tres Seaver wrote: > FWIW, the RFC 20 (the ASCII spec) really really defines 'NUL' as the > *name* of the \0 character, not just an "abbreviation used in tables": > > http://tools.ietf.org/html/rfc20#section-5.2 As I read the text, the 2 or 3 capital letter *symbols* are abbreviations of the names. Looking back up, I see ''' 4. Legend 4.1 Control Characters NUL Null DLE Data Link Escape (CC) ... 4.2 Graphic Characters Column/Row Symbol Name 2/0 SP Space (Normally Non-Printing) 2/1 !
Exclamation Point ''' 'NUL' and 'SP' are *symbols* that have the names 'Null' and 'Space', just as the symbol '!' is named 'Exclamation Point'. They just happen to be digraphs and trigraphs composed of 2 or 3 characters. I am sure that the symbol SP does not appear in the docs. The symbol 'LF' (for LineFeed) probably does not either. We just call it 'newline' or 'newline character' as that is how we use it. -- Terry Jan Reedy From tjreedy at udel.edu Tue Sep 6 21:05:16 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 06 Sep 2011 15:05:16 -0400 Subject: [Python-Dev] Maintenance burden of str.swapcase In-Reply-To: References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> <87ehztip2r.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 9/6/2011 12:58 PM, Tres Seaver wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On 09/06/2011 12:59 PM, Stephen J. Turnbull wrote: >> Joao S. O. Bueno writes: >> >>> Removing it would mean explicitly "batteries removal". >> >> That's what we usually do with a dead battery, no? > > Normally one "replaces" dead batteries. :) Not if it is dead and leaking because the device has been unused for years. https://www.google.com/codesearch#search/&q=lang:^python$%20swapcase%20case:yes&type=cs returns a mere 300 hits. At least half are definitions of the function, or tests thereof, or inclusions in lists. Some actual uses: 1.http://pytof.googlecode.com/svn/trunk/pytof/utils.py def ListCurrentDirFileFromExt(ext, path): """ list file matching extension from a list in the current directory emulate a `ls *.{(',').join(ext)` with ext in both upper and downcase}""" import glob extfiles = [] for e in ext: extfiles.extend(glob.glob(join(path,'*' + e))) extfiles.extend(glob.glob(join(path,'*' + e.swapcase()))) If e is all upper or lower, using e.upper() and e.lower() will do same. If e is mixed, using .upper and .lower is required to fulfill the spec. On *nix, where matching of letters is case sensitive, both will fail with '.Jpg'. 
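The behaviour described above is easy to check in a few lines (an illustrative sketch, not code from pytof; the extension values are made up):

```python
# For a single-case extension, swapcase() is equivalent to upper()/lower(),
# so the pytof code above only ever globs two case variants per extension.
ext = ".jpg"
assert ext.swapcase() == ext.upper() == ".JPG"        # all-lowercase input
assert ".JPG".swapcase() == ".JPG".lower() == ".jpg"  # all-uppercase input

# A mixed-case extension matches neither variant on a case-sensitive
# (*nix) filesystem: swapcase() flips every letter individually.
assert ".Jpg".swapcase() == ".jPG"
assert ".Jpg" not in {ext, ext.upper(), ext.swapcase()}
```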
On Windows, where letter matching ignores case, the above code will list everything twice. 2.http://ydict.googlecode.com/svn/trunk/ydict k is random word from database. result.replace(k, "####").replace(k.upper(), "####").replace(k[0].swapcase()+k[1:].lower(),"####") If k is lowercase, .lower() is redundant and k[0].swapcase()+k[1:].lower() == k.title(). If k is uppercase, previous .upper() is redundant. If k is mixed case, code may have problems. 3. http://migrid.googlecode.com/svn/trunk/mig/sftp-mount/migaccess.py # This is how we could add stub extended attribute handlers... # (We can't have ones which aptly delegate requests to the underlying fs # because Python lacks a standard xattr interface.) # # def getxattr(self, path, name, size): # val = name.swapcase() + '@' + path # if size == 0: # # We are asked for size of the value. # return len(val) # return val This is not actually used. Passing a name with all cases swapped from what they should be is a bit strange. 4. elif char >= 'A' and char <= 'Z': element = element + char.swapcase() uppercasechar.swapcase() == uppercasechar.lower() My perusal of the first 70 of 300 hits suggests that .swapcase is more of an attractive nuisance or redundant rather than actually useful. -- Terry Jan Reedy From steve at pearwood.info Tue Sep 6 21:36:27 2011 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 07 Sep 2011 05:36:27 +1000 Subject: [Python-Dev] Maintenance burden of str.swapcase In-Reply-To: References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> <87ehztip2r.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4E66763B.7080707@pearwood.info> Terry Reedy wrote: > On 9/6/2011 12:58 PM, Tres Seaver wrote: >> -----BEGIN PGP SIGNED MESSAGE----- >> Hash: SHA1 >> >> On 09/06/2011 12:59 PM, Stephen J. Turnbull wrote: >>> Joao S. O. Bueno writes: >>> >>>> Removing it would mean explicitly "batteries removal". >>> >>> That's what we usually do with a dead battery, no? >> >> Normally one "replaces" dead batteries. 
:) > > Not if it is dead and leaking because the device has been unused for years. Can we please not make decisions about what code should be removed based on dodgy analogies? :) Perhaps I missed something early on, but why are we proposing removing a function which (presumably) is stable and tested and works and is not broken? What maintenance is needed here? [...] > If k is lowercase, .lower() is redundant and > k[0].swapcase()+k[1:].lower() == k.title(). Not so. >>> k = 'aaaa bbbb' >>> k.title() 'Aaaa Bbbb' >>> k[0].swapcase()+k[1:].lower() 'Aaaa bbbb' > If k is uppercase, previous > .upper() is redundant. If k is mixed case, code may have problems. "May" have problems? pERSONNALLY, i THINK THAT A SWAPCASE COMMAND IS ESSENTIAL FOR TEXT EDITOR APPLICATIONS, TO AVOID THOSE LITTLE cAPS lOCK ACCIDENTS. -- Steven From fuzzyman at voidspace.org.uk Tue Sep 6 21:41:07 2011 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Tue, 6 Sep 2011 20:41:07 +0100 Subject: [Python-Dev] Maintenance burden of str.swapcase In-Reply-To: <4E66763B.7080707@pearwood.info> References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> <87ehztip2r.fsf@uwakimon.sk.tsukuba.ac.jp> <4E66763B.7080707@pearwood.info> Message-ID: On 6 Sep 2011, at 20:36, Steven D'Aprano wrote: > Terry Reedy wrote: >> On 9/6/2011 12:58 PM, Tres Seaver wrote: >>> -----BEGIN PGP SIGNED MESSAGE----- >>> Hash: SHA1 >>> >>> On 09/06/2011 12:59 PM, Stephen J. Turnbull wrote: >>>> Joao S. O. Bueno writes: >>>> >>>>> Removing it would mean explicitly "batteries removal". >>>> >>>> That's what we usually do with a dead battery, no? >>> >>> Normally one "replaces" dead batteries. :) >> Not if it is dead and leaking because the device has been unused for years. > > > Can we please not make decisions about what code should be removed based on dodgy analogies? :) > > Perhaps I missed something early on, but why are we proposing removing a function which (presumably) is stable and tested and works and is not broken? 
What maintenance is needed here? The maintenance burden is on other implementations. Even if there is no maintenance burden for CPython, having useless methods simply because it is less effort to leave them in place creates work for new implementations wanting to be fully compatible. > > > [...] >> If k is lowercase, .lower() is redundant and k[0].swapcase()+k[1:].lower() == k.title(). > > Not so. > > >>> k = 'aaaa bbbb' > >>> k.title() > 'Aaaa Bbbb' > >>> k[0].swapcase()+k[1:].lower() > 'Aaaa bbbb' > > >> If k is uppercase, previous .upper() is redundant. If k is mixed case, code may have problems. > > "May" have problems? > > > pERSONNALLY, i THINK THAT A SWAPCASE COMMAND IS ESSENTIAL FOR TEXT EDITOR APPLICATIONS, TO AVOID THOSE LITTLE cAPS lOCK ACCIDENTS. Have you ever used str.swapcase for that purpose? Michael -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html From fdrake at acm.org Tue Sep 6 21:42:03 2011 From: fdrake at acm.org (Fred Drake) Date: Tue, 6 Sep 2011 15:42:03 -0400 Subject: [Python-Dev] Maintenance burden of str.swapcase In-Reply-To: <4E66763B.7080707@pearwood.info> References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> <87ehztip2r.fsf@uwakimon.sk.tsukuba.ac.jp> <4E66763B.7080707@pearwood.info> Message-ID: On Tue, Sep 6, 2011 at 3:36 PM, Steven D'Aprano wrote: > pERSONNALLY, i THINK THAT A SWAPCASE COMMAND IS ESSENTIAL FOR TEXT EDITOR > APPLICATIONS, TO AVOID THOSE LITTLE cAPS lOCK ACCIDENTS. There's a better solution to that, but the caps lock lobby has a stranglehold on keyboard manufacturers. -- Fred L. Drake, Jr. "A person who won't read has no advantage over one who can't read."
--Samuel Langhorne Clemens From barry at python.org Tue Sep 6 22:03:50 2011 From: barry at python.org (Barry Warsaw) Date: Tue, 6 Sep 2011 16:03:50 -0400 Subject: [Python-Dev] Maintenance burden of str.swapcase In-Reply-To: References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> <87ehztip2r.fsf@uwakimon.sk.tsukuba.ac.jp> <4E66763B.7080707@pearwood.info> Message-ID: <20110906160350.26237e64@resist.wooz.org> On Sep 06, 2011, at 03:42 PM, Fred Drake wrote: >On Tue, Sep 6, 2011 at 3:36 PM, Steven D'Aprano wrote: >> pERSONNALLY, i THINK THAT A SWAPCASE COMMAND IS ESSENTIAL FOR TEXT EDITOR >> APPLICATIONS, TO AVOID THOSE LITTLE cAPS lOCK ACCIDENTS. > >There's a better solution to that, but the caps lock lobby has a stranglehold >on keyboard manufacturers. Fight The Man with xmodmap! -Barry From martin at v.loewis.de Tue Sep 6 22:18:49 2011 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 06 Sep 2011 22:18:49 +0200 Subject: [Python-Dev] Maintenance burden of str.swapcase In-Reply-To: References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> <87ehztip2r.fsf@uwakimon.sk.tsukuba.ac.jp> <4E66763B.7080707@pearwood.info> Message-ID: <4E668029.6080106@v.loewis.de> >> Perhaps I missed something early on, but why are we proposing >> removing a function which (presumably) is stable and tested and >> works and is not broken? What maintenance is needed here? > > > The maintenance burden is on other implementations. It's not a maintenance burden (at least not in the sense in which I understand the word "maintenance" - as an ongoing effort). When they implement it once, the implementation can likely stay forever, unmodified. > Even if there is > no maintenance burden for CPython having useless methods simply > because it is less effort to leave them in place creates work for > new implementations wanting to be fully compatible. That's true. However, that alone is not enough reason to remove the feature, IMO. 
The effort that is saved is not only on the developers of CPython, but also on users of the feature. My claim is that for any little-used feature, removing it costs more time world-wide than re-implementing it in 10 alternative Python implementations (with the number 10 drawn out of blue air), because of the cost of changing the applications that actually do use the feature. With the switch to Python 3, there would have been a chance to remove little-used features. IMO, the next such chance is with Python 4. It could be useful to start collecting little-used features that might be removed with Python 4 - which I don't expect until 2020. Regards, Martin From fuzzyman at voidspace.org.uk Tue Sep 6 22:23:26 2011 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Tue, 6 Sep 2011 21:23:26 +0100 Subject: [Python-Dev] Maintenance burden of str.swapcase In-Reply-To: <4E668029.6080106@v.loewis.de> References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> <87ehztip2r.fsf@uwakimon.sk.tsukuba.ac.jp> <4E66763B.7080707@pearwood.info> <4E668029.6080106@v.loewis.de> Message-ID: <7EA8F302-1DC4-43D9-B124-09A51172E9BF@voidspace.org.uk> On 6 Sep 2011, at 21:18, Martin v. Löwis wrote: >>> Perhaps I missed something early on, but why are we proposing >>> removing a function which (presumably) is stable and tested and >>> works and is not broken? What maintenance is needed here? >> >> >> The maintenance burden is on other implementations. > > It's not a maintenance burden (at least not in the sense in which > I understand the word "maintenance" - as an ongoing effort). When > they implement it once, the implementation can likely stay forever, > unmodified. Ok, burden rather than "maintenance" burden. > >> Even if there is >> no maintenance burden for CPython having useless methods simply >> because it is less effort to leave them in place creates work for >> new implementations wanting to be fully compatible. > > That's true.
> > However, that alone is not enough reason to remove the feature, IMO. > The effort that is saved is not only on the developers of CPython, > but also on users of the feature. My claim is that for any little-used > feature, removing it costs more time world-wide than re-implementing > it in 10 alternative Python implementations (with the number 10 drawn > out of blue air), because of the cost of changing the applications that > actually do use the feature. > Which applications? I'm not sure the number of applications using str.swapcase gets even as high as ten. > With the switch to Python 3, there would have been a chance to remove > little-used features. IMO, the next such chance is with Python 4. > It could be useful to start collecting little-used features that might > be removed with Python 4 - which I don't expect until 2020. We still have our standard deprecation policy that we can follow in Python 3. We don't have to wait until Python 4 to remove things. Changing semantics or syntax is harder because you can't really deprecate. Just removing methods is straightforward. Michael > > Regards, > Martin > -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html From martin at v.loewis.de Tue Sep 6 22:36:47 2011 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 06 Sep 2011 22:36:47 +0200 Subject: [Python-Dev] Maintenance burden of str.swapcase In-Reply-To: <7EA8F302-1DC4-43D9-B124-09A51172E9BF@voidspace.org.uk> References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> <87ehztip2r.fsf@uwakimon.sk.tsukuba.ac.jp> <4E66763B.7080707@pearwood.info> <4E668029.6080106@v.loewis.de> <7EA8F302-1DC4-43D9-B124-09A51172E9BF@voidspace.org.uk> Message-ID: <4E66845F.3060708@v.loewis.de> > Which applications?
I'm not sure the number of applications using > str.swapcase gets even as high as ten. I think this is what people underestimate. I can't name applications either - but that doesn't mean they don't exist. I'm deeply convinced that the majority of Python code (and I mean *large* majority) is unpublished. I expect thousands of uses world-wide. > We still have our standard deprecation policy that we can follow in > Python 3. We don't have to wait until Python 4 to remove things. That's true. However, part of the deprecation procedure is also that there should be a rationale for removing it. In the past, things have been removed that had been superseded with something new, or things that had been flawed in their design so that fixing it wasn't really possible, or that did indeed cause ongoing maintenance effort for a minority of users (such as the support for little-used platforms). None of these motivations hold for str.swapcase, and I think the "other implementations will have to implement it" is not sufficient motivation. If the other implementations believe that the feature is truly useless and also not used, they can just declare it a deliberate deviation from CPython, and refuse to implement it. If I had to pick a truly useless feature, I'd kill complex numbers, not str.swapcase. Regards, Martin From raymond.hettinger at gmail.com Tue Sep 6 23:23:37 2011 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Tue, 6 Sep 2011 14:23:37 -0700 Subject: [Python-Dev] Maintenance burden of str.swapcase In-Reply-To: <4E66845F.3060708@v.loewis.de> References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> <87ehztip2r.fsf@uwakimon.sk.tsukuba.ac.jp> <4E66763B.7080707@pearwood.info> <4E668029.6080106@v.loewis.de> <7EA8F302-1DC4-43D9-B124-09A51172E9BF@voidspace.org.uk> <4E66845F.3060708@v.loewis.de> Message-ID: <4271E434-0E35-4D7E-98D0-097402B8C3FD@gmail.com> On Sep 6, 2011, at 1:36 PM, Martin v. Löwis wrote: > I think this is what people underestimate.
I can't name > applications either - but that doesn't mean they don't exist. Google code search is a pretty good indicator that this method has near zero uptake. If it dies, I don't think anyone will cry. Raymond From greg.ewing at canterbury.ac.nz Wed Sep 7 02:24:18 2011 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 07 Sep 2011 12:24:18 +1200 Subject: [Python-Dev] [Python-checkins] cpython (3.2): Fix PyUnicode_AsWideCharString() doc: size doesn't contain the null character In-Reply-To: <4E65D429.3040908@haypocalc.com> References: <4E65D429.3040908@haypocalc.com> Message-ID: <4E66B9B2.2070508@canterbury.ac.nz> Victor Stinner wrote: > "NUL" is an abbreviation used in tables when you don't have enough space > to write the full name: "null character". It's also the official name of the character, for when you want to be unambiguous about what you mean (e.g. "null character" as opposed to "empty string" or "null pointer"). I expect it's 3 chars for consistency with all the other control character names. -- Greg From ncoghlan at gmail.com Wed Sep 7 02:47:16 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 7 Sep 2011 10:47:16 +1000 Subject: [Python-Dev] Maintenance burden of str.swapcase In-Reply-To: <4271E434-0E35-4D7E-98D0-097402B8C3FD@gmail.com> References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> <87ehztip2r.fsf@uwakimon.sk.tsukuba.ac.jp> <4E66763B.7080707@pearwood.info> <4E668029.6080106@v.loewis.de> <7EA8F302-1DC4-43D9-B124-09A51172E9BF@voidspace.org.uk> <4E66845F.3060708@v.loewis.de> <4271E434-0E35-4D7E-98D0-097402B8C3FD@gmail.com> Message-ID: On Wed, Sep 7, 2011 at 7:23 AM, Raymond Hettinger wrote: > > On Sep 6, 2011, at 1:36 PM, Martin v. Löwis wrote: > > I think this is what people underestimate. I can't name > applications either - but that doesn't mean they don't exist.
> > Google code search is a pretty good indicator that this method > has near zero uptake. If it dies, I don't think anyone will cry. For str itself, I'm -0 on removing it - the Unicode implications mean implementation isn't completely trivial and there's at least one legitimate use case (i.e. providing, or deliberately reversing, Caps Lock style functionality). However, a big +1 for deprecation in the case of bytes and bytearray. That's nothing to do with the maintenance burden though, it's to do with the semantic confusion between binary data and ASCII-encoded text implied by the retention of methods like upper(), lower() and swapcase(). Specifically, the methods I consider particularly problematic on that front are: 'capitalize' 'islower' 'istitle' 'isupper' 'lower' 'swapcase' 'title' 'upper' These are all text operations, not something you do with binary data. There are some other methods that make ASCII specific default assumptions regarding whitespace and line separators, but ASCII whitespace is often used as a delimiter in wire protocols so losing those would be genuinely annoying. I've also left out the methods for identifying ASCII letters and digits, since again, those are useful for interpreting various wire encodings. The case-related methods, though, have no place in sane wire protocol handling. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | 
Brisbane, Australia From steve at pearwood.info Wed Sep 7 03:02:05 2011 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 07 Sep 2011 11:02:05 +1000 Subject: [Python-Dev] Maintenance burden of str.swapcase In-Reply-To: <4271E434-0E35-4D7E-98D0-097402B8C3FD@gmail.com> References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> <87ehztip2r.fsf@uwakimon.sk.tsukuba.ac.jp> <4E66763B.7080707@pearwood.info> <4E668029.6080106@v.loewis.de> <7EA8F302-1DC4-43D9-B124-09A51172E9BF@voidspace.org.uk> <4E66845F.3060708@v.loewis.de> <4271E434-0E35-4D7E-98D0-097402B8C3FD@gmail.com> Message-ID: <4E66C28D.2010703@pearwood.info> Raymond Hettinger wrote: > On Sep 6, 2011, at 1:36 PM, Martin v. Löwis wrote: > >> I think this is what people underestimate. I can't name >> applications either - but that doesn't mean they don't exist. > > Google code search is a pretty good indicator that this method > has near zero uptake. If it dies, I don't think anyone will cry. Near-zero is not zero, and Terry has already shown some examples of code which use, or misuse, swapcase. In any case (pun intended *wink*) this was discussed in December and Guido expressed little enthusiasm for the idea: http://mail.python.org/pipermail/python-dev/2010-December/106650.html I can't exactly defend the existence of swapcase, it does seem to be a fairly specialised function. But given that it exists, I'm -0.5 on removal on the basis of "if it ain't broke, don't fix it".
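The "caps lock accident" scenario raised earlier in the thread is the one thing the method handles directly — a minimal illustration (not code from the original messages):

```python
# swapcase() undoes a "caps lock accident": every cased character is flipped.
accident = "pERSONNALLY, i THINK"
assert accident.swapcase() == "Personnally, I think"

# For ASCII text the method is its own inverse. Note this does NOT hold for
# all of Unicode: 'ß'.swapcase() gives 'SS', which swaps back to 'ss'.
assert accident.swapcase().swapcase() == accident
```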
-- Steven From solipsis at pitrou.net Wed Sep 7 03:07:58 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 7 Sep 2011 03:07:58 +0200 Subject: [Python-Dev] Maintenance burden of str.swapcase References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> <87ehztip2r.fsf@uwakimon.sk.tsukuba.ac.jp> <4E66763B.7080707@pearwood.info> <4E668029.6080106@v.loewis.de> <7EA8F302-1DC4-43D9-B124-09A51172E9BF@voidspace.org.uk> <4E66845F.3060708@v.loewis.de> <4271E434-0E35-4D7E-98D0-097402B8C3FD@gmail.com> Message-ID: <20110907030758.58caa4ed@pitrou.net> On Wed, 7 Sep 2011 10:47:16 +1000 Nick Coghlan wrote: > > However, a big +1 for deprecation in the case of bytes and bytearray. > That's nothing to do with the maintenance burden though, it's to do > with the semantic confusion between binary data and ASCII-encoded text > implied by the retention of methods like upper(), lower() and > swapcase(). A big -1 on that. Bytes objects are often used for partly ASCII strings, not arbitrary "arrays of bytes". And making indexing of bytes objects return ints was IMHO a mistake. Besides, if you want an array of ints, there's already array.array() with your typecode of choice. Not sure why other types should conform. Regards Antoine. From stephen at xemacs.org Wed Sep 7 03:53:27 2011 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 07 Sep 2011 10:53:27 +0900 Subject: [Python-Dev] Maintenance burden of str.swapcase In-Reply-To: References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> <87ehztip2r.fsf@uwakimon.sk.tsukuba.ac.jp> <4E66763B.7080707@pearwood.info> <4E668029.6080106@v.loewis.de> <7EA8F302-1DC4-43D9-B124-09A51172E9BF@voidspace.org.uk> <4E66845F.3060708@v.loewis.de> <4271E434-0E35-4D7E-98D0-097402B8C3FD@gmail.com> Message-ID: <877h5li0dk.fsf@uwakimon.sk.tsukuba.ac.jp> Nick Coghlan writes: > However, a big +1 for deprecation in the case of bytes and bytearray. 
> That's nothing to do with the maintenance burden though, it's to do > with the semantic confusion between binary data and ASCII-encoded text > implied by the retention of methods like upper(), lower() and > swapcase(). [...] > These are all text operations, not something you do with binary data. "Yea, Brother, Amen!" I like the taste of this Kool-Aid. But.... > The case-related methods, though, have no place in sane wire > protocol handling. RFC 822 headers are a somewhat insane but venerable (isn't that true of anything that's reached age 350 in dog-years?), and venerated, counterexample. Specifically, field names are case-insensitive (RFC 5322, section 1.2.2). I'll bet you can find plenty of others if you look. You can call that "text" and say it should be processed in Unicode, if you like, but you're not even going to convince me (and as I say, I like the Kool-Aid). Specifically, SMTP processes can (and even MUST, under some circumstances IIRC) manipulate the RFC 822 header. Sorry, Nick, no can do. -1 From stephen at xemacs.org Wed Sep 7 04:15:04 2011 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 07 Sep 2011 11:15:04 +0900 Subject: [Python-Dev] Maintenance burden of str.swapcase In-Reply-To: <20110907030758.58caa4ed@pitrou.net> References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> <87ehztip2r.fsf@uwakimon.sk.tsukuba.ac.jp> <4E66763B.7080707@pearwood.info> <4E668029.6080106@v.loewis.de> <7EA8F302-1DC4-43D9-B124-09A51172E9BF@voidspace.org.uk> <4E66845F.3060708@v.loewis.de> <4271E434-0E35-4D7E-98D0-097402B8C3FD@gmail.com> <20110907030758.58caa4ed@pitrou.net> Message-ID: <8762l5hzdj.fsf@uwakimon.sk.tsukuba.ac.jp> Antoine Pitrou writes: > Bytes objects are often used for partly ASCII strings, All I can say to that phrase is, "urk, ISO 2022 anyone?" > not arbitrary "arrays of bytes". And making indexing of bytes > objects return ints was IMHO a mistake. Bytes objects are not ASCII strings, even though they can be used to represent them.
The practice of using magic numbers that look like English words is a useful one, but by the same token, it should not be too easy to use bytes to represent *text* just because the programmer doesn't know any words that don't fit into 7*N bits. With PEP 393, there isn't even really a space excuse. AFAICS, anything that should be done with ASCII-punned magic numbers ("protocol tokens", if you prefer) can be done with slices and (ta-da!) case conversion. (Sorry, Nick!) But the components of a bytes object are just numbers; they are not characters until you've run them through a codec. From ncoghlan at gmail.com Wed Sep 7 05:00:16 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 7 Sep 2011 13:00:16 +1000 Subject: [Python-Dev] Maintenance burden of str.swapcase In-Reply-To: <877h5li0dk.fsf@uwakimon.sk.tsukuba.ac.jp> References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> <87ehztip2r.fsf@uwakimon.sk.tsukuba.ac.jp> <4E66763B.7080707@pearwood.info> <4E668029.6080106@v.loewis.de> <7EA8F302-1DC4-43D9-B124-09A51172E9BF@voidspace.org.uk> <4E66845F.3060708@v.loewis.de> <4271E434-0E35-4D7E-98D0-097402B8C3FD@gmail.com> <877h5li0dk.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Wed, Sep 7, 2011 at 11:53 AM, Stephen J. Turnbull wrote: > Nick Coghlan writes: > > The case-related methods, though, have no place in sane wire > > protocol handling. > > RFC 822 headers are a somewhat insane but venerable (isn't that true > of anything that's reached age 350 in dog-years?), and venerated, > counterexample. Specifically, field names are case-insensitive (RFC > 5322, section 1.2.2). I'll bet you can find plenty of others if you > look. You can call that "text" and say it should be processed in > Unicode, if you like, but you're not even going to convince me (and as > I say, I like the Kool-Aid). Specifically, SMTP processes can (and > even MUST, under some circumstances IIRC) manipulate the RFC 822 header. > > Sorry, Nick, no can do.
> > -1 Heh, I knew as soon as I sent that message that someone would be able to point out a counter example. I agree that RFC 822 (and case-insensitive ASCII comparison in general) is enough to save lower() and upper() and co, but what about this even further reduced list of text-specific methods: 'capitalize' 'istitle' 'swapcase' 'title' While case-insensitive comparison makes sense for wire level data, where do these methods fit in, even when embedded ASCII text fragments are involved? Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From stephen at xemacs.org Wed Sep 7 06:36:26 2011 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 07 Sep 2011 13:36:26 +0900 Subject: [Python-Dev] Deprecating bytes.swapcase and friends [was: Maintenance burden of str.swapcase] In-Reply-To: References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> <87ehztip2r.fsf@uwakimon.sk.tsukuba.ac.jp> <4E66763B.7080707@pearwood.info> <4E668029.6080106@v.loewis.de> <7EA8F302-1DC4-43D9-B124-09A51172E9BF@voidspace.org.uk> <4E66845F.3060708@v.loewis.de> <4271E434-0E35-4D7E-98D0-097402B8C3FD@gmail.com> <877h5li0dk.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <874o0phstx.fsf@uwakimon.sk.tsukuba.ac.jp> This is all speculation and no hint of implementation at this point ... redirecting this subthread to Python-Ideas. Reply-To set accordingly. Nick Coghlan writes: > Heh, I knew as soon as I sent that message that someone would be able > to point out a counter example. I agree that RFC 822 (and > case-insensitive ASCII comparison in general) is enough to save > lower() and upper() and co, but what about this even further reduced > list of text-specific methods: > > 'capitalize' > 'istitle' > 'swapcase' > 'title' > > While case-insensitive comparison makes sense for wire level data, > where do these methods fit in, even when embedded ASCII text fragments > are involved?
Well, 'capitalize' could theoretically be used to "beautify" RFC 822 field names, but realistically, to me they're a litmus test for packages I probably don't want on my system.<0.5 wink> I don't know if it's worth the effort to deprecate them, though. There is a school of thought (represented on python-dev by Philip Eby and Antoine Pitrou, among others, I would say) that says that text with an implicit encoding is still text if you can figure out what the encoding is, and the syntactically important tokens are invariably ASCII, which often is enough information to do the work. So if you can do some operation without first converting to str, let's save the cycles and the bytes (especially in bit-shoveling applications like WSGI)! I disagree, but "consenting adults" and all that. It occurs to me that the bit-shoveling applications would generally be sufficiently well-served with a special "codec" that just stuffs the data pointer in a bytes object into the latin1 member of the data pointer union in a PEP 393 Unicode object, and marks the Unicode object as "ascii-compatible", ie, anything ASCII can be manipulated as text, but anything non-ASCII is like a private character that Python doesn't know anything about, and can't do anything useful with, except delete or pass through verbatim (perhaps as a slice). This may be nonsense; I don't know enough about Python internals to be sure. And it would be a change to PEP 393, since the encoding of the 8-bit representation would no longer be Unicode. I wouldn't blame Martin one bit if he hated the idea in principle! On the other hand, the "Latin-1 can be used to decode any binary content" end-around makes that point moot IMO. This would give a somewhat safer way of doing that. But if feasible and a Pythonic implementation could be devised, that would take much of the wind out of the sails of the "implicitly it's ASCII text" crowd. 
The whole "it's inefficient in time and space to work with 'str'" argument goes away, leaving them with "it's verbose" as the only reason for not doing the conversion. I don't know if there would be any use case left for bytes at that point ... but that's clearly a py4k discussion. From jcea at jcea.es Wed Sep 7 13:38:23 2011 From: jcea at jcea.es (Jesus Cea) Date: Wed, 07 Sep 2011 13:38:23 +0200 Subject: [Python-Dev] Multigigabyte memory usage in the OpenIndiana Buildbot In-Reply-To: References: <4E60FCD6.3090005@jcea.es> <4E611821.9050108@jcea.es> <20110902201415.773da7d6@pitrou.net> <4E659CDD.8090900@jcea.es> <4E65A13D.9010805@jcea.es> <4E65A8AE.10900@jcea.es> <4E65A955.7000507@jcea.es> <4E65AD58.6050106@jcea.es> Message-ID: <4E6757AF.4050007@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 06/09/11 07:27, Nick Coghlan wrote: > It may be the case that with the reduced memory limit, your > machine may not be able to run concurrent slaves for 2.7, 3.2 and > 3.x as I believe it does now. Antoine has changed the buildmaster configuration to send me only one build at a time. It doesn't solve the issue. I don't have enough resources even for a single build. I just sent this email to the owner of the machine: """ XXXXXXX, I know you are very busy, but I would like to formally request the removal of the SWAP capping for my zone. After investigating the issue, I learned this: 1. Python "make test" launches a python process that can consume >300MB of RAM. 2. Under Solaris, a 300MB process doing a "fork()" will consume 600MB. That is, Solaris reserves this much memory just in case the processes modify their memory (to avoid an "out of memory" condition simply because a process writes to its own memory space). 3. So, if a 300MB process is forked 10 times, it is going to "virtually" use 3GB. The real memory used is actually far less in the buildbot case, because the forked processes don't modify their own memory so much (forked processes use Copy On Write). 4.
So, the required memory to run the buildbots is actually "modest" compared with the "virtual" memory used. 5. A 4GB SWAP is not enough to run a single buildbot instance. I can have up to 6 instances, but 4GB is not enough for 1. Python devs have modified the buildbot master to send me at most two builds simultaneously, trying to help. It is not helping because 4GB of swap is not enough even for a single instance. 6. With an uncapped SWAP, the actual swapping would be quite low, because the swap is used to ensure memory reservation for the forked processes in the worst case (that the forked processes mess with their own copy of the 300MB address space, COW (Copy On Write)). In practice 4GB of RAM and uncapped SWAP would be enough, with no (or little) actual swapping. For these reasons I formally request a reconfiguration of my zone to uncap my SWAP usage. The proof is actually very simple:
"""
import time, os
a = "a"*1024*1024*512
os.fork()  # 2 processes
os.fork()  # 4 processes
os.fork()  # 8 processes
time.sleep(10)
"""
Running the previous program does this to my swap (Solaris 10 Update 9):
"""
[root at buffy /]# swap -s
total: 684704k bytes allocated + 3732892k reserved = 4417596k used, 31829688k available
"""
After the processes die, I have this:
"""
[root at buffy /]# swap -s
total: 156680k bytes allocated + 43284k reserved = 199964k used, 36118796k available
"""
In this machine, I have 4GB of RAM, 32GB of swap. So, this trivial test requires >4GB of RAM+SWAP even if it is actually using only ~512MB of RAM. Solaris is (rightly) playing safe, making sure the program can actually play with/modify its memory space. XXXXX, if you can't/don't want to modify my zone configuration, let me know, so I can think about what to do next. If I have to talk to somebody else, please let me know. Sorry to bother you with these details. I really appreciate the effort you and your team are doing with OpenIndiana in general and supporting the Python buildbots under OI in particular.
I hope we can solve this situation. Thanks for your time and effort. PS: I think that such memory+swap requirements are quite high, anyway, and I will pursue it. But in the meantime I need the buildbot online, as it was a couple of weeks ago :-) Thanks! """ So, the problem is that a) "make test" takes quite a bit of RAM and b) the buildbot forks some "big" processes, so the virtual memory needed is BIG. Linux is known for "overcommitting" memory. That is, playing fast and risky by not actually reserving memory, hoping the process will not actually use it or will do an "exec" immediately, so this problem may not be apparent under Linux, but it is there. So I have two questions: 1. Can we reduce the memory footprint of the tests? I can't understand why the python test process is taking so much memory. 2. Why is buildbot "forking()" big processes? Can we do something to change this? I will wait a few days for the OpenIndiana team to reply. If the result is not satisfactory, I will try to set up a virtual machine with the required resources myself. Crossing fingers... - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ .
_/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQCVAwUBTmdXr5lgi5GaxT1NAQKmRwP/dyg4qEs+oWt4r365D797+ItbHluuEVJ+ mWTZw5HVeDajrN7faGH6WuA/J+dJuBp2H4rB8WIM1U/DytL7aZDdDHCeXS79IlUw SEb5kMA4ENSB6N6bhKmOWpKlwtMQWmw/CtB6//ZX29UZD6ys3UsbO8KslT+M/1EG P2zmn3PSzo8= =WE+9 -----END PGP SIGNATURE----- From solipsis at pitrou.net Wed Sep 7 14:32:59 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 7 Sep 2011 14:32:59 +0200 Subject: [Python-Dev] Multigigabyte memory usage in the OpenIndiana Buildbot References: <4E60FCD6.3090005@jcea.es> <4E611821.9050108@jcea.es> <20110902201415.773da7d6@pitrou.net> <4E659CDD.8090900@jcea.es> <4E65A13D.9010805@jcea.es> <4E65A8AE.10900@jcea.es> <4E65A955.7000507@jcea.es> <4E65AD58.6050106@jcea.es> <4E6757AF.4050007@jcea.es> Message-ID: <20110907143259.2bcff454@pitrou.net> On Wed, 07 Sep 2011 13:38:23 +0200 Jesus Cea wrote: > > So, the problem is that a) "make test" takes quite a bit of RAM and b) > the buildbot forks some "big" processes, so the virtual memory needed > is BIG. Note that buildbots run "make buildbottest", not "make test". > So I have two questions: > > 1. Can we reduce the memory footprint of the tests?. I can't > understand why the python test process is taking so much memory. Because the test suite will by construction load all the stdlib (minus the few modules which don't have a test suite), and creates numerous test scenarios. Depending on the memory allocator, fragmentation can make it difficult to reclaim memory that has been formally freed after a test is run. If "-j" is used, tests get run in a separate process each, so that approach might be an answer. > 2. Why buildbot is "forking()" big processes?. 
Can we do something to > change this?. Because we need to test for various functionalities, such as os.fork() and os.exec*(), but also the command-line behaviour of the interpreter, the distutils module, the packaging module, the subprocess module, the multiprocessing module... (this list is not exhaustive). Regards Antoine. From solipsis at pitrou.net Wed Sep 7 14:47:49 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 7 Sep 2011 14:47:49 +0200 Subject: [Python-Dev] Maintenance burden of str.swapcase In-Reply-To: <8762l5hzdj.fsf@uwakimon.sk.tsukuba.ac.jp> References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> <87ehztip2r.fsf@uwakimon.sk.tsukuba.ac.jp> <4E66763B.7080707@pearwood.info> <4E668029.6080106@v.loewis.de> <7EA8F302-1DC4-43D9-B124-09A51172E9BF@voidspace.org.uk> <4E66845F.3060708@v.loewis.de> <4271E434-0E35-4D7E-98D0-097402B8C3FD@gmail.com> <20110907030758.58caa4ed@pitrou.net> <8762l5hzdj.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20110907144749.7c1a9d50@pitrou.net> On Wed, 07 Sep 2011 11:15:04 +0900 "Stephen J. Turnbull" wrote: > Antoine Pitrou writes: > > > Bytes objects are often used for partly ASCII strings, > > All I can say to that phrase is, "urk, ISO 2022 anyone?" You could also point out UTF-16 or EBCDIC, but I fail to see how that's relevant. Do you have problems with ISO 2022 when parsing, say, e-mail headers? > > not arbitrary "arrays of bytes". And making indexing of bytes > > objects return ints was IMHO a mistake. > > Bytes objects are not ASCII strings, even though they can be used to > represent them. I'm talking about practice, not some idealistic view of the world. In many use cases (XML, HTML, e-mail headers, many other text-based protocols), you can get a mixture of ASCII "commands", and opaque binary stuff (which will or will not, depending on these "commands", have a meaningful unicode decoding).
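The mixture described above can be sketched with a small, hypothetical example (the blob and field name below are invented for illustration; only bytes methods that exist in Python 3 are used): the ASCII "commands" are picked apart with the retained str-like methods, while the payload stays as opaque bytes.

```python
# A header-like blob: an ASCII "command" part, then opaque binary payload.
blob = b"Content-Length: 4\r\n\r\n\xde\xad\xbe\xef"

head, _, payload = blob.partition(b"\r\n\r\n")
name, _, value = head.partition(b":")

# Case-insensitive handling of the ASCII field name, without decoding:
assert name.lower() == b"content-length"

# int() accepts ASCII digits held in a bytes object directly:
length = int(value.strip())
assert payload[:length] == b"\xde\xad\xbe\xef"
```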
In the stdlib, bytes objects are accessed far more often to poke at some text-like data, than to poke at arbitrary numbers. > With PEP 393, > there isn't even really a space excuse. Of course there is. Any single non-ASCII byte of data mingled with aforementioned ASCII "commands" will make it switch to a less efficient representation. And "surrogateescape" will be a performance problem in itself, when used on large binary data; if you use "latin1" instead, you are risking far greater confusion; ask David about that dilemma. :-) > AFAICS, anything that should be done with ASCII-punned magic numbers > ("protocol tokens", if you prefer) can be done with slices and (ta-da!) > case conversion. So, basically, you're saying that we should remove useful functionality and tell people to reimplement an adhoc version of it when they need it. That sounds obnoxious. Regards Antoine. From hodgestar+pythondev at gmail.com Wed Sep 7 18:31:08 2011 From: hodgestar+pythondev at gmail.com (Simon Cross) Date: Wed, 7 Sep 2011 18:31:08 +0200 Subject: [Python-Dev] Maintenance burden of str.swapcase In-Reply-To: <4E66845F.3060708@v.loewis.de> References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> <87ehztip2r.fsf@uwakimon.sk.tsukuba.ac.jp> <4E66763B.7080707@pearwood.info> <4E668029.6080106@v.loewis.de> <7EA8F302-1DC4-43D9-B124-09A51172E9BF@voidspace.org.uk> <4E66845F.3060708@v.loewis.de> Message-ID: On Tue, Sep 6, 2011 at 10:36 PM, "Martin v. Löwis" wrote: >> Which applications? I'm not sure the number of applications using >> str.swapcase gets even as high as ten. > > I think this is what people underestimate. I can't name > applications either - but that doesn't mean they don't exist. > I'm deeply convinced that the majority of Python code (and > I mean *large* majority) is unpublished. > > I expect thousands of uses world-wide.
http://www.google.com/codesearch#search/&q=swapcase%20lang:%5Epython$&type=cs There are quite a few hits but more people appear to be re-implementing it than using it (I haven't gone to the trouble of mining the search results to get an accurate picture though). From hodgestar+pythondev at gmail.com Wed Sep 7 18:33:46 2011 From: hodgestar+pythondev at gmail.com (Simon Cross) Date: Wed, 7 Sep 2011 18:33:46 +0200 Subject: [Python-Dev] Maintenance burden of str.swapcase In-Reply-To: References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> <87ehztip2r.fsf@uwakimon.sk.tsukuba.ac.jp> <4E66763B.7080707@pearwood.info> <4E668029.6080106@v.loewis.de> <7EA8F302-1DC4-43D9-B124-09A51172E9BF@voidspace.org.uk> <4E66845F.3060708@v.loewis.de> Message-ID: On Wed, Sep 7, 2011 at 6:31 PM, Simon Cross wrote: > http://www.google.com/codesearch#search/&q=swapcase%20lang:%5Epython$&type=cs > > There are quite a few hits but more people appear to be > re-implementing it than using it (I haven't gone to the trouble of > mining the search results to get an accurate picture though). Scratch that -- I should gloss over search results less. It looks like the most common use case is to provide a consistent string-like API somewhere else. So removing it is likely to cause headaches (e.g. test failures) for the people who are wrapping it. From stephen at xemacs.org Wed Sep 7 19:26:00 2011 From: stephen at xemacs.org (Stephen J.
Turnbull) Date: Thu, 08 Sep 2011 02:26:00 +0900 Subject: [Python-Dev] Maintenance burden of str.swapcase In-Reply-To: <20110907144749.7c1a9d50@pitrou.net> References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> <87ehztip2r.fsf@uwakimon.sk.tsukuba.ac.jp> <4E66763B.7080707@pearwood.info> <4E668029.6080106@v.loewis.de> <7EA8F302-1DC4-43D9-B124-09A51172E9BF@voidspace.org.uk> <4E66845F.3060708@v.loewis.de> <4271E434-0E35-4D7E-98D0-097402B8C3FD@gmail.com> <20110907030758.58caa4ed@pitrou.net> <8762l5hzdj.fsf@uwakimon.sk.tsukuba.ac.jp> <20110907144749.7c1a9d50@pitrou.net> Message-ID: <87zkiggt7b.fsf@uwakimon.sk.tsukuba.ac.jp> Antoine Pitrou writes: > You could also point out UTF-16 or EBCDIC, but I fail to see how that's > relevant. Do you have problems with ISO 2022 when parsing, say, e-mail > headers? Yes, of course! Especially when it's say, packed EUC not encapsulated in MIME words. I think Mailman now handles that without crashing, but it took 10 years. Most Emacs MUAs still blow chunks on that. My procmail recipes and my employer's virus checker both occasionally punt. The point about ISO 2022 is that it allows arbitrary binary crap in the stream, delimited by appropriate well-defined constructs. Just like the ASCII-like tokens in the protocols you talk about. But parsing full-bore ISO 2022 is non-trivial, especially if you're going to try to provide error-handling that's useful to the user. Nobody ever really took it seriously as a solution to the problem of internationalization in the 15 years or so when it was the only solution, and even less so once it became clear that UCSes were going to get traction. > > > not arbitrary "arrays of bytes". And making indexing of bytes > > > objects return ints was IMHO a mistake. > > > > Bytes objects are not ASCII strings, even though they can be used to > > represent them. > > I'm talking about practice, So am I, and so is Nick. > not some idealistic view of the world. 
> In many use cases (XML, HTML, e-mail headers, many other test-based > protocols), you can get a mixture of ASCII "commands", and opaque > binary stuff (which will or will not, depending on these "commands", > have a meaningful unicode decoding). Yeah, so what? Those protocol tokens are deliberately chosen to resemble ASCII text, but you need to parse them out of the binary sludge somehow, and the surrounding content remains binary sludge until deserialized or (for text) decoded. How is having b[0] return a bytes object, rather than an integer, going to help in that? Especially if the value is not in the ASCII range? > > AFAICS, anything that should be done with ASCII-punned magic numbers > > ("protocol tokens", if you prefer) can be done with slices and (ta-da!) > > case conversion. > > So, basically, you're saying that we should remove useful functionality No, that *was* Nick's position; I specifically opposed the suggestion that "lower" and "upper" be removed, and he concurred after a bit of thought. And remember, he's talking about removing "swapcase". Which RFC defines a protocol where that would be useful? How about "title"? > and tell people to reimplement an adhoc version of it when they > need it. Of course not; I'm with Michael Foord on that: nobody should ever be asked to reimplement swapcase! My position is simply that bytes are not text, and the occasional reminder (such as b[0] returning an integer, not a bytes object) is good. My experience has been that it makes a lot of sense to layer these things, for example transforming a protocol stream serialized as octets into a more structured object composed of protocol tokens and payloads. It's *not* text, and the relevant techniques are different. It's like the old saw about "aha, I'll use regexps to solve this problem!" and now you have *two* problems. I don't advocate getting rid of regexps, and I don't advocate removing methods from bytes (although I do dream about it occasionally). 
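The b[0] behaviour argued over here is easy to check directly in Python 3 (a minimal illustration, not part of the original message):

```python
token = b"GET / HTTP/1.1"

# Indexing a bytes object yields an int -- a reminder that bytes are numbers:
assert token[0] == 71          # ord('G') == 71
assert isinstance(token[0], int)

# Slicing, by contrast, yields a length-1 bytes object:
assert token[0:1] == b"G"

# Membership tests work with both ints and byte sequences:
assert 71 in token
assert b"GET" in token
```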
I do advocate that people think twice before implementing complex text-like algorithms on binary protocol streams. If the stream really is text-like, then transform it into text of a known, well-behaved encoding, and then apply the powerful text-processing facilities provided for str. If it's not, then transform to a token stream or whatever makes sense. In both cases, do as little "text processing" on bytes objects as possible, and put more structure on the content as soon as possible. If you really need the efficiency, then do what you need to do. As I say, I don't have any practical objection to keeping your tools for that case. But such applications, although important (I guess), are a minority. > That sounds obnoxious. Good advice almost always sounds obnoxious to the recipient. From glyph at twistedmatrix.com Wed Sep 7 19:51:50 2011 From: glyph at twistedmatrix.com (Glyph Lefkowitz) Date: Wed, 7 Sep 2011 10:51:50 -0700 Subject: [Python-Dev] Maintenance burden of str.swapcase In-Reply-To: <87zkiggt7b.fsf@uwakimon.sk.tsukuba.ac.jp> References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> <87ehztip2r.fsf@uwakimon.sk.tsukuba.ac.jp> <4E66763B.7080707@pearwood.info> <4E668029.6080106@v.loewis.de> <7EA8F302-1DC4-43D9-B124-09A51172E9BF@voidspace.org.uk> <4E66845F.3060708@v.loewis.de> <4271E434-0E35-4D7E-98D0-097402B8C3FD@gmail.com> <20110907030758.58caa4ed@pitrou.net> <8762l5hzdj.fsf@uwakimon.sk.tsukuba.ac.jp> <20110907144749.7c1a9d50@pitrou.net> <87zkiggt7b.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4D7C92E4-1F39-4D39-9E99-8E323A6E1282@twistedmatrix.com> On Sep 7, 2011, at 10:26 AM, Stephen J. Turnbull wrote: > How about "title"? >>> 'content-length'.title() 'Content-Length' You might say that the protocol "has" to be case-insensitive so this is a silly frill: there are definitely enough case-sensitive crappy bits of network middleware out there that this function is critically important for an HTTP server.
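The same idiom works on raw bytes in Python 3, since title() is among the retained bytes methods (a minimal illustration):

```python
# Canonical mixed-case form of HTTP header names, computed on raw bytes:
assert b"content-length".title() == b"Content-Length"
assert b"x-forwarded-for".title() == b"X-Forwarded-For"

# The same method exists on str for already-decoded header names:
assert "content-length".title() == "Content-Length"
```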
In general I'd like to defend keeping as many of these methods as possible for compatibility (porting to Py3 is already hard enough). Although even I might have a hard time defending 'swapcase', which is never used _at all_ within Twisted, on text or bytes. The only use-case I can think of for that method is goofy joke text filters, and it wouldn't be very good at that either. -glyph -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Thu Sep 8 00:29:33 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 8 Sep 2011 08:29:33 +1000 Subject: [Python-Dev] Maintenance burden of str.swapcase In-Reply-To: <4D7C92E4-1F39-4D39-9E99-8E323A6E1282@twistedmatrix.com> References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> <87ehztip2r.fsf@uwakimon.sk.tsukuba.ac.jp> <4E66763B.7080707@pearwood.info> <4E668029.6080106@v.loewis.de> <7EA8F302-1DC4-43D9-B124-09A51172E9BF@voidspace.org.uk> <4E66845F.3060708@v.loewis.de> <4271E434-0E35-4D7E-98D0-097402B8C3FD@gmail.com> <20110907030758.58caa4ed@pitrou.net> <8762l5hzdj.fsf@uwakimon.sk.tsukuba.ac.jp> <20110907144749.7c1a9d50@pitrou.net> <87zkiggt7b.fsf@uwakimon.sk.tsukuba.ac.jp> <4D7C92E4-1F39-4D39-9E99-8E323A6E1282@twistedmatrix.com> Message-ID: On Thu, Sep 8, 2011 at 3:51 AM, Glyph Lefkowitz wrote: > On Sep 7, 2011, at 10:26 AM, Stephen J. Turnbull wrote: > > How about "title"? > >>>> 'content-length'.title() > 'Content-Length' > You might say that the protocol "has" to be case-insensitive so this is a > silly frill: there are definitely enough case-sensitive crappy bits of > network middleware out there that this function is critically important for > an HTTP server. Actually, the HTTP header case occurred to me as well shortly after sending my last message, so I think it's a legitimate reason to keep the methods around on bytes and bytearray. So, putting my "practicality beats purity" hat back on, I would describe the status quo as follows: 1.
Binary data is not text, so bytes and bytearray are deliberately conceptualised as arrays of arbitrary integers in the range 0-255 rather than as arrays of 8-bit 'characters'. This distinction is one of the core design principles separating Python 3 from Python 2.
2. However, the use of ASCII words and characters is a common feature of many existing wire protocols, so it is useful to be able to manipulate binary sequences that contain data in an ASCII-compatible format without having to convert them to text first. Retaining additional ASCII-based methods also eases the transition to Python 3 for code that manipulates binary data using the 2.x str type.
3. ASCII whitespace characters are used as delimiters in many formats. Thus, various methods such as split(), partition(), strip() and their variants retain their "ASCII whitespace" default arguments, and expandtabs() is also retained.
4. Padding values out to fill fields of a certain size is needed for some formats. Thus, center(), ljust(), rjust(), zfill() are retained (again retaining their ASCII space default fill character in the case of the first 3 methods).
5. Identifying ASCII alphanumeric data is important for some formats. Thus, isalnum(), isalpha() and isdigit() are retained.
6. Case-insensitive ASCII comparisons are important for some formats (e.g. RFC 822 headers, HTTP headers). Thus, upper(), lower(), isupper() and islower() are retained.
7. Even correct mixed-case ASCII can be important for some formats (e.g. HTTP headers). Thus, capitalize(), title() and istitle() are retained.
8. A valid use for swapcase() on binary data has not been identified, but once all the other ASCII-based methods are being kept around for the various reasons given above, it doesn't seem worth the effort to get rid of this one (despite the additional implementation effort needed for alternate implementations).
9.
Algorithms that operate purely on binary data or purely on text can just use literals of the appropriate type (if they use literals at all). Algorithms that are designed to operate on either kind of data may want to adopt an implicit decode/encode approach to handle binary inputs (this allows assumptions regarding the input encoding to be made explicit). I'm actually fairly happy with that rationalisation for the current Python 3 set up. I'd been thinking recently that we would have been better off if more of the methods that rely on the data using an ASCII compatible encoding scheme had been removed from bytes and bytearray, but swapcase() is really the only one we can't give a decent justification for beyond "it was there in 2.x". Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From jcea at jcea.es Thu Sep 8 03:12:33 2011 From: jcea at jcea.es (Jesus Cea) Date: Thu, 08 Sep 2011 03:12:33 +0200 Subject: [Python-Dev] Multigigabyte memory usage in the OpenIndiana Buildbot In-Reply-To: <20110907143259.2bcff454@pitrou.net> References: <4E60FCD6.3090005@jcea.es> <4E611821.9050108@jcea.es> <20110902201415.773da7d6@pitrou.net> <4E659CDD.8090900@jcea.es> <4E65A13D.9010805@jcea.es> <4E65A8AE.10900@jcea.es> <4E65A955.7000507@jcea.es> <4E65AD58.6050106@jcea.es> <4E6757AF.4050007@jcea.es> <20110907143259.2bcff454@pitrou.net> Message-ID: <4E681681.6060405@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 07/09/11 14:32, Antoine Pitrou wrote: > If "-j" is used, tests get run in a separate process each, so that > approach might be an answer. Antoine, I think this would be the answer. Each test would be a bit slower, because I would launch a new python process per test, but I could run 16 tests in parallel (I have 16 CPUs and, actually, most tests are not CPU intensive).
I'm sorry to bother you with these details and waste your time, but could you possibly change my buildbot configuration to launch, let's say, 4 test processes in parallel, just for testing? Another option would be to have a single Python process and "fork" for each test. That would launch each test in a separate process without requiring a full python interpreter launching each time. Is this the way "-j" is implemented, or is "-j" something external, like "make -j"? BTW, the (nice and helpful) OpenIndiana folks have told me a few hours ago that they would increase my swap limit to 16GB. I am now waiting for this change to be done. I want my six builds in parallel (2.7, 3.2, 3.x, in 32 and 64 bits) back! Sorry for wasting your time with these mundane details... - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ . _/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQCVAwUBTmgWgZlgi5GaxT1NAQI/eAP/anenlTjt7NxIzMLK+ME+f84zLurb8MS/ XiLRpVSNDn6TzKnqXtDLfOc6sua81h+ZlpHvuFNHOkK9u/PkmeUKidgoDvASj5Ti ITUmUxigX1j9ZbD1ITkn53msm1xfug3rw/8+Rh//4ONhhbmhSm8ChZ0iNwtntToG 5SwL3BL2iSI= =fCJe -----END PGP SIGNATURE----- From meadori at gmail.com Thu Sep 8 04:06:29 2011 From: meadori at gmail.com (Meador Inge) Date: Wed, 7 Sep 2011 21:06:29 -0500 Subject: [Python-Dev] python -m tokenize in 3.x ? Message-ID: Hi All, I have been investigating some 'tokenize' bugs recently.
As a part of that investigation I was trying to use '-m tokenize', which works great in 2.x:

[meadori at motherbrain cpython]$ python2.7 -m tokenize test.py
1,0-1,5:        NAME    'print'
1,6-1,21:       STRING  '"Hello, World!"'
1,21-1,22:      NEWLINE '\n'
2,0-2,0:        ENDMARKER       ''

In 3.x, however, the functionality has been removed and replaced with some hard-wired test code:

[meadori at motherbrain cpython]$ python3 -m tokenize test.py
TokenInfo(type=57 (ENCODING), string='utf-8', start=(0, 0), end=(0, 0), line='')
TokenInfo(type=1 (NAME), string='def', start=(1, 0), end=(1, 3), line='def parseline(self, line):')
TokenInfo(type=1 (NAME), string='parseline', start=(1, 4), end=(1, 13), line='def parseline(self, line):')
TokenInfo(type=53 (OP), string='(', start=(1, 13), end=(1, 14), line='def parseline(self, line):')
...

Why is this? I found the commit where the functionality was removed [1], but no explanation. Any objection to adding this feature back? [1] http://hg.python.org/cpython/rev/51e24512e305/ -- # Meador From stephen at xemacs.org Thu Sep 8 04:46:42 2011 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 08 Sep 2011 11:46:42 +0900 Subject: [Python-Dev] Maintenance burden of str.swapcase In-Reply-To: <4D7C92E4-1F39-4D39-9E99-8E323A6E1282@twistedmatrix.com> References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> <87ehztip2r.fsf@uwakimon.sk.tsukuba.ac.jp> <4E66763B.7080707@pearwood.info> <4E668029.6080106@v.loewis.de> <7EA8F302-1DC4-43D9-B124-09A51172E9BF@voidspace.org.uk> <4E66845F.3060708@v.loewis.de> <4271E434-0E35-4D7E-98D0-097402B8C3FD@gmail.com> <20110907030758.58caa4ed@pitrou.net> <8762l5hzdj.fsf@uwakimon.sk.tsukuba.ac.jp> <20110907144749.7c1a9d50@pitrou.net> <87zkiggt7b.fsf@uwakimon.sk.tsukuba.ac.jp> <4D7C92E4-1F39-4D39-9E99-8E323A6E1282@twistedmatrix.com> Message-ID: <87pqjbhht9.fsf@uwakimon.sk.tsukuba.ac.jp> Glyph Lefkowitz writes: > On Sep 7, 2011, at 10:26 AM, Stephen J. Turnbull wrote: > > > How about "title"?
> > >>> 'content-length'.title() > 'Content-Length' > > You might say that the protocol "has" to be case-insensitive so > this is a silly frill: Not me, sir. My whole point about the "bytes should be more like str" controversy is the dual of that: you don't know what will be coming at you, so the regularities and (normally allowable) fuzziness of text processing are inadmissible. > there are definitely enough case-sensitive crappy bits of network > middleware out there that this function is critically important for > an HTTP server. "Critically important" is surely an overstatement. You could always title-case the literal strings containing field names in the source. The problem with having lots of str-like features on bytes is that you lose TOOWDTI, or worse, to many performance-happy coders, use of bytes becomes TOOWDTI "because none of the characters[sic] I'm planning to process myself are non-ASCII". This is the road to Babel; it's workable for one-off scripts but it's asking for long-term trouble in multi-module applications. The choice of decoding to str and processing in that form should be made as attractive as possible. On the other hand, it is undeniably useful for protocol tokens to have mnemonic representations even in binary protocols. Textual manipulations on those tokens should be convenient. It seems to me that what might be an improvement over the current situation (maybe for Py4k only, though) is for bytes and (PEP-393-style) str to share representation, and have a "cast" method which would convert from one to the other, validating that the range constraints on the representation are satisfied. The problem I see is that this either sanctions the practice of using latin-1 as "ASCII plus anything", which is an unpleasant hack, or you'd need to check in text methods that nothing is done with non-ASCII values other than checks for set membership (including equality comparison, of course).
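The latin-1 "ASCII plus anything" hack mentioned above rests on a simple property, sketched below (a minimal illustration, not part of the original message): latin-1 maps each byte 0-255 to the code point of equal value, so the conversion is lossless in both directions.

```python
# Every possible byte value survives a latin-1 round trip unchanged:
data = bytes(range(256))
as_text = data.decode("latin-1")
assert as_text.encode("latin-1") == data

# ASCII tokens embedded in binary sludge become searchable text,
# while non-ASCII bytes ride along as opaque code points:
sludge = b"\x00\xffHTTP/1.1\xfe"
assert "HTTP/1.1" in sludge.decode("latin-1")
```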
OTOH, AFAICS, Antoine's claim that inserting a non-latin-1 character in a str that happens to contain only ASCII values would convert the representation to multioctets (true), and therefore this doesn't give the desired efficiency properties, is beside the point. Just don't do that! You *can't* do that in a bytes object, anyway; use of str in this way is a "consenting adults" issue. You trade off the convenience of the full suite of text tools vs. the possibility that somebody might insert such a character -- but for the algorithms they're going to be using, they shouldn't be doing that anyway. From guido at python.org Thu Sep 8 05:48:19 2011 From: guido at python.org (Guido van Rossum) Date: Wed, 7 Sep 2011 20:48:19 -0700 Subject: [Python-Dev] python -m tokenize in 3.x ? In-Reply-To: References: Message-ID: My guess is that there was no specific intent -- most likely it occurred to nobody that the main() functionality was actually useful. I'd say it's fine to put it back, and then document it (so it won't be removed again :-). --Guido On Wed, Sep 7, 2011 at 7:06 PM, Meador Inge wrote: > Hi All, > > I have been investing some 'tokenize' bugs recently. As a part of > that investigation I was trying to use '-m tokenize', which works > great in 2.x: > > [meadori at motherbrain cpython]$ python2.7 -m tokenize test.py > 1,0-1,5:        NAME    'print' > 1,6-1,21:       STRING  '"Hello, World!"' > 1,21-1,22:      NEWLINE '\n' > 2,0-2,0:        ENDMARKER
'' > > In 3.x, however, the functionality has been removed and replaced with > some hard-wired test code: > > [meadori at motherbrain cpython]$ python3 -m tokenize test.py > TokenInfo(type=57 (ENCODING), string='utf-8', start=(0, 0), end=(0, 0), line='') > TokenInfo(type=1 (NAME), string='def', start=(1, 0), end=(1, 3), > line='def parseline(self, line):') > TokenInfo(type=1 (NAME), string='parseline', start=(1, 4), end=(1, > 13), line='def parseline(self, line):') > TokenInfo(type=53 (OP), string='(', start=(1, 13), end=(1, 14), > line='def parseline(self, line):') > ... > > Why is this? I found the commit where the functionality was removed > [1], but no explanation. Any objection to adding this feature back? > > [1] http://hg.python.org/cpython/rev/51e24512e305/ > > -- > # Meador > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (python.org/~guido) From solipsis at pitrou.net Thu Sep 8 09:18:05 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 8 Sep 2011 09:18:05 +0200 Subject: [Python-Dev] Multigigabyte memory usage in the OpenIndiana Buildbot References: <4E60FCD6.3090005@jcea.es> <4E611821.9050108@jcea.es> <20110902201415.773da7d6@pitrou.net> <4E659CDD.8090900@jcea.es> <4E65A13D.9010805@jcea.es> <4E65A8AE.10900@jcea.es> <4E65A955.7000507@jcea.es> <4E65AD58.6050106@jcea.es> <4E6757AF.4050007@jcea.es> <20110907143259.2bcff454@pitrou.net> <4E681681.6060405@jcea.es> Message-ID: <20110908091805.3f1e9141@pitrou.net> Hello Jesus, > I'm sorry to bother you with these details > and waste of time, but could you possibly change my buildbot > configuration to launch, let's say, 4 test processes in parallel, just > for testing? Ok, I've added "-j4", let's see how that works.
> Another option would be to have a single Python process and "fork" for > each test. That would launch each test in a separate process without > requiring a full python interpreter launching each time. Is this the > way "-j" is implemented It uses subprocess actually, so fork() + exec() is used. > BTW, the (nice and helpful) OpenIndiana folks have told me a few hours > ago that they would increase my swap limit to 16GB. I am now waiting > for this change to be done. Good news :) Regards Antoine. From ezio.melotti at gmail.com Thu Sep 8 11:11:52 2011 From: ezio.melotti at gmail.com (Ezio Melotti) Date: Thu, 8 Sep 2011 12:11:52 +0300 Subject: [Python-Dev] [Python-checkins] cpython: Issue #12567: Fix curses.unget_wch() tests In-Reply-To: References: Message-ID: Hi, On Tue, Sep 6, 2011 at 11:08 AM, victor.stinner wrote: > http://hg.python.org/cpython/rev/786668a4fb6b > changeset: 72301:786668a4fb6b > user: Victor Stinner > date: Tue Sep 06 10:08:28 2011 +0200 > summary: > Issue #12567: Fix curses.unget_wch() tests > > Skip the test if the function is missing. Use U+0061 (a) instead of U+00E9 > (?) > because U+00E9 raises a _curses.error('unget_wch() returned ERR') on some > buildbots. It's maybe because of the locale encoding. > > files: > Lib/test/test_curses.py | 6 ++++-- > 1 files changed, 4 insertions(+), 2 deletions(-) > > > diff --git a/Lib/test/test_curses.py b/Lib/test/test_curses.py > --- a/Lib/test/test_curses.py > +++ b/Lib/test/test_curses.py > @@ -265,14 +265,16 @@ > stdscr.getkey() > > def test_unget_wch(stdscr): > - ch = '\xe9' > + if not hasattr(curses, 'unget_wch'): > + return > This should be a skip, not a bare return. > + ch = 'a' > curses.unget_wch(ch) > read = stdscr.get_wch() > read = chr(read) > if read != ch: > raise AssertionError("%r != %r" % (read, ch)) > Why not just assertEqual? 
> > - ch = ord('\xe9') > + ch = ord('a') > curses.unget_wch(ch) > read = stdscr.get_wch() > if read != ch: > > > Best Regards, Ezio Melotti -------------- next part -------------- An HTML attachment was scrubbed... URL: From fwierzbicki at gmail.com Fri Sep 9 00:09:09 2011 From: fwierzbicki at gmail.com (fwierzbicki at gmail.com) Date: Thu, 8 Sep 2011 15:09:09 -0700 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: References: <6C7ABA8B4E309440B857D74348836F2E28F378E2@TK5EX14MBXC292.redmond.corp.microsoft.com> <201108262337.42349.victor.stinner@haypocalc.com> Message-ID: On Fri, Aug 26, 2011 at 3:00 PM, Guido van Rossum wrote: > I have a different question about IronPython and Jython now. Do their > regular expression libraries support Unicode better than CPython's? > E.g. does "." match a surrogate pair? Tom C suggests that Java's regex > libraries get this and many other details right despite Java's use of > UTF-16 to represent strings. So hopefully Jython's re library is built > on top of Java's? > > PS. Is there a better contact for Jython? The best contact for Unicode and Jython is Jim Baker (I added him to the cc) - I'll do my best to answer though: Java 5 added a bunch of methods for dealing with Unicode that doesn't fit into 2 bytes - and looking at our code for our Unicode object, I see that we are using methods like the codePointCount method off of java.lang.String to compute length[1] and using similar methods all through that code to make sure we deal in code points when dealing with unicode. So it looks pretty good for us as far as I can tell. 
[1] http://download.oracle.com/javase/6/docs/api/java/lang/String.html#codePointCount(int, int) -Frank Wierzbicki From fwierzbicki at gmail.com Fri Sep 9 00:15:46 2011 From: fwierzbicki at gmail.com (fwierzbicki at gmail.com) Date: Thu, 8 Sep 2011 15:15:46 -0700 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: References: <6C7ABA8B4E309440B857D74348836F2E28F378E2@TK5EX14MBXC292.redmond.corp.microsoft.com> <201108262337.42349.victor.stinner@haypocalc.com> Message-ID: Oops, forgot to add the link for the gory details for Java and > 2 byte unicode: http://java.sun.com/developer/technicalArticles/Intl/Supplementary/ From fwierzbicki at gmail.com Fri Sep 9 00:50:45 2011 From: fwierzbicki at gmail.com (fwierzbicki at gmail.com) Date: Thu, 8 Sep 2011 15:50:45 -0700 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: References: <6C7ABA8B4E309440B857D74348836F2E28F378E2@TK5EX14MBXC292.redmond.corp.microsoft.com> <201108262337.42349.victor.stinner@haypocalc.com> Message-ID: On Fri, Aug 26, 2011 at 3:00 PM, Guido van Rossum wrote: > I have a different question about IronPython and Jython now. Do their > regular expression libraries support Unicode better than CPython's? > E.g. does "." match a surrogate pair? Tom C suggests that Java's regex > libraries get this and many other details right despite Java's use of > UTF-16 to represent strings. So hopefully Jython's re library is built > on top of Java's? Even bigger oops - I answered the thread questions and not this specific one. Currently Jython's re is a Jython specific implementation and so is not likely to benefit from the improvements in Java's re implementation. I think in terms of PEP 393 this should probably be considered a bug that we need to fix... 
-Frank Wierzbicki From tjreedy at udel.edu Fri Sep 9 07:39:21 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 09 Sep 2011 01:39:21 -0400 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: References: <6C7ABA8B4E309440B857D74348836F2E28F378E2@TK5EX14MBXC292.redmond.corp.microsoft.com> <201108262337.42349.victor.stinner@haypocalc.com> Message-ID: On 9/8/2011 6:15 PM, fwierzbicki at gmail.com wrote: > Oops, forgot to add the link for the gory details for Java and> 2 byte unicode: > > http://java.sun.com/developer/technicalArticles/Intl/Supplementary/ This is dated 2004. Basically, they considered several options, tried out 4, and ended up sticking with char[] (sequences) as UTF-16 with char = 16 bit code unit and added 32-bit Character(int) class for low-level manipulation of code points. I did not see the indexing problem mentioned. I get the impression that they encourage sequence forward-backward iteration (cursor-based access) rather than random-access indexing. -- Terry Jan Reedy From jcea at jcea.es Fri Sep 9 17:14:07 2011 From: jcea at jcea.es (Jesus Cea) Date: Fri, 09 Sep 2011 17:14:07 +0200 Subject: [Python-Dev] Multigigabyte memory usage in the OpenIndiana Buildbot In-Reply-To: <20110908091805.3f1e9141@pitrou.net> References: <4E60FCD6.3090005@jcea.es> <4E611821.9050108@jcea.es> <20110902201415.773da7d6@pitrou.net> <4E659CDD.8090900@jcea.es> <4E65A13D.9010805@jcea.es> <4E65A8AE.10900@jcea.es> <4E65A955.7000507@jcea.es> <4E65AD58.6050106@jcea.es> <4E6757AF.4050007@jcea.es> <20110907143259.2bcff454@pitrou.net> <4E681681.6060405@jcea.es> <20110908091805.3f1e9141@pitrou.net> Message-ID: <4E6A2D3F.3070906@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 08/09/11 09:18, Antoine Pitrou wrote: > Ok, I've added "-j4", let's how that works. It is not helping. it is taking tons of memory yet. >> Another option would be to have a single Python process and >> "fork" for each test. 
That would launch each test in a separate >> process without requiring a full python interpreter launching >> each time. Is this the way "-j" is implemented > > It uses subprocess actually, so fork() + exec() is used. Yes, does it but fork for each test or simply launch 4 processes, each doing 1/4 of the tests?. >> BTW, the (nice and helpful) OpenIndiana folks have told me a few >> hours ago that they would increase my swap limit to 16GB. I am >> now waiting for this change to be done. > > Good news :) 16GB of swap activated a few minutes ago. Thanks, Jon and Alastair :-) (OpenIndiana guys). Launching buildbots now and crossing fingers... - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ . _/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQCVAwUBTmotPplgi5GaxT1NAQKUzQP/Qm+lyCQeldL1XEkkq1EHY5C/hKvMDz9i qOV29iai/hkeqRWY2Fiu4vSfNTDAEil9eEIJQMGmUyYOMCrfOEoDCYzr+xTWfnNu EWzI6mEe8XWIUicGDAf/dbUEk11wtSrtXA09G0Q5oQWg0b6auQHYv5vhZITwDWSO h9rLBnZ0ZHI= =8Mpw -----END PGP SIGNATURE----- From status at bugs.python.org Fri Sep 9 18:07:27 2011 From: status at bugs.python.org (Python tracker) Date: Fri, 9 Sep 2011 18:07:27 +0200 (CEST) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20110909160727.9E4081CA91@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2011-09-02 - 2011-09-09) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message. Issues counts and deltas: open 3000 (+33) closed 21727 (+26) total 24727 (+59) Open issues with patches: 1287 Issues opened (49) ================== #12887: Documenting all SO_* constants in socket module http://bugs.python.org/issue12887 opened by sandro.tosi #12890: cgitb displays
<pre>
tags when executed in text mode http://bugs.python.org/issue12890 opened by mcjeff #12891: Clean up traces of manifest template in packaging http://bugs.python.org/issue12891 opened by eric.araujo #12892: UTF-16 and UTF-32 codecs should reject (lone) surrogates http://bugs.python.org/issue12892 opened by ezio.melotti #12895: In MSI/EXE installer, allow installing Python modules in free http://bugs.python.org/issue12895 opened by cool-RR #12896: Recommended location of the interpreter for Python 3 http://bugs.python.org/issue12896 opened by lregebro #12897: Support for iterators in multiprocessing map http://bugs.python.org/issue12897 opened by acooke #12900: Use universal newlines mode for setup.cfg http://bugs.python.org/issue12900 opened by eric.araujo #12901: Nest class/methods directives in documentation http://bugs.python.org/issue12901 opened by eric.araujo #12902: help("modules") executes module code http://bugs.python.org/issue12902 opened by dronus #12903: test_io.test_interrupte[r]d* blocks on OpenBSD http://bugs.python.org/issue12903 opened by rpointel #12904: Change os.utime &c functions to use nanosecond precision where http://bugs.python.org/issue12904 opened by larry #12905: multiple errors in test_socket on OpenBSD http://bugs.python.org/issue12905 opened by rpointel #12907: Update test coverage devguide page http://bugs.python.org/issue12907 opened by brett.cannon #12908: Update dev-in-a-box for new coverage steps http://bugs.python.org/issue12908 opened by brett.cannon #12910: urrlib.quote quotes too many chars, e.g., '()' http://bugs.python.org/issue12910 opened by joern #12911: Expose a private accumulator C API http://bugs.python.org/issue12911 opened by pitrou #12912: xmlrpclib.__version__ not bumped with updates http://bugs.python.org/issue12912 opened by rcritten #12913: Add a debugging howto http://bugs.python.org/issue12913 opened by eric.araujo #12914: Add cram function to textwrap http://bugs.python.org/issue12914 opened by eric.araujo 
#12915: Add inspect.locate and inspect.resolve http://bugs.python.org/issue12915 opened by eric.araujo #12916: Add inspect.splitdoc http://bugs.python.org/issue12916 opened by eric.araujo #12917: Make visiblename and allmethods functions public http://bugs.python.org/issue12917 opened by eric.araujo #12918: New module for terminal utilities http://bugs.python.org/issue12918 opened by eric.araujo #12919: Control what module is imported first http://bugs.python.org/issue12919 opened by brett.cannon #12920: Inspect.getsource fails to get source of local classes http://bugs.python.org/issue12920 opened by Popa.Claudiu #12921: http.server.BaseHTTPRequestHandler.send_error and trailing new http://bugs.python.org/issue12921 opened by Paul.Upchurch #12922: StringIO and seek() http://bugs.python.org/issue12922 opened by terry.reedy #12923: test_urllib fails in refleak mode http://bugs.python.org/issue12923 opened by skrah #12924: Missing call to quote_plus() in test_urllib.test_default_quoti http://bugs.python.org/issue12924 opened by jon #12925: python setup.py upload_docs doesn't ask for login and password http://bugs.python.org/issue12925 opened by cancel #12926: tarfile tarinfo.extract*() broken with symlinks http://bugs.python.org/issue12926 opened by Fabio.Erculiani #12927: test_ctypes: segfault with suncc http://bugs.python.org/issue12927 opened by skrah #12930: reindent.py inserts spaces in multiline literals http://bugs.python.org/issue12930 opened by Dima.Tisnek #12931: xmlrpclib confuses unicode and string http://bugs.python.org/issue12931 opened by wosc #12932: dircmp does not allow non-shallow comparisons http://bugs.python.org/issue12932 opened by kesmit #12933: Update or remove claims that distutils requires external progr http://bugs.python.org/issue12933 opened by eric.araujo #12934: pysetup doesn???t work for the docutils project http://bugs.python.org/issue12934 opened by eric.araujo #12935: Typo in findertools.py http://bugs.python.org/issue12935 opened 
by karstenw #12936: armv5tejl: random segfaults in getaddrinfo() http://bugs.python.org/issue12936 opened by skrah #12937: Support install options as found in distutils http://bugs.python.org/issue12937 opened by brett.cannon #12938: html.escape docstring does not mention single quotes (') http://bugs.python.org/issue12938 opened by zvin #12939: Add new io.FileIO using the native Windows API http://bugs.python.org/issue12939 opened by haypo #12940: Cmd example using turtle left vs. right doc-bug http://bugs.python.org/issue12940 opened by Gumnos #12941: add random.pop() http://bugs.python.org/issue12941 opened by jfeuerstein #12942: Shebang line fixer for 2to3 http://bugs.python.org/issue12942 opened by Aaron.Meurer #12943: tokenize: add python -m tokenize support back http://bugs.python.org/issue12943 opened by meadori #12944: setup.py upload to pypi needs to work with specified files http://bugs.python.org/issue12944 opened by illume #12945: ctypes works incorrectly with _swappedbytes_ = 1 http://bugs.python.org/issue12945 opened by Pavel.Boldin Most recent 15 issues with no replies (15) ========================================== #12945: ctypes works incorrectly with _swappedbytes_ = 1 http://bugs.python.org/issue12945 #12944: setup.py upload to pypi needs to work with specified files http://bugs.python.org/issue12944 #12943: tokenize: add python -m tokenize support back http://bugs.python.org/issue12943 #12942: Shebang line fixer for 2to3 http://bugs.python.org/issue12942 #12937: Support install options as found in distutils http://bugs.python.org/issue12937 #12936: armv5tejl: random segfaults in getaddrinfo() http://bugs.python.org/issue12936 #12935: Typo in findertools.py http://bugs.python.org/issue12935 #12934: pysetup doesn???t work for the docutils project http://bugs.python.org/issue12934 #12933: Update or remove claims that distutils requires external progr http://bugs.python.org/issue12933 #12932: dircmp does not allow non-shallow comparisons 
http://bugs.python.org/issue12932 #12926: tarfile tarinfo.extract*() broken with symlinks http://bugs.python.org/issue12926 #12924: Missing call to quote_plus() in test_urllib.test_default_quoti http://bugs.python.org/issue12924 #12923: test_urllib fails in refleak mode http://bugs.python.org/issue12923 #12922: StringIO and seek() http://bugs.python.org/issue12922 #12921: http.server.BaseHTTPRequestHandler.send_error and trailing new http://bugs.python.org/issue12921 Most recent 15 issues waiting for review (15) ============================================= #12941: add random.pop() http://bugs.python.org/issue12941 #12931: xmlrpclib confuses unicode and string http://bugs.python.org/issue12931 #12930: reindent.py inserts spaces in multiline literals http://bugs.python.org/issue12930 #12924: Missing call to quote_plus() in test_urllib.test_default_quoti http://bugs.python.org/issue12924 #12919: Control what module is imported first http://bugs.python.org/issue12919 #12911: Expose a private accumulator C API http://bugs.python.org/issue12911 #12903: test_io.test_interrupte[r]d* blocks on OpenBSD http://bugs.python.org/issue12903 #12901: Nest class/methods directives in documentation http://bugs.python.org/issue12901 #12890: cgitb displays
<pre>
tags when executed in text mode http://bugs.python.org/issue12890 #12881: ctypes: segfault with large structure field names http://bugs.python.org/issue12881 #12872: --with-tsc crashes on ppc64 http://bugs.python.org/issue12872 #12857: Expose called function on frame object http://bugs.python.org/issue12857 #12856: tempfile PRNG reuse between parent and child process http://bugs.python.org/issue12856 #12855: linebreak sequences should be better documented http://bugs.python.org/issue12855 #12850: [PATCH] stm.atomic http://bugs.python.org/issue12850 Top 10 most discussed issues (10) ================================= #12905: multiple errors in test_socket on OpenBSD http://bugs.python.org/issue12905 14 msgs #2636: Adding a new regex module (compatible with re) http://bugs.python.org/issue2636 8 msgs #12105: open() does not able to set flags, such as O_CLOEXEC http://bugs.python.org/issue12105 8 msgs #5845: rlcompleter should be enabled automatically http://bugs.python.org/issue5845 6 msgs #12729: Python lib re cannot handle Unicode properly due to narrow/wid http://bugs.python.org/issue12729 6 msgs #5876: __repr__ returning unicode doesn't work when called implicitly http://bugs.python.org/issue5876 5 msgs #7219: Unhelpful error message when a distutils package install fails http://bugs.python.org/issue7219 5 msgs #12870: Regex object should have introspection methods http://bugs.python.org/issue12870 5 msgs #12904: Change os.utime &c functions to use nanosecond precision where http://bugs.python.org/issue12904 5 msgs #12911: Expose a private accumulator C API http://bugs.python.org/issue12911 5 msgs Issues closed (25) ================== #7798: Make generally useful pydoc functions public http://bugs.python.org/issue7798 closed by eric.araujo #8286: distutils: path '[...]' cannot end with '/' -- need better err http://bugs.python.org/issue8286 closed by eric.araujo #10191: scripts files are not RECORDed. 
http://bugs.python.org/issue10191 closed by eric.araujo #11155: multiprocessing.Queue's put() signature differs from docs http://bugs.python.org/issue11155 closed by python-dev #11561: "coverage" of Python regrtest cannot see initial import of lib http://bugs.python.org/issue11561 closed by brett.cannon #12117: Failures with PYTHONDONTWRITEBYTECODE: test_importlib, test_im http://bugs.python.org/issue12117 closed by eric.araujo #12764: segfault in ctypes.Struct with bad _fields_ http://bugs.python.org/issue12764 closed by meadori #12781: Mention SO_REUSEADDR near socket doc examples http://bugs.python.org/issue12781 closed by sandro.tosi #12840: "maintainer" value clear the "author" value when register http://bugs.python.org/issue12840 closed by eric.araujo #12841: Incorrect tarfile.py extraction http://bugs.python.org/issue12841 closed by lars.gustaebel #12852: POSIX level issues in posixmodule.c on OpenBSD 5.0 http://bugs.python.org/issue12852 closed by haypo #12862: ConfigParser does not implement "comments need to be preceded http://bugs.python.org/issue12862 closed by lukasz.langa #12863: py32 > Lib > xml.minidom > usage feedback > overrides http://bugs.python.org/issue12863 closed by eric.araujo #12871: Disable sched_get_priority_min/max if Python is compiled witho http://bugs.python.org/issue12871 closed by neologix #12878: io.StringIO doesn't provide a __dict__ field http://bugs.python.org/issue12878 closed by python-dev #12888: html.parser.HTMLParser.unescape works only with the first 128 http://bugs.python.org/issue12888 closed by ezio.melotti #12889: struct.pack('d'... 
problem http://bugs.python.org/issue12889 closed by mark.dickinson #12893: Invitation to connect on LinkedIn http://bugs.python.org/issue12893 closed by nadeem.vawda #12894: pydoc help("modules keyword") is failing when a module throws http://bugs.python.org/issue12894 closed by ned.deily #12898: add opendir() for POSIX platforms http://bugs.python.org/issue12898 closed by haypo #12899: Change os.utimensat() and os.futimens() to use float for atime http://bugs.python.org/issue12899 closed by larry #12906: Slight error in logging module's yaml config http://bugs.python.org/issue12906 closed by python-dev #12909: Inconsistent exception usage in PyLong_As* C functions http://bugs.python.org/issue12909 closed by nadeem.vawda #12928: exec not woking in unittest http://bugs.python.org/issue12928 closed by benjamin.peterson #12929: faulthandler: void pointer used in arithmetic http://bugs.python.org/issue12929 closed by haypo From fwierzbicki at gmail.com Fri Sep 9 18:12:38 2011 From: fwierzbicki at gmail.com (fwierzbicki at gmail.com) Date: Fri, 9 Sep 2011 09:12:38 -0700 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: References: <6C7ABA8B4E309440B857D74348836F2E28F378E2@TK5EX14MBXC292.redmond.corp.microsoft.com> <201108262337.42349.victor.stinner@haypocalc.com> Message-ID: On Thu, Sep 8, 2011 at 10:39 PM, Terry Reedy wrote: > On 9/8/2011 6:15 PM, fwierzbicki at gmail.com wrote: >> >> Oops, forgot to add the link for the gory details for Java and> ?2 byte >> unicode: >> >> http://java.sun.com/developer/technicalArticles/Intl/Supplementary/ > > This is dated 2004. Basically, they considered several options, tried out 4, > and ended up sticking with char[] (sequences) as UTF-16 with char = 16 bit > code unit and added 32-bit Character(int) class for low-level manipulation > of code points. > > I did not see the indexing problem mentioned. 
I get the impression that they > encourage sequence forward-backward iteration (cursor-based access) rather > than random-access indexing. Hmmm, sorry for the irrelevant link - my lack of expertise here is showing. What I do know is that we (meaning Jim Baker) are taking great pains to always use codepoints even for random access in our unicode code. I can't speak to the performance implications without some deeper study into what Jim has done. -Frank From solipsis at pitrou.net Fri Sep 9 19:04:32 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 9 Sep 2011 19:04:32 +0200 Subject: [Python-Dev] Multigigabyte memory usage in the OpenIndiana Buildbot References: <4E60FCD6.3090005@jcea.es> <4E611821.9050108@jcea.es> <20110902201415.773da7d6@pitrou.net> <4E659CDD.8090900@jcea.es> <4E65A13D.9010805@jcea.es> <4E65A8AE.10900@jcea.es> <4E65A955.7000507@jcea.es> <4E65AD58.6050106@jcea.es> <4E6757AF.4050007@jcea.es> <20110907143259.2bcff454@pitrou.net> <4E681681.6060405@jcea.es> <20110908091805.3f1e9141@pitrou.net> <4E6A2D4A.30503@jcea.es> Message-ID: <20110909190432.11206d07@msiwind> Le Fri, 09 Sep 2011 17:14:18 +0200, Jesus Cea a ?crit : > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On 08/09/11 09:18, Antoine Pitrou wrote: > > Ok, I've added "-j4", let's how that works. > > It is not helping. it is taking tons of memory yet. That's rather strange. Is it for every test or a few select ones? > >> Another option would be to have a single Python process and > >> "fork" for each test. That would launch each test in a separate > >> process without requiring a full python interpreter launching > >> each time. Is this the way "-j" is implemented > > > > It uses subprocess actually, so fork() + exec() is used. > > Yes, does it but fork for each test or simply launch 4 processes, each > doing 1/4 of the tests?. It forks for each test. Regards Antoine. 
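To make the distinction concrete for onlookers: Antoine describes one fresh child per test, not N long-lived workers splitting the test list. The sketch below is illustrative only (the test names and the dummy child command are stand-ins, not regrtest's actual code):

```python
# Each "test" gets its own child interpreter via subprocess, i.e.
# fork() + exec() per test, so full interpreter startup cost (and
# memory) is paid for every test.
import subprocess
import sys

def run_test_in_child(test_name):
    # Stand-in child: a real runner would pass the test module name.
    rc = subprocess.call([sys.executable, '-c', 'import sys; sys.exit(0)'])
    return test_name, rc

def run_suite(tests):
    return [run_test_in_child(t) for t in tests]
```

The alternative Jesus asks about (4 persistent processes, each taking 1/4 of the tests) would amortize that startup cost, at the price of tests sharing interpreter state within a worker.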
From tjreedy at udel.edu Fri Sep 9 19:16:17 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 09 Sep 2011 13:16:17 -0400 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: References: <6C7ABA8B4E309440B857D74348836F2E28F378E2@TK5EX14MBXC292.redmond.corp.microsoft.com> <201108262337.42349.victor.stinner@haypocalc.com> Message-ID: On 9/9/2011 12:12 PM, fwierzbicki at gmail.com wrote: > On Thu, Sep 8, 2011 at 10:39 PM, Terry Reedy wrote: >> On 9/8/2011 6:15 PM, fwierzbicki at gmail.com wrote: >>> >>> Oops, forgot to add the link for the gory details for Java and> 2 byte >>> unicode: >>> >>> http://java.sun.com/developer/technicalArticles/Intl/Supplementary/ >> >> This is dated 2004. Basically, they considered several options, tried out 4, >> and ended up sticking with char[] (sequences) as UTF-16 with char = 16 bit >> code unit and added 32-bit Character(int) class for low-level manipulation >> of code points. >> >> I did not see the indexing problem mentioned. I get the impression that they >> encourage sequence forward-backward iteration (cursor-based access) rather >> than random-access indexing. > Hmmm, sorry for the irrelevant link - my lack of expertise here is > showing. What I do know is that we (meaning Jim Baker) are taking > great pains to always use codepoints even for random access in our > unicode code. I can't speak to the performance implications without > some deeper study into what Jim has done. I am curious how you index by code point rather than code unit with 16-bit code units and how it compares with the method I posted. Is there anything I can read? Reply off list if you want. 
-- Terry Jan Reedy From g.brandl at gmx.net Fri Sep 9 21:27:11 2011 From: g.brandl at gmx.net (Georg Brandl) Date: Fri, 09 Sep 2011 21:27:11 +0200 Subject: [Python-Dev] [Python-checkins] cpython (3.2): Fix PyUnicode_AsWideCharString() doc: size doesn't contain the null character In-Reply-To: <4E65D429.3040908@haypocalc.com> References: <4E65D429.3040908@haypocalc.com> Message-ID: Am 06.09.2011 10:04, schrieb Victor Stinner: > Le 06/09/2011 02:25, Nick Coghlan a ?crit : >> On Tue, Sep 6, 2011 at 10:01 AM, victor.stinner >> wrote: >>> Fix also spelling of the null character. >> >> While these cases are legitimately changed to 'null' (since they're >> lowercase descriptions of the character), I figure it's worth >> mentioning again that the ASCII name for '\0' actually *is* NUL (i.e. >> only one 'L'). Strange, but true [1]. >> >> Cheers, >> Nick. >> >> [1] https://secure.wikimedia.org/wikipedia/en/wiki/ASCII > > "NUL" is an abbreviation used in tables when you don't have enough space > to write the full name: "null character". > > Where do you want to mention this abbreviation? I vote to paint the bikeshed BLU. Georg (Seriously, how many more messages will this triviality spawn?) From fwierzbicki at gmail.com Fri Sep 9 21:58:41 2011 From: fwierzbicki at gmail.com (fwierzbicki at gmail.com) Date: Fri, 9 Sep 2011 12:58:41 -0700 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: References: <6C7ABA8B4E309440B857D74348836F2E28F378E2@TK5EX14MBXC292.redmond.corp.microsoft.com> <201108262337.42349.victor.stinner@haypocalc.com> Message-ID: On Fri, Sep 9, 2011 at 10:16 AM, Terry Reedy wrote: > I am curious how you index by code point rather than code unit with 16-bit > code units and how it compares with the method I posted. Is there anything I > can read? Reply off list if you want. 
I'll post on-list until someone complains, just in case there are interested onlookers :) There aren't docs, but the code is here: https://bitbucket.org/jython/jython/src/8a8642e45433/src/org/python/core/PyUnicode.java Here are (I think) the most relevant bits for random access -- note that getString() returns the internal representation of the PyUnicode which is a java.lang.String @Override protected PyObject pyget(int i) { if (isBasicPlane()) { return Py.makeCharacter(getString().charAt(i), true); } int k = 0; while (i > 0) { int W1 = getString().charAt(k); if (W1 >= 0xD800 && W1 < 0xDC00) { k += 2; } else { k += 1; } i--; } int codepoint = getString().codePointAt(k); return Py.makeCharacter(codepoint, true); } public boolean isBasicPlane() { if (plane == Plane.BASIC) { return true; } else if (plane == Plane.UNKNOWN) { plane = (getString().length() == getCodePointCount()) ? Plane.BASIC : Plane.ASTRAL; } return plane == Plane.BASIC; } public int getCodePointCount() { if (codePointCount >= 0) { return codePointCount; } codePointCount = getString().codePointCount(0, getString().length()); return codePointCount; } -Frank From guido at python.org Fri Sep 9 23:21:33 2011 From: guido at python.org (Guido van Rossum) Date: Fri, 9 Sep 2011 14:21:33 -0700 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: References: <6C7ABA8B4E309440B857D74348836F2E28F378E2@TK5EX14MBXC292.redmond.corp.microsoft.com> <201108262337.42349.victor.stinner@haypocalc.com> Message-ID: I, for one, am very interested. It sounds like the 'unicode' datatype in Jython does not in fact have O(1) indexing characteristics if the string contains any characters in the astral plane. Interesting. I wonder if you have heard from anyone about this affecting their app's performance? 
--Guido On Fri, Sep 9, 2011 at 12:58 PM, fwierzbicki at gmail.com wrote: > On Fri, Sep 9, 2011 at 10:16 AM, Terry Reedy wrote: > >> I am curious how you index by code point rather than code unit with 16-bit >> code units and how it compares with the method I posted. Is there anything I >> can read? Reply off list if you want. > I'll post on-list until someone complains, just in case there are > interested onlookers :) > > There aren't docs, but the code is here: > https://bitbucket.org/jython/jython/src/8a8642e45433/src/org/python/core/PyUnicode.java > > Here are (I think) the most relevant bits for random access -- note > that getString() returns the internal representation of the PyUnicode > which is a java.lang.String > > @Override > protected PyObject pyget(int i) { > if (isBasicPlane()) { > return Py.makeCharacter(getString().charAt(i), true); > } > > int k = 0; > while (i > 0) { > int W1 = getString().charAt(k); > if (W1 >= 0xD800 && W1 < 0xDC00) { > k += 2; > } else { > k += 1; > } > i--; > } > int codepoint = getString().codePointAt(k); > return Py.makeCharacter(codepoint, true); > } > > public boolean isBasicPlane() { > if (plane == Plane.BASIC) { > return true; > } else if (plane == Plane.UNKNOWN) { > plane = (getString().length() == getCodePointCount()) ? > Plane.BASIC : Plane.ASTRAL; > } > return plane == Plane.BASIC; > } > > public int getCodePointCount() { > if (codePointCount >= 0) { > return codePointCount; > } > codePointCount = getString().codePointCount(0, getString().length()); > return codePointCount; >
} > > -Frank > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (python.org/~guido) From fwierzbicki at gmail.com Sat Sep 10 00:38:03 2011 From: fwierzbicki at gmail.com (fwierzbicki at gmail.com) Date: Fri, 9 Sep 2011 15:38:03 -0700 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: References: <6C7ABA8B4E309440B857D74348836F2E28F378E2@TK5EX14MBXC292.redmond.corp.microsoft.com> <201108262337.42349.victor.stinner@haypocalc.com> Message-ID: On Fri, Sep 9, 2011 at 2:21 PM, Guido van Rossum wrote: > I, for one, am very interested. It sounds like the 'unicode' datatype > in Jython does not in fact have O(1) indexing characteristics if the > string contains any characters in the astral plane. Interesting. I > wonder if you have heard from anyone about this affecting their app's > performance? So far we haven't had any complaints - I'm not really sure how often Jython gets used with astral plane characters at this point, but I expect it will happen more in the future, especially once we put together a Jython 3 and Unicode support becomes a stronger expectation. Personally I'm hoping that in that time frame Java will come under pressure to provide a better answer (or we may need to think in the same direction as Dino was thinking in an earlier part of this thread and make a more Python specific String type for Jython....)
-Frank From guido at python.org Sat Sep 10 00:43:31 2011 From: guido at python.org (Guido van Rossum) Date: Fri, 9 Sep 2011 15:43:31 -0700 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: References: <6C7ABA8B4E309440B857D74348836F2E28F378E2@TK5EX14MBXC292.redmond.corp.microsoft.com> <201108262337.42349.victor.stinner@haypocalc.com> Message-ID: Well, I'd be interested in how it goes, since if Jython users find this acceptable then maybe we shouldn't be quite so concerned about it for CPython... On the third hand we don't have working code for this approach in CPython, while we do have working code for the PEP 393 solution... --Guido On Fri, Sep 9, 2011 at 3:38 PM, fwierzbicki at gmail.com wrote: > On Fri, Sep 9, 2011 at 2:21 PM, Guido van Rossum wrote: >> I, for one, am very interested. It sounds like the 'unicode' datatype >> in Jython does not in fact have O(1) indexing characteristics if the >> string contains any characters in the astral plane. Interesting. I >> wonder if you have heard from anyone about this affecting their app's >> performance? > So far we haven't had any complaints - I'm not really sure how often > Jython gets used with astral plane characters at this point, but I > expect it will happen more in the future, especially once we put > together a Jython 3 and Unicode support becomes a stronger > expectation. Personally I'm hoping that in that time frame Java will > come under pressure to provide a better answer (or we may need to > think in the same direction as Dino was thinking in an earlier part of > this thread and make a more Python specific String type for > Jython....)
> > -Frank > -- --Guido van Rossum (python.org/~guido) From tjreedy at udel.edu Sat Sep 10 03:11:18 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 09 Sep 2011 21:11:18 -0400 Subject: [Python-Dev] PEP 393 Summer of Code Project In-Reply-To: References: <6C7ABA8B4E309440B857D74348836F2E28F378E2@TK5EX14MBXC292.redmond.corp.microsoft.com> <201108262337.42349.victor.stinner@haypocalc.com> Message-ID: On 9/9/2011 5:21 PM, Guido van Rossum wrote: > I, for one, am very interested. It sounds like the 'unicode' datatype > in Jython does not in fact have O(1) indexing characteristics if the > string contains any characters in the astral plane. Interesting. I > wonder if you have heard from anyone about this affecting their app's > performance? > > --Guido The question is whether or how often any Jython users are yet indexing/slicing long strings with astral chars. If a utf-8 xml file is directly parsed into a DOM, then the longest decoded strings will be 'paragraphs' that are seldom more than 1000 chars. > On Fri, Sep 9, 2011 at 12:58 PM, fwierzbicki at gmail.com > wrote: >> On Fri, Sep 9, 2011 at 10:16 AM, Terry Reedy wrote: >> >>> I am curious how you index by code point rather than code unit with 16-bit >>> code units and how it compares with the method I posted. Is there anything I >>> can read? Reply off list if you want. 
>> I'll post on-list until someone complains, just in case there are >> interested onlookers :) >> >> There aren't docs, but the code is here: >> https://bitbucket.org/jython/jython/src/8a8642e45433/src/org/python/core/PyUnicode.java >> >> Here are (I think) the most relevant bits for random access -- note >> that getString() returns the internal representation of the PyUnicode >> which is a java.lang.String >> >> @Override >> protected PyObject pyget(int i) { >> if (isBasicPlane()) { >> return Py.makeCharacter(getString().charAt(i), true); >> } This is O(1) >> int k = 0; >> while (i > 0) { >> int W1 = getString().charAt(k); >> if (W1 >= 0xD800 && W1 < 0xDC00) { >> k += 2; >> } else { >> k += 1; >> } >> i--; This is an O(n) linear scan. >> } >> int codepoint = getString().codePointAt(k); >> return Py.makeCharacter(codepoint, true); >> } Near the beginning of this thread, I described and gave a link to my O(log k) algorithm, where k is the number of supplementary ('astral') chars. It uses bisect.bisect_left on an int array of length k constructed with a linear scan much like the one above, with one added line. The basic idea is to do the linear scan just once and save the locations (code point indexes) of the astral chars instead of repeating the scan on every access. That could be done as the string is constructed. The same array search works for slicing too. Jython is welcome to use it if you ever decide you need it. I have in mind to someday do some timing tests with the Python version. I just do not know how close the results would be to those for compiled C or Java.
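The scheme described above (one linear pass to record astral positions, then a binary search per access) can be sketched in Python roughly as follows; the helper names here are hypothetical, not taken from any posted code:

```python
import bisect

def build_astral_table(code_points):
    """One linear scan at construction time: record the code-point
    indexes of supplementary ('astral', > U+FFFF) characters."""
    return [i for i, cp in enumerate(code_points) if cp > 0xFFFF]

def unit_index(cp_index, astral_table):
    """Map a code-point index to a UTF-16 code-unit index in O(log k),
    where k = len(astral_table): each astral character before cp_index
    contributes one extra code unit (its trail surrogate)."""
    return cp_index + bisect.bisect_left(astral_table, cp_index)

# 'a', U+10000, 'b', U+10001, 'c' -> 7 UTF-16 code units
cps = [0x61, 0x10000, 0x62, 0x10001, 0x63]
table = build_astral_table(cps)   # [1, 3]
assert [unit_index(i, table) for i in range(5)] == [0, 1, 3, 4, 6]
```

The point of the table is that the surrogate scan happens once; every subsequent index or slice pays only the bisect, regardless of how many astral characters precede it.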
-- Terry Jan Reedy From jcea at jcea.es Sat Sep 10 05:02:09 2011 From: jcea at jcea.es (Jesus Cea) Date: Sat, 10 Sep 2011 05:02:09 +0200 Subject: [Python-Dev] Multigigabyte memory usage in the OpenIndiana Buildbot In-Reply-To: <20110909190432.11206d07@msiwind> References: <4E60FCD6.3090005@jcea.es> <4E611821.9050108@jcea.es> <20110902201415.773da7d6@pitrou.net> <4E659CDD.8090900@jcea.es> <4E65A13D.9010805@jcea.es> <4E65A8AE.10900@jcea.es> <4E65A955.7000507@jcea.es> <4E65AD58.6050106@jcea.es> <4E6757AF.4050007@jcea.es> <20110907143259.2bcff454@pitrou.net> <4E681681.6060405@jcea.es> <20110908091805.3f1e9141@pitrou.net> <4E6A2D4A.30503@jcea.es> <20110909190432.11206d07@msiwind> Message-ID: <4E6AD331.3080302@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 09/09/11 19:04, Antoine Pitrou wrote: >> On 08/09/11 09:18, Antoine Pitrou wrote: >>> Ok, I've added "-j4", let's see how that works. >> >> It is not helping. It is still taking tons of memory. > > That's rather strange. Is it for every test or a few select ones? I can't reproduce it after stopping the buildbots, deleting all their data and restarting them. Now I see quite a few python processes running, but memory usage is reasonable. >> Yes, but does it fork for each test or simply launch 4 processes, >> each doing 1/4 of the tests? > > It forks for each test. So, the memory used should be quite low, then :-). I have committed a few patches in the last hours to get my buildbots "green", back again. The memory used was <500MB, compared with >4GB before the "-j". Could you reconfigure my buildbots to be able to run all the six (2.7, 3.2, 3.x, in 32 and 64 bits) instances at the same time, again? I have enough resources now. I'm really sorry to waste your time... Thanks!!!!!. - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ .
_/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQCVAwUBTmrTMZlgi5GaxT1NAQIIKgP+LE1NCfcCVIX+jau4QSJRAVvZan4rqqYn /tMLaz92/toP2S8FdHKbEPs6hBf6QGgnVxnHWcwTxxTWzfDL8xxGjFgJYh/hcqBi B2zfrp83PjW6hFMeL6E7707DI6YwZRCB+dJIiVejAIEMHVOVG6x12KRLFCWL+AOZ ElpXewoATXI= =fHkz -----END PGP SIGNATURE----- From jcea at jcea.es Sat Sep 10 05:26:34 2011 From: jcea at jcea.es (Jesus Cea) Date: Sat, 10 Sep 2011 05:26:34 +0200 Subject: [Python-Dev] Multigigabyte memory usage in the OpenIndiana Buildbot In-Reply-To: <4E6AD331.3080302@jcea.es> References: <4E60FCD6.3090005@jcea.es> <4E611821.9050108@jcea.es> <20110902201415.773da7d6@pitrou.net> <4E659CDD.8090900@jcea.es> <4E65A13D.9010805@jcea.es> <4E65A8AE.10900@jcea.es> <4E65A955.7000507@jcea.es> <4E65AD58.6050106@jcea.es> <4E6757AF.4050007@jcea.es> <20110907143259.2bcff454@pitrou.net> <4E681681.6060405@jcea.es> <20110908091805.3f1e9141@pitrou.net> <4E6A2D4A.30503@jcea.es> <20110909190432.11206d07@msiwind> <4E6AD331.3080302@jcea.es> Message-ID: <4E6AD8EA.9010901@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 10/09/11 05:02, Jesus Cea wrote: > I have committed a few patches in the last hours to get my > buildbots "green", back again. The memory used was <500MB, compared > with >4GB before the "-j". One of my patches solves a "process leak" in multiprocessing, when some tests failed. Doing "make test" leaked quite a few processes, but only in OpenIndiana, where those tests actually failed. That is solved now, both the leak and the test failure. Details: http://bugs.python.org/issue12948 http://bugs.python.org/issue12950 I think the buildbots took care of these rogue processes after the timeout expires, anyway, but...
> Could you reconfigure my buildbots to be able to run all the six > (2.7, 3.2, 3.x, in 32 and 64 bits) instances at the same time, > again? I have enough resources now. I'm really sorry to waste your > time... Now, a buildbot run of 3.x compiled in 64bits takes around 500MB. I have seen a peak of around 4GB and a few of around 800MB, for a fraction of a second. - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ . _/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQCVAwUBTmrY6plgi5GaxT1NAQKvUgP/YlS7wneU5dsWoAmtqauC02gZUi1D4OpQ 7waM8G1q8OHXLbpV1jKmBb/32G+rDp1Tm/XCjlHpK1wJcmwWmdPGAbbQp1o5TduJ z+lbPnzWvMCRLJwZDtZAitn4/7VchoAcdTfIYCyBoK/JEUI1Oq0Mt5XeIgtD+FX9 IjwuWzXISqM= =ojrq -----END PGP SIGNATURE----- From solipsis at pitrou.net Sat Sep 10 18:46:16 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 10 Sep 2011 18:46:16 +0200 Subject: [Python-Dev] Multigigabyte memory usage in the OpenIndiana Buildbot References: <4E60FCD6.3090005@jcea.es> <4E611821.9050108@jcea.es> <20110902201415.773da7d6@pitrou.net> <4E659CDD.8090900@jcea.es> <4E65A13D.9010805@jcea.es> <4E65A8AE.10900@jcea.es> <4E65A955.7000507@jcea.es> <4E65AD58.6050106@jcea.es> <4E6757AF.4050007@jcea.es> <20110907143259.2bcff454@pitrou.net> <4E681681.6060405@jcea.es> <20110908091805.3f1e9141@pitrou.net> <4E6A2D4A.30503@jcea.es> <20110909190432.11206d07@msiwind> <4E6AD331.3080302@jcea.es> Message-ID: <20110910184616.3efc654e@msiwind> Le Sat, 10 Sep 2011 05:02:09 +0200, Jesus Cea a écrit : > > I have committed a few patches in the last hours to get my buildbots > "green", back again.
The memory used was <500MB, compared with >4GB > before the "-j". > > Could you reconfigure my buildbots to be able to run all the six (2.7, > 3.2, 3.x, in 32 and 64 bits) instances at the same time, again?. I > have enough resources now. I really sorry to waste your time... I don't think I can do it right now, since I'm away on holiday. However, perhaps David or Martin can do it. Or you'll have to wait a couple of weeks :) Regards Antoine. From howard_b_golden at yahoo.com Wed Sep 7 21:33:11 2011 From: howard_b_golden at yahoo.com (Howard B. Golden) Date: Wed, 07 Sep 2011 12:33:11 -0700 Subject: [Python-Dev] Handling linker scripts reached when dynamically loading a module Message-ID: <1315423991.7843.19.camel@www.hbg-srv3.hgolden.socal.rr.com> Hi, In Haskell I experienced a situation where dynamically loaded modules were experiencing "invalid ELF header" errors. This was caused by the module names actually referring to linker scripts rather than ELF binaries. I patched the GHC runtime system to deal with these scripts. I noticed that this same patch has been ported to Ruby and Node.js, so I suggested to the libc developers that they might wish to incorporate the patch into their library, making it available to all languages. They rejected this suggestion, so I am making the suggestion to the Python devs in case it is of interest to you. Basically, when a linker script is loaded by dlopen, an "invalid ELF header" error occurs. The patch checks to see if the file is a linker script. If so, it finds the name of the real ELF binary with a regular expression and tries to dlopen it. If successful, processing proceeds. Otherwise, the original "invalid ELF error" message is returned. If you want to add this code to Python, you can look at my original patch (http://hackage.haskell.org/trac/ghc/ticket/2615) or the Ruby version (https://github.com/ffi/ffi/pull/117) or the Node.js version (https://github.com/rbranson/node-ffi/pull/5) to help port it. 
Note that the GHC version in GHC 7.2.1 has been enhanced to also handle another possible error when the linker script is too short, so you might also want to add this enhancement also (see https://github.com/ghc/blob/master/rts/Linker.c line 1191 for the revised regular expression): "(([^ \t()])+\\.so([^ \t:()])*):([ \t])*(invalid ELF header|file too short)" At this point, I don't have the free time to write the Python patch myself, so I apologize in advance for not providing it to you. HTH, Howard B. Golden Northridge, California, USA From guido at python.org Sat Sep 10 23:39:15 2011 From: guido at python.org (Guido van Rossum) Date: Sat, 10 Sep 2011 14:39:15 -0700 Subject: [Python-Dev] Handling linker scripts reached when dynamically loading a module In-Reply-To: <1315423991.7843.19.camel@www.hbg-srv3.hgolden.socal.rr.com> References: <1315423991.7843.19.camel@www.hbg-srv3.hgolden.socal.rr.com> Message-ID: Excuse me for asking a newbie question, but what are linker scripts and why are they important? I don't recall anyone ever having requested this feature before. --Guido On Wed, Sep 7, 2011 at 12:33 PM, Howard B. Golden wrote: > Hi, > > In Haskell I experienced a situation where dynamically loaded modules > were experiencing "invalid ELF header" errors. This was caused by the > module names actually referring to linker scripts rather than ELF > binaries. I patched the GHC runtime system to deal with these scripts. > > I noticed that this same patch has been ported to Ruby and Node.js, so I > suggested to the libc developers that they might wish to incorporate the > patch into their library, making it available to all languages. They > rejected this suggestion, so I am making the suggestion to the Python > devs in case it is of interest to you. > > Basically, when a linker script is loaded by dlopen, an "invalid ELF > header" error occurs. The patch checks to see if the file is a linker > script. 
If so, it finds the name of the real ELF binary with a regular > expression and tries to dlopen it. If successful, processing proceeds. > Otherwise, the original "invalid ELF error" message is returned. > > If you want to add this code to Python, you can look at my original > patch (http://hackage.haskell.org/trac/ghc/ticket/2615) or the Ruby > version (https://github.com/ffi/ffi/pull/117) or the Node.js version > (https://github.com/rbranson/node-ffi/pull/5) to help port it. > > Note that the GHC version in GHC 7.2.1 has been enhanced to also handle > another possible error when the linker script is too short, so you might > also want to add this enhancement also (see > https://github.com/ghc/blob/master/rts/Linker.c line 1191 for the > revised regular expression): > > "(([^ \t()])+\\.so([^ \t:()])*):([ \t])*(invalid ELF header|file too > short)" > > At this point, I don't have the free time to write the Python patch > myself, so I apologize in advance for not providing it to you. > > HTH, > > Howard B. Golden > Northridge, California, USA > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (python.org/~guido) From howard_b_golden at yahoo.com Sat Sep 10 23:50:42 2011 From: howard_b_golden at yahoo.com (Howard B. Golden) Date: Sat, 10 Sep 2011 14:50:42 -0700 Subject: [Python-Dev] Handling linker scripts reached when dynamically loading a module In-Reply-To: References: <1315423991.7843.19.camel@www.hbg-srv3.hgolden.socal.rr.com> Message-ID: <1315691442.16652.8.camel@www.hbg-srv3.hgolden.socal.rr.com> I don't know why, but some Linux distributions place scripts into .so files instead of the actual binaries. This takes advantage of a feature of GNU ld that it will process the script (which points to the actual binary) when it links the .so file. 
This feature works fine when you are linking a binary, but it doesn't take into account that binaries can be loaded dynamically by interpreters (e.g., Python or GHCi). If dlopen finds a linker script, it doesn't know what to do with it. It simply diagnoses the file as either an invalid ELF header or too short. On Gentoo Linux, some common libraries that are represented as linker scripts include libm.so, libpthread.so and libpcre.so. I know this also affects Ubuntu. Howard On Sat, 2011-09-10 at 14:39 -0700, Guido van Rossum wrote: > Excuse me for asking a newbie question, but what are linker scripts > and why are they important? I don't recall anyone ever having > requested this feature before. > > --Guido > > On Wed, Sep 7, 2011 at 12:33 PM, Howard B. Golden > wrote: > > Hi, > > > > In Haskell I experienced a situation where dynamically loaded modules > > were experiencing "invalid ELF header" errors. This was caused by the > > module names actually referring to linker scripts rather than ELF > > binaries. I patched the GHC runtime system to deal with these scripts. > > > > I noticed that this same patch has been ported to Ruby and Node.js, so I > > suggested to the libc developers that they might wish to incorporate the > > patch into their library, making it available to all languages. They > > rejected this suggestion, so I am making the suggestion to the Python > > devs in case it is of interest to you. > > > > Basically, when a linker script is loaded by dlopen, an "invalid ELF > > header" error occurs. The patch checks to see if the file is a linker > > script. If so, it finds the name of the real ELF binary with a regular > > expression and tries to dlopen it. If successful, processing proceeds. > > Otherwise, the original "invalid ELF error" message is returned. 
> > > > If you want to add this code to Python, you can look at my original > > patch (http://hackage.haskell.org/trac/ghc/ticket/2615) or the Ruby > > version (https://github.com/ffi/ffi/pull/117) or the Node.js version > > (https://github.com/rbranson/node-ffi/pull/5) to help port it. > > > > Note that the GHC version in GHC 7.2.1 has been enhanced to also handle > > another possible error when the linker script is too short, so you might > > also want to add this enhancement also (see > > https://github.com/ghc/blob/master/rts/Linker.c line 1191 for the > > revised regular expression): > > > > "(([^ \t()])+\\.so([^ \t:()])*):([ \t])*(invalid ELF header|file too > > short)" > > > > At this point, I don't have the free time to write the Python patch > > myself, so I apologize in advance for not providing it to you. > > > > HTH, > > > > Howard B. Golden > > Northridge, California, USA > > > > _______________________________________________ > > Python-Dev mailing list > > Python-Dev at python.org > > http://mail.python.org/mailman/listinfo/python-dev > > Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org > > > > > From guido at python.org Sun Sep 11 00:24:04 2011 From: guido at python.org (Guido van Rossum) Date: Sat, 10 Sep 2011 15:24:04 -0700 Subject: [Python-Dev] Handling linker scripts reached when dynamically loading a module In-Reply-To: <1315691442.16652.8.camel@www.hbg-srv3.hgolden.socal.rr.com> References: <1315423991.7843.19.camel@www.hbg-srv3.hgolden.socal.rr.com> <1315691442.16652.8.camel@www.hbg-srv3.hgolden.socal.rr.com> Message-ID: Odd. Let's see what other core devs say. On Sat, Sep 10, 2011 at 2:50 PM, Howard B. Golden wrote: > I don't know why, but some Linux distributions place scripts into .so > files instead of the actual binaries. This takes advantage of a feature > of GNU ld that it will process the script (which points to the actual > binary) when it links the .so file. 
> > This feature works fine when you are linking a binary, but it doesn't > take into account that binaries can be loaded dynamically by > interpreters (e.g., Python or GHCi). If dlopen finds a linker script, it > doesn't know what to do with it. It simply diagnoses the file as either > an invalid ELF header or too short. > > On Gentoo Linux, some common libraries that are represented as linker > scripts include libm.so, libpthread.so and libpcre.so. I know this also > affects Ubuntu. > > Howard > > On Sat, 2011-09-10 at 14:39 -0700, Guido van Rossum wrote: >> Excuse me for asking a newbie question, but what are linker scripts >> and why are they important? I don't recall anyone ever having >> requested this feature before. >> >> --Guido >> >> On Wed, Sep 7, 2011 at 12:33 PM, Howard B. Golden >> wrote: >> > Hi, >> > >> > In Haskell I experienced a situation where dynamically loaded modules >> > were experiencing "invalid ELF header" errors. This was caused by the >> > module names actually referring to linker scripts rather than ELF >> > binaries. I patched the GHC runtime system to deal with these scripts. >> > >> > I noticed that this same patch has been ported to Ruby and Node.js, so I >> > suggested to the libc developers that they might wish to incorporate the >> > patch into their library, making it available to all languages. They >> > rejected this suggestion, so I am making the suggestion to the Python >> > devs in case it is of interest to you. >> > >> > Basically, when a linker script is loaded by dlopen, an "invalid ELF >> > header" error occurs. The patch checks to see if the file is a linker >> > script. If so, it finds the name of the real ELF binary with a regular >> > expression and tries to dlopen it. If successful, processing proceeds. >> > Otherwise, the original "invalid ELF error" message is returned. 
>> > >> > If you want to add this code to Python, you can look at my original >> > patch (http://hackage.haskell.org/trac/ghc/ticket/2615) or the Ruby >> > version (https://github.com/ffi/ffi/pull/117) or the Node.js version >> > (https://github.com/rbranson/node-ffi/pull/5) to help port it. >> > >> > Note that the GHC version in GHC 7.2.1 has been enhanced to also handle >> > another possible error when the linker script is too short, so you might >> > also want to add this enhancement also (see >> > https://github.com/ghc/blob/master/rts/Linker.c line 1191 for the >> > revised regular expression): >> > >> > "(([^ \t()])+\\.so([^ \t:()])*):([ \t])*(invalid ELF header|file too >> > short)" >> > >> > At this point, I don't have the free time to write the Python patch >> > myself, so I apologize in advance for not providing it to you. >> > >> > HTH, >> > >> > Howard B. Golden >> > Northridge, California, USA >> > >> > _______________________________________________ >> > Python-Dev mailing list >> > Python-Dev at python.org >> > http://mail.python.org/mailman/listinfo/python-dev >> > Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org >> > >> >> >> > > > -- --Guido van Rossum (python.org/~guido) From nadeem.vawda at gmail.com Sun Sep 11 00:39:02 2011 From: nadeem.vawda at gmail.com (Nadeem Vawda) Date: Sun, 11 Sep 2011 00:39:02 +0200 Subject: [Python-Dev] Handling linker scripts reached when dynamically loading a module In-Reply-To: References: <1315423991.7843.19.camel@www.hbg-srv3.hgolden.socal.rr.com> <1315691442.16652.8.camel@www.hbg-srv3.hgolden.socal.rr.com> Message-ID: I can confirm that libpthread.so (/usr/lib/x86_64-linux-gnu/libpthread.so) is a linker script on my Ubuntu 11.04 install. This hasn't ever caused me any problems, though. As for why distributions do this, here are the contents of the script: /* GNU ld script Use the shared library, but some functions are only in the static library, so try that secondarily. 
*/ OUTPUT_FORMAT(elf64-x86-64) GROUP ( /lib/x86_64-linux-gnu/libpthread.so.0 /usr/lib/x86_64-linux-gnu/libpthread_nonshared.a ) Cheers, Nadeem On Sun, Sep 11, 2011 at 12:24 AM, Guido van Rossum wrote: > Odd. Let's see what other core devs say. > > On Sat, Sep 10, 2011 at 2:50 PM, Howard B. Golden > wrote: >> I don't know why, but some Linux distributions place scripts into .so >> files instead of the actual binaries. This takes advantage of a feature >> of GNU ld that it will process the script (which points to the actual >> binary) when it links the .so file. >> >> This feature works fine when you are linking a binary, but it doesn't >> take into account that binaries can be loaded dynamically by >> interpreters (e.g., Python or GHCi). If dlopen finds a linker script, it >> doesn't know what to do with it. It simply diagnoses the file as either >> an invalid ELF header or too short. >> >> On Gentoo Linux, some common libraries that are represented as linker >> scripts include libm.so, libpthread.so and libpcre.so. I know this also >> affects Ubuntu. >> >> Howard >> >> On Sat, 2011-09-10 at 14:39 -0700, Guido van Rossum wrote: >>> Excuse me for asking a newbie question, but what are linker scripts >>> and why are they important? I don't recall anyone ever having >>> requested this feature before. >>> >>> --Guido >>> >>> On Wed, Sep 7, 2011 at 12:33 PM, Howard B. Golden >>> wrote: >>> > Hi, >>> > >>> > In Haskell I experienced a situation where dynamically loaded modules >>> > were experiencing "invalid ELF header" errors. This was caused by the >>> > module names actually referring to linker scripts rather than ELF >>> > binaries. I patched the GHC runtime system to deal with these scripts. >>> > >>> > I noticed that this same patch has been ported to Ruby and Node.js, so I >>> > suggested to the libc developers that they might wish to incorporate the >>> > patch into their library, making it available to all languages. 
They >>> > rejected this suggestion, so I am making the suggestion to the Python >>> > devs in case it is of interest to you. >>> > >>> > Basically, when a linker script is loaded by dlopen, an "invalid ELF >>> > header" error occurs. The patch checks to see if the file is a linker >>> > script. If so, it finds the name of the real ELF binary with a regular >>> > expression and tries to dlopen it. If successful, processing proceeds. >>> > Otherwise, the original "invalid ELF error" message is returned. >>> > >>> > If you want to add this code to Python, you can look at my original >>> > patch (http://hackage.haskell.org/trac/ghc/ticket/2615) or the Ruby >>> > version (https://github.com/ffi/ffi/pull/117) or the Node.js version >>> > (https://github.com/rbranson/node-ffi/pull/5) to help port it. >>> > >>> > Note that the GHC version in GHC 7.2.1 has been enhanced to also handle >>> > another possible error when the linker script is too short, so you might >>> > also want to add this enhancement also (see >>> > https://github.com/ghc/blob/master/rts/Linker.c line 1191 for the >>> > revised regular expression): >>> > >>> > "(([^ \t()])+\\.so([^ \t:()])*):([ \t])*(invalid ELF header|file too >>> > short)" >>> > >>> > At this point, I don't have the free time to write the Python patch >>> > myself, so I apologize in advance for not providing it to you. >>> > >>> > HTH, >>> > >>> > Howard B. 
Golden >>> > Northridge, California, USA >>> > >>> > _______________________________________________ >>> > Python-Dev mailing list >>> > Python-Dev at python.org >>> > http://mail.python.org/mailman/listinfo/python-dev >>> > Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org >>> > >>> >>> >>> >> >> >> > > > > -- > --Guido van Rossum (python.org/~guido) > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/nadeem.vawda%40gmail.com > From howard_b_golden at yahoo.com Sun Sep 11 01:35:19 2011 From: howard_b_golden at yahoo.com (Howard B. Golden) Date: Sat, 10 Sep 2011 16:35:19 -0700 Subject: [Python-Dev] Handling linker scripts reached when dynamically loading a module In-Reply-To: References: <1315423991.7843.19.camel@www.hbg-srv3.hgolden.socal.rr.com> <1315691442.16652.8.camel@www.hbg-srv3.hgolden.socal.rr.com> Message-ID: <1315697719.16652.15.camel@www.hbg-srv3.hgolden.socal.rr.com> On Sun, 2011-09-11 at 00:39 +0200, Nadeem Vawda wrote: > I can confirm that libpthread.so (/usr/lib/x86_64-linux-gnu/libpthread.so) > is a linker script on my Ubuntu 11.04 install. This hasn't ever caused me > any problems, though. > > As for why distributions do this, here are the contents of the script: > > /* GNU ld script > Use the shared library, but some functions are only in > the static library, so try that secondarily. */ > OUTPUT_FORMAT(elf64-x86-64) > GROUP ( /lib/x86_64-linux-gnu/libpthread.so.0 > /usr/lib/x86_64-linux-gnu/libpthread_nonshared.a ) > > Cheers, > Nadeem Let me clarify: This will only be a problem when using a foreign function interface to call a non-versioned module dynamically. 
In the more common situation, when one links to a package specified at link time, the linker figures out the specific, versioned name of the .so file and then the dlopen will refer to the actual binary. So, in Python, this is likely to only affect users calling packages using ctypes. (This corresponds to GHCi loading an unversioned library, e.g., "ghci -lm" which would load the current version of the math library into the GHC interpreter.) Howard From wolfson at gmail.com Sun Sep 11 01:52:07 2011 From: wolfson at gmail.com (Ben Wolfson) Date: Sat, 10 Sep 2011 16:52:07 -0700 Subject: [Python-Dev] Handling linker scripts reached when dynamically loading a module In-Reply-To: <1315697719.16652.15.camel@www.hbg-srv3.hgolden.socal.rr.com> References: <1315423991.7843.19.camel@www.hbg-srv3.hgolden.socal.rr.com> <1315691442.16652.8.camel@www.hbg-srv3.hgolden.socal.rr.com> <1315697719.16652.15.camel@www.hbg-srv3.hgolden.socal.rr.com> Message-ID: On Sat, Sep 10, 2011 at 4:35 PM, Howard B. Golden wrote: > > So, in Python, this is likely to only affect users calling packages > using ctypes. (This corresponds to GHCi loading an unversioned library, > e.g., "ghci -lm" which would load the current version of the math > library into the GHC interpreter.) And it does do so on Gentoo: $ python Python 2.6.6 (r266:84292, Dec 26 2010, 17:43:52) [GCC 4.4.4] on linux2 Type "help", "copyright", "credits" or "license" for more information. 
>>> from ctypes import cdll >>> cdll.LoadLibrary('libpthread.so') Traceback (most recent call last): File "", line 1, in File "/usr/lib/python2.6/ctypes/__init__.py", line 431, in LoadLibrary return self._dlltype(name) File "/usr/lib/python2.6/ctypes/__init__.py", line 353, in __init__ self._handle = _dlopen(self._name, mode) OSError: /usr/lib/libpthread.so: invalid ELF header >>> cdll.LoadLibrary('libpthread.so.0') >>> $ cat /usr/lib/libpthread.so /* GNU ld script Use the shared library, but some functions are only in the static library, so try that secondarily. */ OUTPUT_FORMAT(elf32-i386) GROUP ( /lib/libpthread.so.0 /usr/lib/libpthread_nonshared.a ) -- Ben Wolfson "Human kind has used its intelligence to vary the flavour of drinks, which may be sweet, aromatic, fermented or spirit-based. ... Family and social life also offer numerous other occasions to consume drinks for pleasure." [Larousse, "Drink" entry] From martin at v.loewis.de Sun Sep 11 09:08:29 2011 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 11 Sep 2011 09:08:29 +0200 Subject: [Python-Dev] Handling linker scripts reached when dynamically loading a module In-Reply-To: <1315697719.16652.15.camel@www.hbg-srv3.hgolden.socal.rr.com> References: <1315423991.7843.19.camel@www.hbg-srv3.hgolden.socal.rr.com> <1315691442.16652.8.camel@www.hbg-srv3.hgolden.socal.rr.com> <1315697719.16652.15.camel@www.hbg-srv3.hgolden.socal.rr.com> Message-ID: <4E6C5E6D.5000502@v.loewis.de> > Let me clarify: This will only be a problem when using a foreign > function interface to call a non-versioned module dynamically. As such, it won't be much of a problem for Python. In Python, we don't normally dlopen .so files, except when we know they are Python extension modules, in which case we also know that they won't be linker scripts - it just doesn't make sense to write a linker script for what should be a Python module, since you won't ever link against Python modules. 
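The Gentoo session above shows the exact failure Howard's GHC/Ruby/Node patches work around. A rough Python sketch of that fallback, layered over ctypes, might look like the following — the helper names and the GROUP-parsing regex are illustrative only, not any stdlib API:

```python
import re
from ctypes import CDLL

# Matches the first member of a GNU ld script's GROUP ( ... ) clause, e.g.
# "GROUP ( /lib/x86_64-linux-gnu/libpthread.so.0 /usr/lib/libpthread_nonshared.a )".
_GROUP_RE = re.compile(rb'GROUP\s*\(\s*(\S+)')

def real_library(script_bytes):
    """Return the first shared object named in an ld script, or None."""
    m = _GROUP_RE.search(script_bytes)
    return m.group(1).decode() if m else None

def load_library(name):
    """dlopen *name*; if it turns out to be a linker script, retry with the real .so."""
    try:
        return CDLL(name)
    except OSError as exc:
        if 'invalid ELF header' not in str(exc):
            raise
        # dlerror() output looks like "/usr/lib/libpthread.so: invalid ELF header"
        path = str(exc).split(':', 1)[0]
        with open(path, 'rb') as f:
            target = real_library(f.read())
        if target is None:
            raise
        return CDLL(target)
```

A production version would also need the "file too short" case the revised GHC regex handles.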
The only case where it might matter is ctypes, which is Python's "dynamic" FFI (as opposed to the C API, which is the "static" FFI). However, those libraries which are often wrapped with linker scripts don't typically get used in ctypes - e.g. libpthread won't be used in ctypes, but along with the thread module. The only common case where a library that is often a linker script gets also often used in ctypes (i.e. libc) is already special-cased - ctypes knows how to find the "real" C library. IOW, I would defer this until it becomes a real problem, at what point whoever has that problem ought to provide a patch. Regards, Martin From fuzzyman at voidspace.org.uk Sun Sep 11 20:49:06 2011 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Sun, 11 Sep 2011 19:49:06 +0100 Subject: [Python-Dev] Maintenance burden of str.swapcase In-Reply-To: <87pqjbhht9.fsf@uwakimon.sk.tsukuba.ac.jp> References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> <87ehztip2r.fsf@uwakimon.sk.tsukuba.ac.jp> <4E66763B.7080707@pearwood.info> <4E668029.6080106@v.loewis.de> <7EA8F302-1DC4-43D9-B124-09A51172E9BF@voidspace.org.uk> <4E66845F.3060708@v.loewis.de> <4271E434-0E35-4D7E-98D0-097402B8C3FD@gmail.com> <20110907030758.58caa4ed@pitrou.net> <8762l5hzdj.fsf@uwakimon.sk.tsukuba.ac.jp> <20110907144749.7c1a9d50@pitrou.net> <87zkiggt7b.fsf@uwakimon.sk.tsukuba.ac.jp> <4D7C92E4-1F39-4D39-9E99-8E323A6E1282@twistedmatrix.com> <87pqjbhht9.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4E6D02A2.7040907@voidspace.org.uk> On 08/09/2011 03:46, Stephen J. Turnbull wrote: > Glyph Lefkowitz writes: > > On Sep 7, 2011, at 10:26 AM, Stephen J. Turnbull wrote: > > > > > How about "title"? > > > > >>> 'content-length'.title() > > 'Content-Length' > > Does anyone *actually* use .title() for this? (And why not just use the correct casing in the string literal...) Michael > > You might say that the protocol "has" to be case-insensitive so > > this is a silly frill: > > Not me, sir. 
My whole point about the "bytes should be more like str" > controversy is the dual of that: you don't know what will be coming at > you, so the regularities and (normally allowable) fuzziness of text > processing are inadmissible. > > > there are definitely enough case-sensitive crappy bits of network > > middleware out there that this function is critically important for > > an HTTP server. > > "Critically important" is surely an overstatement. You could always > title-case the literal strings containing field names in the source. > > The problem with having lots of str-like features on bytes is that you > lose TOOWDTI, or worse, to many performance-happy coders, use of bytes > becomes TOOWDTI "because none of the characters[sic] I'm planning to > process myself are non-ASCII". This is the road to Babel; it's > workable for one-off scripts but it's asking for long-term trouble in > multi-module applications. The choice of decoding to str and > processing in that form should be made as attractive as possible. > > On the other hand, it is undeniably useful for protocol tokens to have > mnemonic representations even in binary protocols. Textual > manipulations on those tokens should be convenient. > > It seems to me that what might be an improvement over the current > situation (maybe for Py4k only, though) is for bytes and > (PEP-393-style) str to share representation, and have a "cast" method > which would convert from one to the other, validating that the range > contraints on the representation are satisfied. The problem I see is > that this either sanctions the practice of using latin-1 as "ASCII > plus anything", which is an unpleasant hack, or you'd need to check in > text methods that nothing is done with non-ASCII values other than > checks for set membership (including equality comparison, of course). 
> > OTOH, AFAICS, Antoine's claim that inserting a non-latin-1 character > in a str that happens to contain only ASCII values would convert the > representation to multioctets (true), and therefore this doesn't give > the desired efficiency properties, is beside the point. Just don't do > that! You *can't* do that in a bytes object, anyway; use of str in > this way is a "consenting adults" issue. You trade off the > convenience of the full suite of text tools vs. the possibility that > somebody might insert such a character -- but for the algorithms > they're going to be using, they shouldn't be doing that anyway. > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk > -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html From nadeem.vawda at gmail.com Sun Sep 11 23:30:44 2011 From: nadeem.vawda at gmail.com (Nadeem Vawda) Date: Sun, 11 Sep 2011 23:30:44 +0200 Subject: [Python-Dev] LZMA compression support in 3.3 In-Reply-To: References: <4E59041A.7040100@v.loewis.de> <4E5909FD.7060809@v.loewis.de> <20110827174057.6c4b619e@pitrou.net> <20110829083029.68faa57b@resist.wooz.org> Message-ID: I've posted an updated patch to the bug tracker, with a complete implementation of the lzma module, including 100% test coverage for the LZMAFile class (which is implemented entirely in Python). It doesn't include ReST documentation (yet), but the docstrings are quite detailed. Please take a look and let me know what you think. 
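As the lzma module eventually shipped in Python 3.3, its basic surface can be exercised like this — a minimal sketch using the one-shot helpers plus LZMAFile over an in-memory buffer:

```python
import io
import lzma

data = b'repetitive payload ' * 200

# One-shot helpers round-trip and actually shrink repetitive input.
blob = lzma.compress(data)
assert lzma.decompress(blob) == data
assert len(blob) < len(data)

# LZMAFile (the pure-Python class) wraps any file-like object.
buf = io.BytesIO()
with lzma.LZMAFile(buf, 'wb') as f:
    f.write(data)
with lzma.LZMAFile(io.BytesIO(buf.getvalue()), 'rb') as f:
    assert f.read() == data
```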
Cheers, Nadeem From glyph at twistedmatrix.com Mon Sep 12 02:22:15 2011 From: glyph at twistedmatrix.com (Glyph Lefkowitz) Date: Sun, 11 Sep 2011 17:22:15 -0700 Subject: [Python-Dev] Maintenance burden of str.swapcase In-Reply-To: <4E6D02A2.7040907@voidspace.org.uk> References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk> <87ehztip2r.fsf@uwakimon.sk.tsukuba.ac.jp> <4E66763B.7080707@pearwood.info> <4E668029.6080106@v.loewis.de> <7EA8F302-1DC4-43D9-B124-09A51172E9BF@voidspace.org.uk> <4E66845F.3060708@v.loewis.de> <4271E434-0E35-4D7E-98D0-097402B8C3FD@gmail.com> <20110907030758.58caa4ed@pitrou.net> <8762l5hzdj.fsf@uwakimon.sk.tsukuba.ac.jp> <20110907144749.7c1a9d50@pitrou.net> <87zkiggt7b.fsf@uwakimon.sk.tsukuba.ac.jp> <4D7C92E4-1F39-4D39-9E99-8E323A6E1282@twistedmatrix.com> <87pqjbhht9.fsf@uwakimon.sk.tsukuba.ac.jp> <4E6D02A2.7040907@voidspace.org.uk> Message-ID: <87FDD2A2-4D24-408C-AF0A-9A359D9B775E@twistedmatrix.com> On Sep 11, 2011, at 11:49 AM, Michael Foord wrote: > Does anyone *actually* use .title() for this? (And why not just use the correct casing in the string literal...) Yes. Twisted does, in various MIME-ish places (IMAP, SIP), although not in HTTP from what I can see. I imagine other similar software would as well. One issue is that you don't always have a string literal to work with. If you're proxying traffic, you start from a mis-cased header and you possibly need to correct it to a canonically-cased one. (On at least one occasion I've had to use such a proxy to make certain buggy client software work.) Of course you could have something like {b"CONNECTION-LOST": b"Connection-Lost", ...} somewhere at module scope, but that feels a bit sillier than just having a nice '.title()' method. -glyph -------------- next part -------------- An HTML attachment was scrubbed... 
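The trade-off being debated is easy to see in a few lines: bytes.title() (str behaves the same) uppercases the first letter of each letter run and lowercases the rest, which canonicalizes a mis-cased header as a proxy would, but mangles headers whose canonical form contains an acronym:

```python
# Canonicalizing mis-cased headers, as a proxy might:
assert b'CONNECTION-LOST'.title() == b'Connection-Lost'
assert b'content-length'.title() == b'Content-Length'

# But title() is purely mechanical, so acronym headers come out wrong:
assert b'www-authenticate'.title() == b'Www-Authenticate'  # wanted WWW-Authenticate
assert b'te'.title() == b'Te'                              # wanted TE
```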
URL: From fumanchu at aminus.org Mon Sep 12 17:59:58 2011 From: fumanchu at aminus.org (Robert Brewer) Date: Mon, 12 Sep 2011 08:59:58 -0700 Subject: [Python-Dev] Maintenance burden of str.swapcase In-Reply-To: <87FDD2A2-4D24-408C-AF0A-9A359D9B775E@twistedmatrix.com> References: <785989C3-66EA-4ED9-B6D2-E55FE2A30DE8@voidspace.org.uk><87ehztip2r.fsf@uwakimon.sk.tsukuba.ac.jp> <4E66763B.7080707@pearwood.info><4E668029.6080106@v.loewis.de><7EA8F302-1DC4-43D9-B124-09A51172E9BF@voidspace.org.uk><4E66845F.3060708@v.loewis.de><4271E434-0E35-4D7E-98D0-097402B8C3FD@gmail.com><20110907030758.58caa4ed@pitrou.net><8762l5hzdj.fsf@uwakimon.sk.tsukuba.ac.jp><20110907144749.7c1a9d50@pitrou.net><87zkiggt7b.fsf@uwakimon.sk.tsukuba.ac.jp><4D7C92E4-1F39-4D39-9E99-8E323A6E1282@twistedmatrix.com><87pqjbhht9.fsf@uwakimon.sk.tsukuba.ac.jp><4E6D02A2.7040907@voidspace.org.uk> <87FDD2A2-4D24-408C-AF0A-9A359D9B775E@twistedmatrix.com> Message-ID: Glyph Lefkowitz wrote: > On Sep 11, 2011, at 11:49 AM, Michael Foord wrote: > Does anyone *actually* use .title() for this? > > Yes. Twisted does, in various MIME-ish places (IMAP, SIP), > although not in HTTP from what I can see. I imagine other > similar software would as well. Not to mention it doesn't work for WWW-Authenticate or TE, to give just a couple of examples. Robert Brewer fumanchu at aminus.org From merwok at netwok.org Tue Sep 13 17:57:31 2011 From: merwok at netwok.org (=?UTF-8?B?w4lyaWMgQXJhdWpv?=) Date: Tue, 13 Sep 2011 17:57:31 +0200 Subject: [Python-Dev] Packaging in Python 2 anyone ? In-Reply-To: References: <4E4D5992.7070603@netwok.org> Message-ID: <4E6F7D6B.9040709@netwok.org> Hi, Here's a status update on distutils2. Vinay did the bulk of the work in his initial commit; we just had to re-add some mistakenly deleted helpers in d2.tests and d2.tests.support, change sysconfig imports and remove duplicate files (sysconfig.*). A contributor did a huge commit to restore 2.4 compatibility.
I pulled it, because it was a useful contribution, and am now in the middle of cleaning it: some conversions were not idiomatic or even buggy, just like when we converted from 2.x to 3.x. Alexis and I have been working in parallel, with some unfortunate duplication. We've resolved to use the tracker or email to coordinate. When I am finished cleaning up the 2.4 compat changes, I'll backport all outstanding changesets that were done in packaging, and then I'll try to fix the few (on linux3^Wlinux) test failures. When the d2 codebase matches packaging's again, it will be easy to keep both codebases in sync. I will edit the wiki page about contributing to state that I will accept patches made against d2 instead of packaging, to lower the contribution bar. It would be very useful to have buildbots. A question: What about distutils2 for Python 3.x?
I think we could keep the stdlib codebase compatible with 3.1 and use a semi-automated process to extract cpython3.3/Lib/packaging to distutils2-py3/distutils2 and rename imports. (IIRC PyPI will require us to play games to have both 2.x and 3.x versions of distutils2.) Another question: What about the docs? Can we just point people to docs.python.org and tell them to mentally replace packaging with distutils2? If that is judged unacceptable, then I'll synchronize the docs in the d2 repo, but that's hours I won't spend on bugs or features. Cheers From fuzzyman at voidspace.org.uk Tue Sep 13 18:34:39 2011 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Tue, 13 Sep 2011 17:34:39 +0100 Subject: [Python-Dev] Packaging in Python 2 anyone ? In-Reply-To: <4E6F7D6B.9040709@netwok.org> References: <4E4D5992.7070603@netwok.org> <4E6F7D6B.9040709@netwok.org> Message-ID: <4E6F861F.5020904@voidspace.org.uk> On 13/09/2011 16:57, Éric Araujo wrote: > [snip...] > A question: What about distutils2 for Python 3.x?
Regards, David Moss -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Wed Sep 14 01:18:12 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 14 Sep 2011 09:18:12 +1000 Subject: [Python-Dev] PyPI trove classifiers for alternate language implementations In-Reply-To: References: Message-ID: On Wed, Sep 14, 2011 at 9:09 AM, DrKJam wrote: > Would it be possible to have trove classifiers added to PyPI specifically > for PyPy and possibly also Jython and IronPython? Possibly, but the place to ask would be catalog-sig at python.org Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From ausfdes at gmail.com Wed Sep 14 13:13:15 2011 From: ausfdes at gmail.com (Austin Fernandes) Date: Wed, 14 Sep 2011 16:43:15 +0530 Subject: [Python-Dev] Windows 8 support Message-ID: Hi, Which versions of python will be compatible with windows8. I am using currently 2.7.2 version. Thanks, Austin -------------- next part -------------- An HTML attachment was scrubbed... URL: From nyamatongwe at gmail.com Wed Sep 14 13:38:36 2011 From: nyamatongwe at gmail.com (Neil Hodgson) Date: Wed, 14 Sep 2011 21:38:36 +1000 Subject: [Python-Dev] Windows 8 support In-Reply-To: References: Message-ID: Austin Fernandes: > Which versions of python will be compatible with windows8. I am using > currently 2.7.2 version. Current releases of both Python 2.7 and Python 3.2 appear to run fine on the Windows 8 Developer Preview. You should download and install the preview to ensure that your own code is compatible. Neil From jdhardy at gmail.com Wed Sep 14 18:23:27 2011 From: jdhardy at gmail.com (Jeff Hardy) Date: Wed, 14 Sep 2011 09:23:27 -0700 Subject: [Python-Dev] Windows 8 support In-Reply-To: References: Message-ID: On Wed, Sep 14, 2011 at 4:38 AM, Neil Hodgson wrote: > Austin Fernandes: > >> Which versions of python will be compatible with windows8. I am using >> currently 2.7.2 version. > > ? 
Current releases of both Python 2.7 and Python 3.2 appear to run > fine on the Windows 8 Developer Preview. You should download and > install the preview to ensure that your own code is compatible. Another question is whether Python can take advantage of WinRT (the new UI framework). It should be possible, as the new APIs were designed to be used? from dynamic languages, but I haven't decided if I'm crazy enough to try it. - Jeff From martin at v.loewis.de Wed Sep 14 22:41:49 2011 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Wed, 14 Sep 2011 22:41:49 +0200 Subject: [Python-Dev] Windows 8 support In-Reply-To: References: Message-ID: <4E71118D.9090305@v.loewis.de> > Another question is whether Python can take advantage of WinRT (the > new UI framework). It should be possible, as the new APIs were > designed to be used from dynamic languages, but I haven't decided if > I'm crazy enough to try it. Python doesn't do GUI on its own, so the direct answer to this question is "no, it can't take advantage of WinRT". Of course, people might start writing Python wrappers for WinRT, possibly leading to a PyRT package. Alternatively, wxWindows might start using WinRT, which would automatically expose it to wxPython applications. Likewise, Tk might integrate support for WinRT, in which case IDLE might make use of it out of the box. Regards, Martin From martin at v.loewis.de Wed Sep 14 22:53:19 2011 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 14 Sep 2011 22:53:19 +0200 Subject: [Python-Dev] Windows 8 support In-Reply-To: References: Message-ID: <4E71143F.4000107@v.loewis.de> > Which versions of python will be compatible with windows8. I am using > currently 2.7.2 version. Most likely, all versions back to Python 1.1 or so will be compatible with Windows 8 (when 32-bit Windows support was first added to Python). Python uses very little of the Windows API (compared to, say, a game). 
Microsoft isn't going to break any of this for the next decade. Support for 16-bit applications is being dropped, but Python didn't really support 16-bit Windows all that well (although there was a DOS port). Regards, Martin From greg.ewing at canterbury.ac.nz Thu Sep 15 00:52:51 2011 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 15 Sep 2011 10:52:51 +1200 Subject: [Python-Dev] Windows 8 support In-Reply-To: References: Message-ID: <4E713043.100@canterbury.ac.nz> Jeff Hardy wrote: > Another question is whether Python can take advantage of WinRT (the > new UI framework). It should be possible, as the new APIs were > designed to be used? from dynamic languages, but I haven't decided if > I'm crazy enough to try it. WinRT certainly sounds like the way to go in the future. I'm glad to hear that .NET isn't going to take over the world after all! -- Greg From eliben at gmail.com Thu Sep 15 08:53:19 2011 From: eliben at gmail.com (Eli Bendersky) Date: Thu, 15 Sep 2011 09:53:19 +0300 Subject: [Python-Dev] Windows 8 support In-Reply-To: <4E713043.100@canterbury.ac.nz> References: <4E713043.100@canterbury.ac.nz> Message-ID: > Another question is whether Python can take advantage of WinRT (the >> new UI framework). It should be possible, as the new APIs were >> designed to be used? from dynamic languages, but I haven't decided if >> I'm crazy enough to try it. >> > > WinRT certainly sounds like the way to go in the future. > I'm glad to hear that .NET isn't going to take over the > world after all! > I'm not sure whether I prefer Javascript doing that, though :) Eli -------------- next part -------------- An HTML attachment was scrubbed... URL: From jai.unix at gmail.com Thu Sep 15 08:56:54 2011 From: jai.unix at gmail.com (Jai Sharma) Date: Thu, 15 Sep 2011 12:26:54 +0530 Subject: [Python-Dev] Not able to do unregister a code Message-ID: Hi, I am facing a memory leaking issue with codecs. I make my own ABC class and register it with codes. 
import codecs codecs.register(ABC) but I am not able to remove ABC from memory. Is there any alternative to do that. Thanks -------------- next part -------------- An HTML attachment was scrubbed... URL: From jai.unix at gmail.com Thu Sep 15 09:03:47 2011 From: jai.unix at gmail.com (Jai Sharma) Date: Thu, 15 Sep 2011 12:33:47 +0530 Subject: [Python-Dev] Not able to do unregister a code In-Reply-To: References: Message-ID: Below is reference pattern: 0: _ --- [-] 4 : 0xa70ca44, 0xa70e79c, 0xe5c602c, 0xe6219bc 1: a [-] 4 tuple: 0xab11c5c*3, 0xe72a43c*3, 0xe73c16c*3, 0xe73c1bc*3 2: aa ---- [-] 4 function: ABC.l_codecs.decode... 3: a3 [S] 2 dict of class: ..Codec, ..Codec 4: aab ---- [-] 4 types.MethodType: wrote: > Hi, > > I am facing a memory leaking issue with codecs. I make my own ABC class and > register it with codes. > > import codecs > codecs.register(ABC) > > but I am not able to remove ABC from memory. Is there any alternative to do > that. > > Thanks > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mal at egenix.com Thu Sep 15 09:34:20 2011 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 15 Sep 2011 09:34:20 +0200 Subject: [Python-Dev] Not able to do unregister a code In-Reply-To: References: Message-ID: <4E71AA7C.9090403@egenix.com> Jai Sharma wrote: > Hi, > > I am facing a memory leaking issue with codecs. I make my own ABC class and > register it with codes. > > import codecs > codecs.register(ABC) > > but I am not able to remove ABC from memory. Is there any alternative to do > that. The ABC codec search function gets added to the codec registry search path list which currently cannot be accessed directly. There is no API to unregister a codec search function, since deregistration would break the codec cache used by the registry to speedup codec lookup. Why would you want to unregister a codec search function ? 
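Pending a real unregister API (one eventually landed as codecs.unregister() in Python 3.10), a common pattern is to register a single dispatcher for the life of the process and mutate a dict behind it — bearing in mind Marc-Andre's caveat that successful lookups are cached, so a name that has already been used keeps resolving. The helper names below are made up for illustration:

```python
import codecs

_registry = {}  # name -> CodecInfo; the only thing we ever mutate

def _search(name):
    # Registered exactly once, for the life of the process.
    return _registry.get(name)

codecs.register(_search)

def add_codec(name, encode, decode):
    _registry[name] = codecs.CodecInfo(encode, decode, name=name)

def drop_codec(name):
    # "Unregister": future lookups of *unused* names now fail,
    # but names already looked up stay in the interpreter's cache.
    _registry.pop(name, None)

# A toy codec: upper-cases on encode, lower-cases on decode.
def _enc(s, errors='strict'):
    return s.upper().encode('ascii'), len(s)

def _dec(b, errors='strict'):
    return bytes(b).decode('ascii').lower(), len(b)

add_codec('shouty', _enc, _dec)
assert 'hello'.encode('shouty') == b'HELLO'
assert b'HELLO'.decode('shouty') == 'hello'
```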
-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Sep 15 2011) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2011-10-04: PyCon DE 2011, Leipzig, Germany 19 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From nadeem.vawda at gmail.com Thu Sep 15 11:37:15 2011 From: nadeem.vawda at gmail.com (Nadeem Vawda) Date: Thu, 15 Sep 2011 11:37:15 +0200 Subject: [Python-Dev] LZMA compression support in 3.3 In-Reply-To: References: <4E59041A.7040100@v.loewis.de> <4E5909FD.7060809@v.loewis.de> <20110827174057.6c4b619e@pitrou.net> <20110829083029.68faa57b@resist.wooz.org> Message-ID: Another update - I've added proper documentation. Now the code should be pretty much complete - all that's missing is the necessary bits and pieces to build it on Windows. Cheers, Nadeem From jcea at jcea.es Thu Sep 15 13:29:27 2011 From: jcea at jcea.es (Jesus Cea) Date: Thu, 15 Sep 2011 13:29:27 +0200 Subject: [Python-Dev] Do we have interest in a clang buildbot? Message-ID: <4E71E197.7000006@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi, pals. I am seeing a few commits related to clang (a C compiler, alternative to GCC), but we "only" have a buildbot using clang as the compiler. If there is interest, I would deploy 32 and 64 bits buildbots under my current OpenIndiana buildbot. What do you think? - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ .
_/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQCVAwUBTnHhl5lgi5GaxT1NAQKlJgQApAwlZODoeG3G+HODkoSh6G5myqEXkS/0 YZM6wo+/uWb6ul50Kb9mWhucGhY1tc8wAxCDNsRcm8Vv/6sDLZOV0G++DIK0JXIw BA8TyF/5CI8c5K3wnrVkazTo/Io1kVYMGc1FekIoQFI3oRKdXs/A6h63XWwxDMNu PsGwVD4bizs= =lJ/r -----END PGP SIGNATURE----- From martin at v.loewis.de Thu Sep 15 17:31:34 2011 From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 15 Sep 2011 17:31:34 +0200 Subject: [Python-Dev] PEP 393: Porting Guidelines Message-ID: <4E721A56.1000900@v.loewis.de> I added a section on porting guidelines to the PEP, resulting from my own porting experience. Please review. http://www.python.org/dev/peps/pep-0393/#porting-guidelines Regards, Martin From martin at v.loewis.de Thu Sep 15 17:50:41 2011 From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 15 Sep 2011 17:50:41 +0200 Subject: [Python-Dev] PEP 393: Special-casing ASCII-only strings Message-ID: <4E721ED1.1000001@v.loewis.de> In reviewing memory usage, I found potential for saving more memory for ASCII-only strings. Both Victor and Guido commented that something like this be done; Antoine had asked whether there was anything that could be done. Here is the idea: In an ASCII-only string, the UTF-8 representation is shared with the canonical one-byte representation. This would allow to drop the UTF-8 pointer and the UTF-8 length field; instead, a flag in the state would indicate that these fields are not there. Likewise, the wchar_t/Py_UNICODE length can be shared (even though the data cannot), since the ASCII-only string won't contain any surrogate pairs. 
To comply with the C aliasing rules, the structures would look like this: typedef struct { PyObject_HEAD Py_ssize_t length; union { void *any; Py_UCS1 *latin1; Py_UCS2 *ucs2; Py_UCS4 *ucs4; } data; Py_hash_t hash; int state; /* may include SSTATE_SHORT_ASCII flag */ wchar_t *wstr; } PyASCIIObject; typedef struct { PyASCIIObject _base; Py_ssize_t utf8_length; char *utf8; Py_ssize_t wstr_length; } PyUnicodeObject; Code that directly accesses the structures would become more complex; code that use the accessor macros wouldn't notice. As a result, ASCII-only strings would lose three pointers, and shrink to their 3.2 structure size. Since they also save in the individual characters, strings with more than 3 characters (16-bit Py_UNICODE) or more than one character (32-bit Py_UNICODE) would see a total size reduction compared to 3.2. Objects created throught the legacy API (PyUnicode_FromUnicode) that are only later found to be ASCII-only (in PyUnicode_Ready) would still have the UTF-8 pointer shared with the data pointer, but keep including separate fields for pointer & size. What do you think? Regards, Martin P.S. There are similar reductions that could be applied to the wstr_length in general: on 32-bit wchar_t systems, it could be always dropped, on a 16-bit wchar_t system, it could be dropped for UCS-2 strings. However, I'm not proposing these, as I think the increase in complexity is not worth the savings. From merwok at netwok.org Thu Sep 15 18:23:11 2011 From: merwok at netwok.org (=?UTF-8?B?w4lyaWMgQXJhdWpv?=) Date: Thu, 15 Sep 2011 18:23:11 +0200 Subject: [Python-Dev] Packaging in Python 2 anyone ? 
In-Reply-To: <4E6F861F.5020904@voidspace.org.uk> References: <4E4D5992.7070603@netwok.org> <4E6F7D6B.9040709@netwok.org> <4E6F861F.5020904@voidspace.org.uk> Message-ID: <4E72266F.106@netwok.org> Le 13/09/2011 18:34, Michael Foord a écrit : > On 13/09/2011 16:57, Éric Araujo wrote: >> (IIRC PyPI will require us to play games to have both >> 2.x and 3.x versions of distutils2.) > > What I'm doing for unittest2. > [...] > 2) I have a pypi project called unittestpy3k that holds the Python 3 > version of unittest2 > > Projects using unittest2 for Python 3 then have a dependency on > unittest2py3k - but the actual Python package name is unittest2. That's what I call playing games. I think it would make more sense to push 2.x-compatible and 3.x-compatible sdists to PyPI (with an appropriate 'Programming Language :: Python :: 2' or '3' classifier) and have the download tools be smart. Regards From fdrake at acm.org Thu Sep 15 19:08:34 2011 From: fdrake at acm.org (Fred Drake) Date: Thu, 15 Sep 2011 13:08:34 -0400 Subject: [Python-Dev] Packaging in Python 2 anyone ? In-Reply-To: <4E72266F.106@netwok.org> References: <4E4D5992.7070603@netwok.org> <4E6F7D6B.9040709@netwok.org> <4E6F861F.5020904@voidspace.org.uk> <4E72266F.106@netwok.org> Message-ID: On Thu, Sep 15, 2011 at 12:23 PM, Éric Araujo wrote: > I think it would make more sense to > push 2.x-compatible and 3.x-compatible sdists to PyPI (with an > appropriate 'Programming Language :: Python :: 2' or '3' classifier) and > have the download tools be smart. FWIW, I prefer this as well. I'd certainly appreciate the option to do it this way. -Fred -- Fred L. Drake, Jr. "A person who won't read has no advantage over one who can't read." --Samuel Langhorne Clemens From ubershmekel at gmail.com Thu Sep 15 19:50:06 2011 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Thu, 15 Sep 2011 20:50:06 +0300 Subject: [Python-Dev] Packaging in Python 2 anyone ?
In-Reply-To: <4E72266F.106@netwok.org> References: <4E4D5992.7070603@netwok.org> <4E6F7D6B.9040709@netwok.org> <4E6F861F.5020904@voidspace.org.uk> <4E72266F.106@netwok.org> Message-ID: +2 for promoting naming consistency and putting metadata where it's supposed to be. --Yuval On Sep 15, 2011 9:23 AM, "?ric Araujo" wrote: > Le 13/09/2011 18:34, Michael Foord a ?crit : >> On 13/09/2011 16:57, ?ric Araujo wrote: >>> (IIRC PyPI will require us to play games to have both >>> 2.x and 3.x versions of distutils2.) >> >> What I'm doing for unittest2. >> [...] >> 2) I have a pypi project called unittestpy3k that holds the Python 3 >> version of unittest2 >> >> Projects using unittest2 for Python 3 then have a dependency on >> unittest2py3k - but the actual Python package name is unittest2. > > That?s what I call playing games. I think it would make more sense to > push 2.x-compatible and 3.x-compatible sdists to PyPI (with an > appropriate 'Programming Language :: Python :: 2' or '3' classifier) and > have the download tools be smart. > > Regards > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/ubershmekel%40gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at bytereef.org Thu Sep 15 20:27:22 2011 From: stefan at bytereef.org (Stefan Krah) Date: Thu, 15 Sep 2011 20:27:22 +0200 Subject: [Python-Dev] Do we have interest in a clang buildbot? In-Reply-To: <4E71E197.7000006@jcea.es> References: <4E71E197.7000006@jcea.es> Message-ID: <20110915182722.GA12130@sleipnir.bytereef.org> Jesus Cea wrote: > I am seeing a few commits related to clang (a C compiler, alternative > to GCC), but we ?only? have a buildbot using clang as the compiler. > > If there is interest, I would deploy 32 and 64 bits buildbots under my > current OpenIndiana buildbot. 
I think it makes sense. clang has different warnings and the versions >= 2.9 apparently optimize extremely aggressively. Probably it would be most useful to run these bots with -O2 (and not --with-pydebug). Stefan Krah From fuzzyman at voidspace.org.uk Thu Sep 15 20:31:52 2011 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Thu, 15 Sep 2011 19:31:52 +0100 Subject: [Python-Dev] Packaging in Python 2 anyone ? In-Reply-To: <4E72266F.106@netwok.org> References: <4E4D5992.7070603@netwok.org> <4E6F7D6B.9040709@netwok.org> <4E6F861F.5020904@voidspace.org.uk> <4E72266F.106@netwok.org> Message-ID: <4E724498.2040603@voidspace.org.uk> On 15/09/2011 17:23, Éric Araujo wrote: > Le 13/09/2011 18:34, Michael Foord a écrit : >> On 13/09/2011 16:57, Éric Araujo wrote: >>> (IIRC PyPI will require us to play games to have both >>> 2.x and 3.x versions of distutils2.) >> What I'm doing for unittest2. >> [...] >> 2) I have a pypi project called unittestpy3k that holds the Python 3 >> version of unittest2 >> >> Projects using unittest2 for Python 3 then have a dependency on >> unittest2py3k - but the actual Python package name is unittest2. > That's what I call playing games. I think it would make more sense to > push 2.x-compatible and 3.x-compatible sdists to PyPI (with an > appropriate 'Programming Language :: Python :: 2' or '3' classifier) and > have the download tools be smart. Hah, sure. In the meantime my way works *now* and with the existing tools. :-) (But only actually true for the way I make it available from pypi - the rest of the technique is not "playing games", right?) Yes, I would prefer to have a single project name with different distributions for Python 2 and 3 (and I looked into it) - but with the current tools the only way to achieve that is to put both versions into a single distribution. This prevents you from versioning them separately and is a pain to do anyway if the different versions are in different repos. 
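Éric's classifier-based scheme is easy to picture: each sdist advertises which major Python version it supports, and an installer filters on that. A rough sketch — the file names and the selection function are hypothetical, not anything today's pip or easy_install actually does:

```python
# Hypothetical sketch of classifier-driven sdist selection: PyPI metadata
# already carries 'Programming Language :: Python :: X' classifiers, so a
# "smart" download tool could filter releases on them.

def pick_sdist(sdists, major):
    """Return the first sdist whose classifiers advertise the given
    major Python version ('2' or '3'), or None if there is none."""
    wanted = "Programming Language :: Python :: %s" % major
    for filename, classifiers in sdists:
        if wanted in classifiers:
            return filename
    return None

# Two sdists under one project name, distinguished only by classifiers
# (file names are made up for illustration).
releases = [
    ("unittest2-0.5.1.tar.gz", ["Programming Language :: Python :: 2"]),
    ("unittest2-0.5.1-py3.tar.gz", ["Programming Language :: Python :: 3"]),
]

print(pick_sdist(releases, "3"))  # -> unittest2-0.5.1-py3.tar.gz
```

This is only the download half of the idea, of course; the upload tools and the index would also have to honour the classifier.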
The current tools are a real pain for versioning anyway. If your pypi page even *links* to a page that offers an alpha or beta (in development version) for download then both pip and easy_install will fetch that, in preference to the most recent version on pypi. So yes, I agree there is room for improvement in the current tools. Hopefully distutils2 will fix that. ;-) All the best, Michael Foord > > Regards > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk > -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html From tjreedy at udel.edu Thu Sep 15 20:46:11 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 15 Sep 2011 14:46:11 -0400 Subject: [Python-Dev] PEP 393: Special-casing ASCII-only strings In-Reply-To: <4E721ED1.1000001@v.loewis.de> References: <4E721ED1.1000001@v.loewis.de> Message-ID: On 9/15/2011 11:50 AM, "Martin v. Löwis" wrote: > To comply with the C aliasing rules, the structures would look like this: > > typedef struct { > PyObject_HEAD > Py_ssize_t length; > union { > void *any; > Py_UCS1 *latin1; > Py_UCS2 *ucs2; > Py_UCS4 *ucs4; > } data; > Py_hash_t hash; > int state; /* may include SSTATE_SHORT_ASCII flag */ > wchar_t *wstr; > } PyASCIIObject; > > > typedef struct { > PyASCIIObject _base; > Py_ssize_t utf8_length; > char *utf8; > Py_ssize_t wstr_length; > } PyUnicodeObject; > > Code that directly accesses the structures would become more > complex; code that uses the accessor macros wouldn't notice. ... > What do you think? 
That nearly all code outside CPython itself should treat the unicode types, especially, as opaque types and only access instances through functions and macros -- the 'public' interfaces. We need to be free to fiddle with internal implementation details as experience suggests changes. > P.S. There are similar reductions that could be applied > to the wstr_length in general: on 32-bit wchar_t systems, > it could be always dropped, on a 16-bit wchar_t system, > it could be dropped for UCS-2 strings. However, I'm not > proposing these, as I think the increase in complexity > is not worth the savings. I would certainly do just the one change now and see how it goes. I think you should be free to do more like the above if you change your mind with experience. -- Terry Jan Reedy From guido at python.org Thu Sep 15 21:48:01 2011 From: guido at python.org (Guido van Rossum) Date: Thu, 15 Sep 2011 12:48:01 -0700 Subject: [Python-Dev] PEP 393: Special-casing ASCII-only strings In-Reply-To: <4E721ED1.1000001@v.loewis.de> References: <4E721ED1.1000001@v.loewis.de> Message-ID: On Thu, Sep 15, 2011 at 8:50 AM, "Martin v. Löwis" wrote: > In reviewing memory usage, I found potential for saving more memory for > ASCII-only strings. Both Victor and Guido commented that something like > this be done; Antoine had asked whether there was anything that could > be done. Here is the idea: > > In an ASCII-only string, the UTF-8 representation is shared with the > canonical one-byte representation. This would allow to drop the > UTF-8 pointer and the UTF-8 length field; instead, a flag in the state > would indicate that these fields are not there. > > Likewise, the wchar_t/Py_UNICODE length can be shared (even though the > data cannot), since the ASCII-only string won't contain any surrogate > pairs. > > To comply with the C aliasing rules, the structures would look like this: > > typedef struct { > PyObject_HEAD > Py_ssize_t length; > union { > void *any; > 
Py_UCS1 *latin1; > Py_UCS2 *ucs2; > Py_UCS4 *ucs4; > } data; > Py_hash_t hash; > int state; /* may include SSTATE_SHORT_ASCII flag */ > wchar_t *wstr; > } PyASCIIObject; > > > typedef struct { > PyASCIIObject _base; > Py_ssize_t utf8_length; > char *utf8; > Py_ssize_t wstr_length; > } PyUnicodeObject; > > Code that directly accesses the structures would become more > complex; code that uses the accessor macros wouldn't notice. > > As a result, ASCII-only strings would lose three pointers, > and shrink to their 3.2 structure size. Since they also save > in the individual characters, strings with more than > 3 characters (16-bit Py_UNICODE) or more than one character > (32-bit Py_UNICODE) would see a total size reduction compared > to 3.2. > > Objects created through the legacy API (PyUnicode_FromUnicode) > that are only later found to be ASCII-only (in PyUnicode_Ready) > would still have the UTF-8 pointer shared with the data pointer, > but keep including separate fields for pointer & size. > > What do you think? > > Regards, > Martin > > P.S. There are similar reductions that could be applied > to the wstr_length in general: on 32-bit wchar_t systems, > it could be always dropped, on a 16-bit wchar_t system, > it could be dropped for UCS-2 strings. However, I'm not > proposing these, as I think the increase in complexity > is not worth the savings. This sounds like a good plan. -- --Guido van Rossum (python.org/~guido) From victor.stinner at haypocalc.com Thu Sep 15 23:04:16 2011 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Thu, 15 Sep 2011 23:04:16 +0200 Subject: [Python-Dev] PEP 393: Special-casing ASCII-only strings In-Reply-To: <4E721ED1.1000001@v.loewis.de> References: <4E721ED1.1000001@v.loewis.de> Message-ID: <201109152304.16957.victor.stinner@haypocalc.com> Le jeudi 15 septembre 2011 17:50:41, Martin v. 
Löwis a écrit : > In reviewing memory usage, I found potential for saving more memory for > ASCII-only strings. (...) > > typedef struct { > PyObject_HEAD > Py_ssize_t length; > union { > void *any; > Py_UCS1 *latin1; > Py_UCS2 *ucs2; > Py_UCS4 *ucs4; > } data; > Py_hash_t hash; > int state; /* may include SSTATE_SHORT_ASCII flag */ > wchar_t *wstr; > } PyASCIIObject; I like it. If we start with such optimization, we can also remove data from strings allocated by the new API (it can be computed: object pointer + size of the structure). See my email for my proposition of structures: Re: [Python-Dev] PEP 393 review Thu Aug 25 00:29:19 2011 You may reorganize fields to be able to cast PyUnicodeObject to PyASCIIObject. Victor From martin at v.loewis.de Thu Sep 15 23:39:13 2011 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 15 Sep 2011 23:39:13 +0200 Subject: [Python-Dev] PEP 393: Special-casing ASCII-only strings In-Reply-To: <201109152304.16957.victor.stinner@haypocalc.com> References: <4E721ED1.1000001@v.loewis.de> <201109152304.16957.victor.stinner@haypocalc.com> Message-ID: <4E727081.9010307@v.loewis.de> > I like it. If we start with such optimization, we can also remove data > from strings allocated by the new API (it can be computed: object pointer + > size of the structure). See my email for my proposition of structures: > Re: [Python-Dev] PEP 393 review > Thu Aug 25 00:29:19 2011 I agree it is tempting to drop the data pointer. However, I'm not sure how many different structures we would end up with, and how the aliasing rules would defeat this (you cannot interpret a struct X* as a struct Y*, unless either X is the first field of Y or vice versa). 
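The first-field rule Martin mentions is what makes the nested-struct trick work: if the smaller struct is the first member of the bigger one, a pointer to the bigger struct can legally be read through the smaller type. A toy model of this in ctypes — the field sets here are invented for illustration, not the real CPython layout:

```python
import ctypes

class ASCIIObject(ctypes.Structure):
    # invented subset of fields, just to model the shared prefix
    _fields_ = [("length", ctypes.c_ssize_t),
                ("hash", ctypes.c_ssize_t),
                ("state", ctypes.c_int)]

class UnicodeObject(ctypes.Structure):
    # the smaller struct is the *first* field, so both layouts share
    # a common prefix and the cast below is well-defined
    _fields_ = [("base", ASCIIObject),
                ("utf8_length", ctypes.c_ssize_t)]

u = UnicodeObject()
u.base.length = 42

# Reinterpret a UnicodeObject* as an ASCIIObject*: the leading fields
# line up, which is exactly what the aliasing rules permit here.
p = ctypes.cast(ctypes.pointer(u), ctypes.POINTER(ASCIIObject))
print(p.contents.length)  # -> 42
```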
Thinking about this, the following may work: - ASCIIObject: state, length, hash, wstr*, data follow - SingleBlockUnicode: ASCIIObject, wstr_len, utf8*, utf8_len, data follow - UnicodeObject: SingleBlockUnicode, data pointer, no data follow This is essentially your proposal, except that the wstr_len is dropped for ASCII strings, and that it uses nested structs. The single-block variants would always be "ready", the full unicode object is ready only if the data pointer is set. I'll try it out, unless somebody can punch a hole into this proposal :-) Regards, Martin From ncoghlan at gmail.com Fri Sep 16 00:42:25 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 16 Sep 2011 08:42:25 +1000 Subject: [Python-Dev] PEP 393: Special-casing ASCII-only strings In-Reply-To: <4E727081.9010307@v.loewis.de> References: <4E721ED1.1000001@v.loewis.de> <201109152304.16957.victor.stinner@haypocalc.com> <4E727081.9010307@v.loewis.de> Message-ID: On Fri, Sep 16, 2011 at 7:39 AM, "Martin v. Löwis" wrote: > Thinking about this, the following may work: > - ASCIIObject: state, length, hash, wstr*, data follow > - SingleBlockUnicode: ASCIIObject, wstr_len, > utf8*, utf8_len, data follow > - UnicodeObject: SingleBlockUnicode, data pointer, no data follow > > This is essentially your proposal, except that the wstr_len is dropped for > ASCII strings, and that it uses nested structs. > > The single-block variants would always be "ready", the full unicode object > is ready only if the data pointer is set. In your "UnicodeObject" here, is the 'data pointer' the any/latin1/ucs2/ucs4 union from the original structure definition? Also, what are the constraints on the "SingleBlockUnicode"? Does it only hold strings that can be represented in latin1? Or can the size of the individual elements be more than 1 byte? Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | 
Brisbane, Australia From albzey at googlemail.com Fri Sep 16 00:44:35 2011 From: albzey at googlemail.com (Albert Zeyer) Date: Fri, 16 Sep 2011 00:44:35 +0200 Subject: [Python-Dev] Meta coding in Python Message-ID: Hi list, I thought it would be nice in Python to allow some sort of meta coding (which goes far ahead of simple function descriptors). The most straightforward way would be to allow operations on the AST. I wrote a small patch for CPython 2.7.1 which, for each code object, adds the related AST of the statement to a new attribute `co_ast`. https://github.com/albertz/CPython/commit/2670e621458fd80311fc02897b698ea2a36d494b Some simple demonstration of what you can do with this: https://github.com/albertz/CPython/blob/astcompile_patch/test_co_ast.py I'm not sure whether the Python AST in this form is optimal for doing such things, though. Maybe another representation would be more efficient and result in simpler code for doing transformations. Discussion about this is very welcome. Regards, Albert From benjamin at python.org Fri Sep 16 00:57:12 2011 From: benjamin at python.org (Benjamin Peterson) Date: Thu, 15 Sep 2011 18:57:12 -0400 Subject: [Python-Dev] Meta coding in Python In-Reply-To: References: Message-ID: 2011/9/15 Albert Zeyer : > Hi list, > > I thought it would be nice in Python to allow some sort of meta coding > (which goes far ahead of simple function descriptors). > > The most straightforward way would be to allow operations on the AST. > > I wrote a small patch for CPython 2.7.1 which, for each code object, > adds the related AST of the statement to a new attribute `co_ast`. > > https://github.com/albertz/CPython/commit/2670e621458fd80311fc02897b698ea2a36d494b > > Some simple demonstration of what you can do with this: > > https://github.com/albertz/CPython/blob/astcompile_patch/test_co_ast.py > > I'm not sure whether the Python AST in this form is optimal for doing > such things, though. 
Maybe another representation would be more > efficient and result in simpler code for doing transformations. It would be useful, but is a waste of memory in 99.99% of programs. -- Regards, Benjamin From ncoghlan at gmail.com Fri Sep 16 01:12:20 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 16 Sep 2011 09:12:20 +1000 Subject: [Python-Dev] Meta coding in Python In-Reply-To: References: Message-ID: On Fri, Sep 16, 2011 at 8:44 AM, Albert Zeyer wrote: > Hi list, > > I thought it would be nice in Python to allow some sort of meta coding > (which goes far ahead of simple function descriptors). > > The most straightforward way would be to allow operations on the AST. 1. This kind of suggestion is more appropriately directed to python-ideas 2. We already support this, look at the ast module and in particular the ast.PyCF_ONLY_AST flag to the compile() builtin function. For an example of advanced usage, look at the py.test module and its meta-importer that rewrites assert statements Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From martin at v.loewis.de Fri Sep 16 07:41:21 2011 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 16 Sep 2011 07:41:21 +0200 Subject: [Python-Dev] PEP 393: Special-casing ASCII-only strings In-Reply-To: References: <4E721ED1.1000001@v.loewis.de> <201109152304.16957.victor.stinner@haypocalc.com> <4E727081.9010307@v.loewis.de> Message-ID: <4E72E181.7060805@v.loewis.de> Am 16.09.11 00:42, schrieb Nick Coghlan: > On Fri, Sep 16, 2011 at 7:39 AM, "Martin v. Löwis > wrote: >> Thinking about this, the following may work: >> >> - ASCIIObject: state, length, hash, wstr*, data follow >> >> - SingleBlockUnicode: ASCIIObject, wstr_len, utf8*, utf8_len, data >> follow >> >> - UnicodeObject: SingleBlockUnicode, data pointer, no data follow >> >> This is essentially your proposal, except that the wstr_len is >> dropped for ASCII strings, and that it uses nested structs. 
>> >> The single-block variants would always be "ready", the full unicode >> object is ready only if the data pointer is set. > > In your "UnicodeObject" here, is the 'data pointer' the > any/latin1/ucs2/ucs4 union from the original structure definition? Yes, it is. I'm considering dropping the union again, since you'll have to cast the data pointer anyway in the compact cases. > Also, what are the constraints on the "SingleBlockUnicode"? Does it > only hold strings that can be represented in latin1? Or can the size > of the individual elements be more than 1 byte? Any size - what matters is whether the maximum character is known at creation time (i.e. whether you've used PyUnicode_New(size, maxchar) or PyUnicode_FromUnicode(NULL, size)). In the latter case, a Py_UNICODE block will be allocated in wstr, and the data pointer left NULL. Then, when PyUnicode_Ready is called, the maximum character is determined in the Py_UNICODE block, and a new data block allocated - but that will have to be a second memory block (the Py_UNICODE block is then dropped in _Ready). Regards, Martin From status at bugs.python.org Fri Sep 16 18:07:28 2011 From: status at bugs.python.org (Python tracker) Date: Fri, 16 Sep 2011 18:07:28 +0200 (CEST) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20110916160728.38D731CC86@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2011-09-09 - 2011-09-16) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message. 
Issues counts and deltas: open 3019 (+19) closed 21757 (+30) total 24776 (+49) Open issues with patches: 1295 Issues opened (36) ================== #12946: PyModule_GetDict() claims it can never fail, but it can http://bugs.python.org/issue12946 opened by scoder #12949: Documentation of PyCode_New() lacks kwonlyargcount argument http://bugs.python.org/issue12949 opened by scoder #12953: Function calls missing from profiler output http://bugs.python.org/issue12953 opened by hagen #12954: Multiprocessing logging under Windows http://bugs.python.org/issue12954 opened by paul.j3 #12955: urllib2.build_opener().open() is not friendly to "with ... as: http://bugs.python.org/issue12955 opened by Valery.Khamenya #12956: builds fail when installing to --prefix with space in path nam http://bugs.python.org/issue12956 opened by rzn8tr #12957: mmap.resize changes memory address of mmap'd region http://bugs.python.org/issue12957 opened by schmichael #12958: test_socket failures on Mac OS X http://bugs.python.org/issue12958 opened by ncoghlan #12960: threading.Condition is not a class http://bugs.python.org/issue12960 opened by Nikratio #12961: unlabelled balls in boxes http://bugs.python.org/issue12961 opened by Phillip.M.Feldman #12962: TitledHelpFormatter and IndentedHelpFormatter are not document http://bugs.python.org/issue12962 opened by techtonik #12964: Two improvements for the locale aliasing engine http://bugs.python.org/issue12964 opened by ssegvic #12965: longobject: documentation improvements http://bugs.python.org/issue12965 opened by skrah #12966: cookielib.LWPCookieJar breaks on cookie values with a newline http://bugs.python.org/issue12966 opened by paulie4 #12967: AttributeError distutils\log.py http://bugs.python.org/issue12967 opened by Ben.thelen #12970: os.walk() consider some symlinks as dirs instead of non-dirs http://bugs.python.org/issue12970 opened by mmarkk #12971: os.isdir() should contain skiplinks=False in arguments http://bugs.python.org/issue12971 
opened by mmarkk #12972: Color prompt + readline http://bugs.python.org/issue12972 opened by atagar1 #12973: int_pow() implementation is incorrect http://bugs.python.org/issue12973 opened by adam at NetBSD.org #12974: array module: deprecate '__int__' conversion support for array http://bugs.python.org/issue12974 opened by meadori #12976: select module: only use EVFILT_TIMER if available (kqueue back http://bugs.python.org/issue12976 opened by bsiegert #12977: socket.socket.setblocking does not raise exception if no data http://bugs.python.org/issue12977 opened by Florian.Ludwig #12978: Figure out extended attributes on BSDs http://bugs.python.org/issue12978 opened by benjamin.peterson #12979: tkinter.font.Font object not usable as font option http://bugs.python.org/issue12979 opened by ilikepython #12981: rewrite multiprocessing (senfd|recvfd) in Python http://bugs.python.org/issue12981 opened by neologix #12982: .pyo file cannot be imported http://bugs.python.org/issue12982 opened by lebigot #12983: byte string literals with invalid hex escape codes raise Value http://bugs.python.org/issue12983 opened by ned.deily #12984: XML NamedNodeMap ( attribName in NamedNodeMap fails ) http://bugs.python.org/issue12984 opened by spolematt #12985: Check signed arithmetic overflow in ./configure http://bugs.python.org/issue12985 opened by skrah #12986: Using getrandbits() in uuid.uuid4() is faster and more readabl http://bugs.python.org/issue12986 opened by mattchaput #12987: Demo/scripts/newslist.py has non-commercial license clause http://bugs.python.org/issue12987 opened by matejcik #12988: IDLE on Win7 crashes when saving to Documents Library http://bugs.python.org/issue12988 opened by Brian.Gernhardt #12989: Consistently handle path separator in Py_GetPath on Windows http://bugs.python.org/issue12989 opened by Nam.Nguyen #12990: launcher can't work on path including tradition chinese char http://bugs.python.org/issue12990 opened by Ricky.Teng #12993: prepared statements 
in sqlite3 module http://bugs.python.org/issue12993 opened by Mayur.&.Angela.Patel-Lam #12994: cx_Oracle failed to load in newly build python 2.7.1 http://bugs.python.org/issue12994 opened by wah meng Most recent 15 issues with no replies (15) ========================================== #12994: cx_Oracle failed to load in newly build python 2.7.1 http://bugs.python.org/issue12994 #12993: prepared statements in sqlite3 module http://bugs.python.org/issue12993 #12990: launcher can't work on path including tradition chinese char http://bugs.python.org/issue12990 #12989: Consistently handle path separator in Py_GetPath on Windows http://bugs.python.org/issue12989 #12988: IDLE on Win7 crashes when saving to Documents Library http://bugs.python.org/issue12988 #12987: Demo/scripts/newslist.py has non-commercial license clause http://bugs.python.org/issue12987 #12986: Using getrandbits() in uuid.uuid4() is faster and more readabl http://bugs.python.org/issue12986 #12984: XML NamedNodeMap ( attribName in NamedNodeMap fails ) http://bugs.python.org/issue12984 #12983: byte string literals with invalid hex escape codes raise Value http://bugs.python.org/issue12983 #12979: tkinter.font.Font object not usable as font option http://bugs.python.org/issue12979 #12977: socket.socket.setblocking does not raise exception if no data http://bugs.python.org/issue12977 #12972: Color prompt + readline http://bugs.python.org/issue12972 #12971: os.isdir() should contain skiplinks=False in arguments http://bugs.python.org/issue12971 #12966: cookielib.LWPCookieJar breaks on cookie values with a newline http://bugs.python.org/issue12966 #12965: longobject: documentation improvements http://bugs.python.org/issue12965 Most recent 15 issues waiting for review (15) ============================================= #12989: Consistently handle path separator in Py_GetPath on Windows http://bugs.python.org/issue12989 #12986: Using getrandbits() in uuid.uuid4() is faster and more readabl 
http://bugs.python.org/issue12986 #12985: Check signed arithmetic overflow in ./configure http://bugs.python.org/issue12985 #12981: rewrite multiprocessing (senfd|recvfd) in Python http://bugs.python.org/issue12981 #12973: int_pow() implementation is incorrect http://bugs.python.org/issue12973 #12970: os.walk() consider some symlinks as dirs instead of non-dirs http://bugs.python.org/issue12970 #12965: longobject: documentation improvements http://bugs.python.org/issue12965 #12943: tokenize: add python -m tokenize support back http://bugs.python.org/issue12943 #12936: armv5tejl segfaults: sched_setaffinity() vs. pthread_setaffini http://bugs.python.org/issue12936 #12931: xmlrpclib confuses unicode and string http://bugs.python.org/issue12931 #12930: reindent.py inserts spaces in multiline literals http://bugs.python.org/issue12930 #12919: Control what module is imported first http://bugs.python.org/issue12919 #12911: Expose a private accumulator C API http://bugs.python.org/issue12911 #12903: test_io.test_interrupte[r]d* blocks on OpenBSD http://bugs.python.org/issue12903 #12901: Nest class/methods directives in documentation http://bugs.python.org/issue12901 Top 10 most discussed issues (10) ================================= #12936: armv5tejl segfaults: sched_setaffinity() vs. 
pthread_setaffini http://bugs.python.org/issue12936 26 msgs #11457: Expose nanosecond precision from system calls http://bugs.python.org/issue11457 17 msgs #12973: int_pow() implementation is incorrect http://bugs.python.org/issue12973 16 msgs #1172711: long long support for array module http://bugs.python.org/issue1172711 11 msgs #8822: datetime naive and aware types should have a well-defined defi http://bugs.python.org/issue8822 10 msgs #12301: Use :role:`sys.thing` instead of ``sys.thing`` throughout http://bugs.python.org/issue12301 7 msgs #12945: ctypes works incorrectly with _swappedbytes_ = 1 http://bugs.python.org/issue12945 7 msgs #6715: xz compressor support http://bugs.python.org/issue6715 6 msgs #12913: Add a debugging howto http://bugs.python.org/issue12913 6 msgs #12981: rewrite multiprocessing (senfd|recvfd) in Python http://bugs.python.org/issue12981 6 msgs Issues closed (27) ================== #7201: double Endian problem and more on arm http://bugs.python.org/issue7201 closed by mark.dickinson #9871: IDLE 3 crashes processing byte strings with invalid hex escape http://bugs.python.org/issue9871 closed by ned.deily #11149: [PATCH] Configure should enable -fwrapv for clang http://bugs.python.org/issue11149 closed by skrah #12299: Stop documenting functions added by site as builtins http://bugs.python.org/issue12299 closed by eric.araujo #12306: zlib: Expose zlibVersion to query runtime version of zlib http://bugs.python.org/issue12306 closed by nadeem.vawda #12483: CThunkObject_dealloc should call PyObject_GC_UnTrack? 
http://bugs.python.org/issue12483 closed by meadori #12896: Recommended location of the interpreter for Python 3 http://bugs.python.org/issue12896 closed by eric.araujo #12914: Add cram function to textwrap http://bugs.python.org/issue12914 closed by rhettinger #12917: Make visiblename and allmethods functions public http://bugs.python.org/issue12917 closed by rhettinger #12918: New module for terminal utilities http://bugs.python.org/issue12918 closed by eric.araujo #12924: Missing call to quote_plus() in test_urllib.test_default_quoti http://bugs.python.org/issue12924 closed by python-dev #12935: Typo in findertools.py http://bugs.python.org/issue12935 closed by ned.deily #12940: Cmd example using turtle left vs. right doc-bug http://bugs.python.org/issue12940 closed by ezio.melotti #12941: add random.pop() http://bugs.python.org/issue12941 closed by terry.reedy #12947: Examples in library/doctest.html lack the flags http://bugs.python.org/issue12947 closed by eric.araujo #12948: multiprocessing test failures can hang the buildbots http://bugs.python.org/issue12948 closed by jcea #12950: multiprocessing "test_fd_transfer" fails under OpenIndiana http://bugs.python.org/issue12950 closed by python-dev #12951: List behavior is different http://bugs.python.org/issue12951 closed by ezio.melotti #12952: Solaris/Illumos (OpenIndiana) Scheduling policies http://bugs.python.org/issue12952 closed by python-dev #12959: Add 'ChainMap' to collections.__all__ http://bugs.python.org/issue12959 closed by python-dev #12963: PyLong_AsSize_t returns (unsigned long)-1 http://bugs.python.org/issue12963 closed by skrah #12968: vvccc??????????????? 
http://bugs.python.org/issue12968 closed by benjamin.peterson #12969: Command 'open(0,"wb").close()' cause crash of Python interpret http://bugs.python.org/issue12969 closed by jcea #12975: spam http://bugs.python.org/issue12975 closed by neologix #12980: segfault in test_json on AMD64 FreeBSD 8.2 2.7 http://bugs.python.org/issue12980 closed by skrah #12991: Python 64-bit build on HP Itanium - Executable built successfu http://bugs.python.org/issue12991 closed by skrah #12992: Python build finished, but the necessary bits to build these m http://bugs.python.org/issue12992 closed by ezio.melotti From chris at simplistix.co.uk Fri Sep 16 19:58:46 2011 From: chris at simplistix.co.uk (Chris Withers) Date: Fri, 16 Sep 2011 18:58:46 +0100 Subject: [Python-Dev] Packaging in Python 2 anyone ? In-Reply-To: <4E724498.2040603@voidspace.org.uk> References: <4E4D5992.7070603@netwok.org> <4E6F7D6B.9040709@netwok.org> <4E6F861F.5020904@voidspace.org.uk> <4E72266F.106@netwok.org> <4E724498.2040603@voidspace.org.uk> Message-ID: <4E738E56.20709@simplistix.co.uk> On 15/09/2011 19:31, Michael Foord wrote: > The current tools are a real pain for versioning anyway. If your pypi > page even *links* to a page that offers an alpha or beta (in development > version) for download then both pip and easy_install will fetch that, in > preference to the most recent version on pypi. So yes, I agree there is > room for improvement in the current tools. Hopefully distutils2 will fix > that. ;-) I'm pretty sure recent releases of zc.buildout prefer "final" releases by default ;-) Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk From albzey at googlemail.com Sat Sep 17 17:05:48 2011 From: albzey at googlemail.com (Albert Zeyer) Date: Sat, 17 Sep 2011 17:05:48 +0200 Subject: [Python-Dev] Persistent Python - a la Smalltalk Message-ID: Hi, I was thinking about a persistent Python interpreter system. I.e. 
you start a Python interpreter instance and you load and create all your objects, classes and code in there (or load it in there from other files). The basic idea is that you won't restart your Python script, you would always modify it on-the-fly. Or a bit less extreme: You would at least have the possibility with this to do this (like just doing minor changes). Also, if your PC halts for whatever reason, you can continue your Python script after a restart. This goes along with my other recent proposal to store the AST of statements in the related code objects (http://thread.gmane.org/gmane.comp.python.devel/126754). An internal editor could then edit this AST and recompile the code object. For the persistence, there would be an image file containing all the Python objects. All in all, much like most Smalltalk systems. --- Has anyone done something like this already? --- There are a few implementation details which are not trivial and there don't seem to be straightforward solutions, e.g. most generally: * How to implement the persistence? * How to handle image compatibility between CPython updates? Even possible? Regards, Albert From guido at python.org Sat Sep 17 17:17:38 2011 From: guido at python.org (Guido van Rossum) Date: Sat, 17 Sep 2011 08:17:38 -0700 Subject: [Python-Dev] Persistent Python - a la Smalltalk In-Reply-To: References: Message-ID: [BCC python-dev, +python-ideas] Funny you should mention this. ABC, Python's predecessor, worked like this. However, it didn't work out very well. So, I'd say you're about 30 years too late with your idea... :-( --Guido On Sat, Sep 17, 2011 at 8:05 AM, Albert Zeyer wrote: > Hi, > > I was thinking about a persistent Python interpreter system. I.e. you > start a Python interpreter instance and you load and create all your > objects, classes and code in there (or load it in there from other > files). > > The basic idea is that you won't restart your Python script, you would > always modify it on-the-fly. 
Or a bit less extreme: You would at least > have the possibility with this to do this (like just doing minor > changes). Also, if your PC halts for whatever reason, you can continue > your Python script after a restart. > > This goes along my other recent proposal to store the AST of > statements in the related code objects > (http://thread.gmane.org/gmane.comp.python.devel/126754). An internal > editor could then edit this AST and recompile the code object. > > For the persistance, there would be an image file containing all the > Python objects. > > All in all, much like most Smalltalk systems. > > --- > > Has anyone done something like this already? > > --- > > There are a few implementation details which are not trivial and there > doesn't seem to be straight forward solutions, e.g. most generally: > > * How to implement the persistance? > * How to handle image compatibility between CPython updates? Even possible? > > Regards, > Albert > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (python.org/~guido) From ethan at stoneleaf.us Sat Sep 17 19:01:30 2011 From: ethan at stoneleaf.us (Ethan Furman) Date: Sat, 17 Sep 2011 10:01:30 -0700 Subject: [Python-Dev] Persistent Python - a la Smalltalk In-Reply-To: References: Message-ID: <4E74D26A.4030309@stoneleaf.us> Albert Zeyer wrote: > I was thinking about a persistent Python interpreter system. python-dev is for developing the next version of Python (3.3 at this point). Questions like this should go to python-list or python-ideas. 
~Ethan~ From godson.g at gmail.com Sun Sep 18 10:55:25 2011 From: godson.g at gmail.com (Godson Gera) Date: Sun, 18 Sep 2011 14:25:25 +0530 Subject: [Python-Dev] Persistent Python - a la Smalltalk In-Reply-To: References: Message-ID: Twisted has a feature like that, implemented using pickles or something. It's meant to save the state of the program across a restart. I am not sure if that's what you are after. http://twistedmatrix.com On 17 Sep 2011 20:44, "Albert Zeyer" wrote: > Hi, > > I was thinking about a persistent Python interpreter system. I.e. you > start a Python interpreter instance and you load and create all your > objects, classes and code in there (or load it in there from other > files). > > The basic idea is that you won't restart your Python script; you would > always modify it on-the-fly. Or a bit less extreme: you would at least > have the possibility to do this (e.g. just for minor > changes). Also, if your PC halts for whatever reason, you can continue > your Python script after a restart. > > This goes along with my other recent proposal to store the AST of > statements in the related code objects > (http://thread.gmane.org/gmane.comp.python.devel/126754). An internal > editor could then edit this AST and recompile the code object. > > For the persistence, there would be an image file containing all the > Python objects. > > All in all, much like most Smalltalk systems. > > --- > > Has anyone done something like this already? > > --- > > There are a few implementation details which are not trivial and for which > there don't seem to be straightforward solutions, e.g. most generally: > > * How to implement the persistence? > * How to handle image compatibility between CPython updates? Even possible?
> > Regards, > Albert > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/godson.g%40gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From zbyszek at in.waw.pl Sun Sep 18 10:23:25 2011 From: zbyszek at in.waw.pl (Zbigniew =?UTF-8?B?SsSZZHJ6ZWpld3NraS1Tem1law==?=) Date: Sun, 18 Sep 2011 10:23:25 +0200 Subject: [Python-Dev] Persistent Python - a la Smalltalk References: Message-ID: Guido van Rossum wrote: > [BCC python-dev, +python-ideas] > > Funny you should mention this. ABC, Python's predecessor, worked like > this. However, it didn't work out very well. So, I'd say you're about > 30 years too late with your idea... :-( Well, the newly developed IPython notebook [1] is something along those lines. So he's not late, he's a little bit early :) -- Zbyszek [1] http://ipython.org/ipython-doc/dev/interactive/htmlnotebook.html From smiwa.egon at googlemail.com Tue Sep 20 15:58:35 2011 From: smiwa.egon at googlemail.com (Egon Smiwa) Date: Tue, 20 Sep 2011 15:58:35 +0200 Subject: [Python-Dev] Unicode identifiers Message-ID: <4E789C0B.7060709@googlemail.com> Hi all, I wanted to implement quantity objects in a piece of software, which can be used with user-friendly expressions like: money = 3 * €, where Euro is a special quantity object. But now I realized that Python does not allow currency characters in names, although they can be very useful. Is there a really convincing argument against the inclusion? Thank you!
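Egon's question can be checked empirically from Python 3 itself: PEP 3131 adopts the identifier rules of Unicode UAX #31, under which an identifier must start with an XID_Start character and continue with XID_Continue characters, and currency symbols such as € (Unicode general category Sc) belong to neither class. A short demonstration:

```python
import unicodedata

# '€' is a currency symbol (general category Sc); it is in neither
# XID_Start nor XID_Continue, so it cannot appear in an identifier.
print(unicodedata.category('€'))   # Sc
print('€'.isidentifier())          # False

# Letters are fine: 'π' (category Ll, a lowercase letter) is a valid
# identifier by itself, as is the spelled-out name.
print('π'.isidentifier())          # True
print('Euro'.isidentifier())       # True
```

So the restriction is not Python-specific policy but the Unicode identifier recommendation itself.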
From benjamin at python.org Tue Sep 20 16:01:11 2011 From: benjamin at python.org (Benjamin Peterson) Date: Tue, 20 Sep 2011 10:01:11 -0400 Subject: [Python-Dev] Unicode identifiers In-Reply-To: <4E789C0B.7060709@googlemail.com> References: <4E789C0B.7060709@googlemail.com> Message-ID: 2011/9/20 Egon Smiwa : > Hi all, > I wanted to implement quantity objects in a software, > which can be used with user-friendly expressions like: > money = 3 * €, where Euro is a special quantity object > But now I realized, Python does not allow currency > characters in names, although they can be very useful. > Is there a really convincing argument against the inclusion? It's a violation of http://unicode.org/reports/tr31/ -- Regards, Benjamin From stefan at bytereef.org Wed Sep 21 18:02:27 2011 From: stefan at bytereef.org (Stefan Krah) Date: Wed, 21 Sep 2011 18:02:27 +0200 Subject: [Python-Dev] [Python-checkins] cpython: Issue #1172711: Add 'long long' support to the array module. In-Reply-To: <4E79E5C9.9080204@gmail.com> References: <4E79E5C9.9080204@gmail.com> Message-ID: <20110921160227.GA19702@sleipnir.bytereef.org> Ezio Melotti wrote: >> +@unittest.skipIf(not have_long_long, 'need long long support') > > I think this would read better with skipUnless and s/have/has/: > > @unittest.skipUnless(HAS_LONG_LONG, 'need long long support') skipUnless() is perhaps a bit cleaner, but have_long_long is pretty established elsewhere (for example in pyport.h). Stefan Krah From g.brandl at gmx.net Wed Sep 21 18:16:58 2011 From: g.brandl at gmx.net (Georg Brandl) Date: Wed, 21 Sep 2011 18:16:58 +0200 Subject: [Python-Dev] cpython: Issue #1172711: Add 'long long' support to the array module.
In-Reply-To: <4E79E5C9.9080204@gmail.com> References: <4E79E5C9.9080204@gmail.com> Message-ID: Am 21.09.2011 15:25, schrieb Ezio Melotti: >> @@ -1205,6 +1214,18 @@ >> minitemsize = 4 >> tests.append(UnsignedLongTest) >> >> +@unittest.skipIf(not have_long_long, 'need long long support') > > I think this would read better with skipUnless and s/have/has/: > > @unittest.skipUnless(HAS_LONG_LONG, 'need long long support') I don't think so. "skip if not" reads pretty well for me, while I always have to think twice about "unless" -- may be a non-native-speaker thing. Georg From benjamin at python.org Wed Sep 21 18:21:53 2011 From: benjamin at python.org (Benjamin Peterson) Date: Wed, 21 Sep 2011 12:21:53 -0400 Subject: [Python-Dev] cpython: Issue #1172711: Add 'long long' support to the array module. In-Reply-To: References: <4E79E5C9.9080204@gmail.com> Message-ID: 2011/9/21 Georg Brandl : > Am 21.09.2011 15:25, schrieb Ezio Melotti: > >>> @@ -1205,6 +1214,18 @@ >>> minitemsize = 4 >>> tests.append(UnsignedLongTest) >>> >>> +@unittest.skipIf(not have_long_long, 'need long long support') >> >> I think this would read better with skipUnless and s/have/has/: >> >> @unittest.skipUnless(HAS_LONG_LONG, 'need long long support') > > I don't think so. "skip if not" reads pretty well for me, while I > always have to think twice about "unless" -- may be a non-native- > speaker thing. You might also not program in Ruby enough. :) -- Regards, Benjamin From meadori at gmail.com Wed Sep 21 18:40:55 2011 From: meadori at gmail.com (Meador Inge) Date: Wed, 21 Sep 2011 11:40:55 -0500 Subject: [Python-Dev] [Python-checkins] cpython: Issue #1172711: Add 'long long' support to the array module.
In-Reply-To: <20110921160227.GA19702@sleipnir.bytereef.org> References: <4E79E5C9.9080204@gmail.com> <20110921160227.GA19702@sleipnir.bytereef.org> Message-ID: On Wed, Sep 21, 2011 at 11:02 AM, Stefan Krah wrote: > Ezio Melotti wrote: >>> +@unittest.skipIf(not have_long_long, 'need long long support') >> >> I think this would read better with skipUnless and s/have/has/: >> >> @unittest.skipUnless(HAS_LONG_LONG, 'need long long support') > > skipUnless() is perhaps a bit cleaner, but have_long_long is pretty > established elsewhere (for example in pyport.h). I agree with Stefan on the have_long_long part. This is what is used in the array module code, struct, ctypes, etc ... (via pyport.h as Stefan mentioned). As for the unless/if, I am OK with the 'if'. 'unless' always causes a double-take for me. Personal preference I guess. -- Meador From merwok at netwok.org Wed Sep 21 18:50:25 2011 From: merwok at netwok.org (=?UTF-8?B?w4lyaWMgQXJhdWpv?=) Date: Wed, 21 Sep 2011 18:50:25 +0200 Subject: [Python-Dev] Packaging in Python 2 anyone ? In-Reply-To: <4E6F7D6B.9040709@netwok.org> References: <4E4D5992.7070603@netwok.org> <4E6F7D6B.9040709@netwok.org> Message-ID: <4E7A15D1.2090402@netwok.org> Hi, I caught Tarek on IRC and forced him to answer my questions. Here are the latest news: - I have cleaned up and synchronized the distutils2 codebase with packaging in 3.3. All features and bugs are now identical. The test suite runs with Python 2.4 to 2.7; there are three or four test failures (linux, with threads, UCS4, not shared). Please clone, build (we backported hashlib for 2.4), test and file bugs! We'll make an alpha4 as soon as all tests pass. - I have started work in a named branch to provide distutils2 for Python 3.1 and 3.2. Patches will flow between packaging, distutils2 and distutils2-py3. I'll start a discussion on catalog-sig to improve support of parallel releases of 2.x and 3.x-compatible projects.
- The docs in the d2 repo will be removed; people will go to docs.python.org and mentally convert packaging to distutils2. I'll update the PyPI page. Cheers From stephen at xemacs.org Wed Sep 21 19:02:11 2011 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 22 Sep 2011 02:02:11 +0900 Subject: [Python-Dev] cpython: Issue #1172711: Add 'long long' support to the array module. In-Reply-To: References: <4E79E5C9.9080204@gmail.com> Message-ID: <87pqitbzho.fsf@uwakimon.sk.tsukuba.ac.jp> Georg Brandl writes: > I don't think so. "skip if not" reads pretty well for me, while I > always have to think twice about "unless" -- may be a non-native- > speaker thing. FWIW, speaking as one native speaker, I'm not sure about that. "do ... if not condition" doesn't bother me, whether I think of the condition as an exception or as the normal state of affairs. I find "do ... unless condition" to be quite awkward if the condition is a normal state. From fuzzyman at voidspace.org.uk Wed Sep 21 20:08:22 2011 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Wed, 21 Sep 2011 19:08:22 +0100 Subject: [Python-Dev] cpython: Issue #1172711: Add 'long long' support to the array module. In-Reply-To: <87pqitbzho.fsf@uwakimon.sk.tsukuba.ac.jp> References: <4E79E5C9.9080204@gmail.com> <87pqitbzho.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4E7A2816.8080204@voidspace.org.uk> On 21/09/2011 18:02, Stephen J. Turnbull wrote: > Georg Brandl writes: > > > I don't think so. "skip if not" reads pretty well for me, while I > > always have to think twice about "unless" -- may be a non-native- > > speaker thing. > > FWIW, speaking as one native speaker, I'm not sure about that. "do ... > if not condition" doesn't bother me, whether I think of the condition > as an exception or as the normal state of affairs. I find "do ... > unless condition" to be quite awkward if the condition is a normal state. I'm not a big fan of skipUnless, but there you go.
I find "skip if not" readable too and always have to "work out" what skipUnless means. It's probably just that "if" and "if not" are such Python idioms and "unless" isn't. Michael > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk > -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html From ezio.melotti at gmail.com Wed Sep 21 22:43:58 2011 From: ezio.melotti at gmail.com (Ezio Melotti) Date: Wed, 21 Sep 2011 23:43:58 +0300 Subject: [Python-Dev] cpython: Issue #1172711: Add 'long long' support to the array module. In-Reply-To: <4E7A2816.8080204@voidspace.org.uk> References: <4E79E5C9.9080204@gmail.com> <87pqitbzho.fsf@uwakimon.sk.tsukuba.ac.jp> <4E7A2816.8080204@voidspace.org.uk> Message-ID: <4E7A4C8E.3090802@gmail.com> On 21/09/2011 21.08, Michael Foord wrote: > On 21/09/2011 18:02, Stephen J. Turnbull wrote: >> Georg Brandl writes: >> >> > I don't think so. "skip if not" reads pretty well for me, while I >> > always have to think twice about "unless" -- may be a non-native- >> > speaker thing. >> >> FWIW, speaking as one native speaker, I'm not sure about that. "do ... >> if not condition" doesn't bother me, whether I think of the condition >> as an exception or as the normal state of affairs. I find "do ... >> unless condition" to be quite awkward if the condition is a normal >> state. > > I'm not a big fan of skipUnless, but there you go. I find "skip if > not" readable too and always have to "work out" what skipUnless means. > It's probably just that "if" and "if not" are such Python idioms and > "unless" isn't. I don't find it too readable in other contexts (e.g. 
failUnless), but I probably got used to skipUnless with the idiom:

try:
    import foo
except ImportError:
    foo = None

@skipUnless(foo, 'requires foo')
...

FWIW in Lib/test/support.py we have a "skip_unless_symlink", but the other two skipUnless have more readable names: "requires_zlib" and "requires_IEEE_754". In Lib/test/ "skipUnless" is used about 250 times, "skipIf" about 100. Best Regards, Ezio Melotti > > Michael > > From merwok at netwok.org Fri Sep 23 16:35:27 2011 From: merwok at netwok.org (=?UTF-8?B?w4lyaWMgQXJhdWpv?=) Date: Fri, 23 Sep 2011 16:35:27 +0200 Subject: [Python-Dev] [Python-checkins] cpython (2.7): Issue #12931: xmlrpclib now encodes Unicode URI to ISO-8859-1, instead of In-Reply-To: References: Message-ID: <4E7C992F.5030601@netwok.org> Hi Victor, > summary: > Issue #12931: xmlrpclib now encodes Unicode URI to ISO-8859-1, instead of > failing with a UnicodeDecodeError. > > diff --git a/Lib/test/test_xmlrpc.py b/Lib/test/test_xmlrpc.py > --- a/Lib/test/test_xmlrpc.py > +++ b/Lib/test/test_xmlrpc.py > @@ -472,6 +472,9 @@ > # protocol error; provide additional information in test output > self.fail("%s\n%s" % (e, getattr(e, "headers", ""))) > > + def test_unicode_host(self): > + server = xmlrpclib.ServerProxy(u"http://%s:%d/RPC2"%(ADDR, PORT)) Spaces around the modulo operator would have been nice here. Readability counts :) From le.mognon at gmail.com Fri Sep 23 17:12:53 2011 From: le.mognon at gmail.com (Martin Goudreau) Date: Fri, 23 Sep 2011 11:12:53 -0400 Subject: [Python-Dev] genious hack in python Message-ID: Hello Dev Team, Guido told me to send you this idea... Improving productivity is one of my strengths. Please check a very small module I've made for improving the debugger traceback. See the pybettererror.py on SourceForge: http://pybettererror.sourceforge.net/projet.html It's hard to find something to complain about in Python. This one was too good an idea to keep for myself.
Thanks Martin Goudreau from Montreal -------------- next part -------------- An HTML attachment was scrubbed... URL: From phd at phdru.name Fri Sep 23 17:54:51 2011 From: phd at phdru.name (Oleg Broytman) Date: Fri, 23 Sep 2011 19:54:51 +0400 Subject: [Python-Dev] genious hack in python In-Reply-To: References: Message-ID: <20110923155451.GC21909@iskra.aviel.ru> Hi! On Fri, Sep 23, 2011 at 11:12:53AM -0400, Martin Goudreau wrote: > Please check a very small > module i'v made for improving the debugger traceback. See the > pybettererror.py on sourceforge: > http://pybettererror.sourceforge.net/projet.html Why do this in sys.stderr and not by monkey-patching traceback.py, probably format_list and format_exception_only? Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From status at bugs.python.org Fri Sep 23 18:07:28 2011 From: status at bugs.python.org (Python tracker) Date: Fri, 23 Sep 2011 18:07:28 +0200 (CEST) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20110923160728.E68E11CA8F@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2011-09-16 - 2011-09-23) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message. 
Issues counts and deltas: open 3030 (+11) closed 21788 (+31) total 24818 (+42) Open issues with patches: 1299 Issues opened (34) ================== #11686: Update of some email/ __all__ lists http://bugs.python.org/issue11686 reopened by r.david.murray #11780: email.encoders are broken http://bugs.python.org/issue11780 reopened by r.david.murray #12991: Python 64-bit build on HP Itanium - Executable built successfu http://bugs.python.org/issue12991 reopened by wah meng #12997: sqlite3: PRAGMA foreign_keys = ON doesn't work http://bugs.python.org/issue12997 opened by Mark.Bucciarelli #12998: Memory leak with CTypes Structure http://bugs.python.org/issue12998 opened by a01 #12999: _XOPEN_SOURCE and _XOPEN_SOURCE_EXTENDED usage on Solaris http://bugs.python.org/issue12999 opened by neologix #13000: unhandled exception at install http://bugs.python.org/issue13000 opened by jorge.seifert #13001: test_socket.testRecvmsgTrunc failure on FreeBSD 7.2 buildbot http://bugs.python.org/issue13001 opened by neologix #13004: pprint: add option to truncate sequences http://bugs.python.org/issue13004 opened by terry.reedy #13008: syntax error when pasting valid snippet into console without e http://bugs.python.org/issue13008 opened by techtonik #13009: Remove documentation in distutils2 repo http://bugs.python.org/issue13009 opened by eric.araujo #13011: Frozen programs require the original build directory in order http://bugs.python.org/issue13011 opened by malcolmp #13012: Allow keyword argument in str.splitlines() http://bugs.python.org/issue13012 opened by mark.dickinson #13013: _ctypes.c: refleak http://bugs.python.org/issue13013 opened by Suman.Saha #13014: _ssl.c: refleak http://bugs.python.org/issue13014 opened by Suman.Saha #13015: _collectionsmodule.c: refleak http://bugs.python.org/issue13015 opened by Suman.Saha #13016: selectmodule.c: refleak http://bugs.python.org/issue13016 opened by Suman.Saha #13017: pyexpat.c: refleak http://bugs.python.org/issue13017 opened by 
Suman.Saha #13018: dictobject.c: refleak http://bugs.python.org/issue13018 opened by Suman.Saha #13019: bytearrayobject.c: refleak http://bugs.python.org/issue13019 opened by Suman.Saha #13020: structseq.c: refleak http://bugs.python.org/issue13020 opened by Suman.Saha #13023: argparse should allow displaying argument default values in ad http://bugs.python.org/issue13023 opened by denilsonsa #13024: cgitb uses stdout encoding http://bugs.python.org/issue13024 opened by haypo #13025: mimetypes should read the rule file using UTF-8, not the local http://bugs.python.org/issue13025 opened by haypo #13026: Dis module - documentation of MAKE_FUNCTION http://bugs.python.org/issue13026 opened by arno #13027: python 2.6.6 interpreter core dumps on modules command from he http://bugs.python.org/issue13027 opened by Balachandran.Sivakumar #13028: python wastes linux users time by checking for dylib on each d http://bugs.python.org/issue13028 opened by fzvqedi #13029: test_strptime fails on Windows 7 french http://bugs.python.org/issue13029 opened by haypo #13030: Be more generic when identifying the Windows main dir in insta http://bugs.python.org/issue13030 opened by sandro.tosi #13031: [PATCH] small speed-up for tarfile.py when unzipping tarballs http://bugs.python.org/issue13031 opened by jpeel #13032: h2py.py can fail with UnicodeDecodeError http://bugs.python.org/issue13032 opened by Arfrever #13033: recursive chown for shutils http://bugs.python.org/issue13033 opened by Low.Kian.Seong #13034: Python does not read Alternative Subject Names from SSL certif http://bugs.python.org/issue13034 opened by atrasatti #13035: "maintainer" value clear the "author" value when registering http://bugs.python.org/issue13035 opened by jab Most recent 15 issues with no replies (15) ========================================== #13035: "maintainer" value clear the "author" value when registering http://bugs.python.org/issue13035 #13034: Python does not read Alternative Subject Names from 
SSL certif http://bugs.python.org/issue13034 #13032: h2py.py can fail with UnicodeDecodeError http://bugs.python.org/issue13032 #13030: Be more generic when identifying the Windows main dir in insta http://bugs.python.org/issue13030 #13025: mimetypes should read the rule file using UTF-8, not the local http://bugs.python.org/issue13025 #13024: cgitb uses stdout encoding http://bugs.python.org/issue13024 #13023: argparse should allow displaying argument default values in ad http://bugs.python.org/issue13023 #13019: bytearrayobject.c: refleak http://bugs.python.org/issue13019 #13018: dictobject.c: refleak http://bugs.python.org/issue13018 #13017: pyexpat.c: refleak http://bugs.python.org/issue13017 #13016: selectmodule.c: refleak http://bugs.python.org/issue13016 #13015: _collectionsmodule.c: refleak http://bugs.python.org/issue13015 #13011: Frozen programs require the original build directory in order http://bugs.python.org/issue13011 #12984: XML NamedNodeMap ( attribName in NamedNodeMap fails ) http://bugs.python.org/issue12984 #12983: byte string literals with invalid hex escape codes raise Value http://bugs.python.org/issue12983 Most recent 15 issues waiting for review (15) ============================================= #13032: h2py.py can fail with UnicodeDecodeError http://bugs.python.org/issue13032 #13031: [PATCH] small speed-up for tarfile.py when unzipping tarballs http://bugs.python.org/issue13031 #13025: mimetypes should read the rule file using UTF-8, not the local http://bugs.python.org/issue13025 #13024: cgitb uses stdout encoding http://bugs.python.org/issue13024 #13018: dictobject.c: refleak http://bugs.python.org/issue13018 #13017: pyexpat.c: refleak http://bugs.python.org/issue13017 #13016: selectmodule.c: refleak http://bugs.python.org/issue13016 #13015: _collectionsmodule.c: refleak http://bugs.python.org/issue13015 #13012: Allow keyword argument in str.splitlines() http://bugs.python.org/issue13012 #13001: test_socket.testRecvmsgTrunc failure on 
FreeBSD 7.2 buildbot http://bugs.python.org/issue13001 #12991: Python 64-bit build on HP Itanium - Executable built successfu http://bugs.python.org/issue12991 #12989: Consistently handle path separator in Py_GetPath on Windows http://bugs.python.org/issue12989 #12986: Using getrandbits() in uuid.uuid4() is faster and more readabl http://bugs.python.org/issue12986 #12985: Check signed arithmetic overflow in ./configure http://bugs.python.org/issue12985 #12981: rewrite multiprocessing (senfd|recvfd) in Python http://bugs.python.org/issue12981 Top 10 most discussed issues (10) ================================= #12981: rewrite multiprocessing (senfd|recvfd) in Python http://bugs.python.org/issue12981 11 msgs #12943: tokenize: add python -m tokenize support back http://bugs.python.org/issue12943 8 msgs #12991: Python 64-bit build on HP Itanium - Executable built successfu http://bugs.python.org/issue12991 8 msgs #12729: Python lib re cannot handle Unicode properly due to narrow/wid http://bugs.python.org/issue12729 7 msgs #12955: urllib.request example should use "with ... as:" http://bugs.python.org/issue12955 7 msgs #13000: unhandled exception at install http://bugs.python.org/issue13000 7 msgs #13012: Allow keyword argument in str.splitlines() http://bugs.python.org/issue13012 6 msgs #12998: Memory leak with CTypes Structure http://bugs.python.org/issue12998 5 msgs #13026: Dis module - documentation of MAKE_FUNCTION http://bugs.python.org/issue13026 5 msgs #11816: Refactor the dis module to provide better building blocks for http://bugs.python.org/issue11816 4 msgs Issues closed (31) ================== #11037: State of PEP 382 or How does distutils2 handle namespaces? 
http://bugs.python.org/issue11037 closed by eric.araujo #11701: email.parser.BytesParser().parse() closes file argument http://bugs.python.org/issue11701 closed by sdaoden #11913: sdist refuses README.rst http://bugs.python.org/issue11913 closed by eric.araujo #11935: MMDF/MBOX mailbox need utime http://bugs.python.org/issue11935 closed by sdaoden #12145: distutils2 should support README.rst http://bugs.python.org/issue12145 closed by eric.araujo #12395: packaging remove fails under Windows http://bugs.python.org/issue12395 closed by eric.araujo #12678: test_packaging and test_distutils failures under Windows http://bugs.python.org/issue12678 closed by eric.araujo #12785: list_distinfo_file is wrong http://bugs.python.org/issue12785 closed by eric.araujo #12931: xmlrpclib confuses unicode and string http://bugs.python.org/issue12931 closed by haypo #12936: armv5tejl segfaults: sched_setaffinity() vs. pthread_setaffini http://bugs.python.org/issue12936 closed by skrah #12938: html.escape docstring does not mention single quotes (') http://bugs.python.org/issue12938 closed by orsenthil #12958: test_socket failures on Mac OS X http://bugs.python.org/issue12958 closed by python-dev #12960: threading.Condition is not a class http://bugs.python.org/issue12960 closed by haypo #12961: itertools: unlabelled balls in boxes http://bugs.python.org/issue12961 closed by rhettinger #12972: Color prompt + readline http://bugs.python.org/issue12972 closed by terry.reedy #12976: add support for MirBSD platform http://bugs.python.org/issue12976 closed by loewis #12977: socket.socket.setblocking does not raise exception if no data http://bugs.python.org/issue12977 closed by georg.brandl #12994: cx_Oracle failed to load in newly build python 2.7.1 http://bugs.python.org/issue12994 closed by terry.reedy #12995: Different behaviours with between v3.1.2 and v3.2. 
http://bugs.python.org/issue12995 closed by benjamin.peterson #12996: multiprocessing.Connection endianness issue http://bugs.python.org/issue12996 closed by neologix #13002: peephole.c: unused parameter http://bugs.python.org/issue13002 closed by skrah #13003: Bug in equivalent code for itertools.izip_longest http://bugs.python.org/issue13003 closed by georg.brandl #13005: operator module docs include repeat http://bugs.python.org/issue13005 closed by python-dev #13006: bug in core python variable binding http://bugs.python.org/issue13006 closed by amaury.forgeotdarc #13007: gdbm 1.9 has new magic that whichdb does not recognize http://bugs.python.org/issue13007 closed by python-dev #13010: devguide doc: ./python.exe on OS X http://bugs.python.org/issue13010 closed by ezio.melotti #13021: Resource is not released before returning from the function http://bugs.python.org/issue13021 closed by barry #13022: _multiprocessing.recvfd() doesn't check that file descriptor w http://bugs.python.org/issue13022 closed by python-dev #13036: time format in logging is wrong http://bugs.python.org/issue13036 closed by vinay.sajip #793069: Add --remove-source option http://bugs.python.org/issue793069 closed by eric.araujo #1172711: long long support for array module http://bugs.python.org/issue1172711 closed by meador.inge From merwok at netwok.org Fri Sep 23 19:11:39 2011 From: merwok at netwok.org (=?UTF-8?B?w4lyaWMgQXJhdWpv?=) Date: Fri, 23 Sep 2011 19:11:39 +0200 Subject: [Python-Dev] [Python-checkins] cpython (3.2): Issue #7732: Don't open a directory as a file anymore while importing a In-Reply-To: References: Message-ID: <4E7CBDCB.9000506@netwok.org> Hi Victor, > diff --git a/Misc/NEWS b/Misc/NEWS > --- a/Misc/NEWS > +++ b/Misc/NEWS > @@ -10,6 +10,10 @@ > Core and Builtins > ----------------- > > +- Issue #7732: Don't open a directory as a file anymore while importing a > + module. Ignore the direcotry if its name matchs the module name (e.g. 
Typo: direcotry From ezio.melotti at gmail.com Fri Sep 23 19:22:55 2011 From: ezio.melotti at gmail.com (Ezio Melotti) Date: Fri, 23 Sep 2011 20:22:55 +0300 Subject: [Python-Dev] [Python-checkins] cpython (3.2): Issue #7732: Don't open a directory as a file anymore while importing a In-Reply-To: <4E7CBDCB.9000506@netwok.org> References: <4E7CBDCB.9000506@netwok.org> Message-ID: <4E7CC06F.9090301@gmail.com> On 23/09/2011 20.11, Éric Araujo wrote: > Hi Victor, > >> diff --git a/Misc/NEWS b/Misc/NEWS >> --- a/Misc/NEWS >> +++ b/Misc/NEWS >> @@ -10,6 +10,10 @@ >> Core and Builtins >> ----------------- >> >> +- Issue #7732: Don't open a directory as a file anymore while importing a >> + module. Ignore the direcotry if its name matchs the module name (e.g. > Typo: direcotry Typo: matchs From ethan at stoneleaf.us Fri Sep 23 20:04:50 2011 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 23 Sep 2011 11:04:50 -0700 Subject: [Python-Dev] range objects in 3.x Message-ID: <4E7CCA42.2060100@stoneleaf.us> A question came up on StackOverflow about range objects and floating point numbers. I thought about writing an frange that did for floats what range does for ints, so started examining the range class. I noticed it has __le__, __lt__, __eq__, __ne__, __ge__, and __gt__ methods. Some experiments show that xrange in 2.x does indeed implement those operations, but in 3.x range does not (TypeError: unorderable types: range() > range()). Was this intentional, or should I file a bug report? (I was unable to find anything in the What's New documents; also, I did not test in 3.0, just in 2.7, 3.1, 3.2.)
~Ethan~ From benjamin at python.org Fri Sep 23 20:14:36 2011 From: benjamin at python.org (Benjamin Peterson) Date: Fri, 23 Sep 2011 14:14:36 -0400 Subject: [Python-Dev] range objects in 3.x In-Reply-To: <4E7CCA42.2060100@stoneleaf.us> References: <4E7CCA42.2060100@stoneleaf.us> Message-ID: 2011/9/23 Ethan Furman : > A question came up on StackOverflow about range objects and floating point > numbers. I thought about writing an frange that did for floats what range > does for ints, so started examining the range class. I noticed it has > __le__, __lt__, __eq__, __ne__, __ge__, and __gt__ methods. Some > experiments show that xrange in 2.x does indeed implement those operations, > but in 3.x range does not (TypeError: unorderable types: range() > range()). > > Was this intentional, or should I file a bug report? (I was unable to find > anything in the What's New documents; also, I did not test in 3.0, just in > 2.7, 3.1, 3.2.) That's simply a consequence of everything having comparisons defined in 2.x. The comparison is essentially meaningless. -- Regards, Benjamin From guido at python.org Fri Sep 23 20:23:07 2011 From: guido at python.org (Guido van Rossum) Date: Fri, 23 Sep 2011 11:23:07 -0700 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> Message-ID: Also, Ethan, I hope you're familiar with the reason why there is no range() support for floats currently? (Briefly, things like range(0.0, 0.8, step=0.1) could include or exclude the end point depending on rounding, which makes for troublesome semantics.) On Fri, Sep 23, 2011 at 11:14 AM, Benjamin Peterson wrote: > 2011/9/23 Ethan Furman : >> A question came up on StackOverflow about range objects and floating point >> numbers. I thought about writing an frange that did for floats what range >> does for ints, so started examining the range class. I noticed it has >> __le__, __lt__, __eq__, __ne__, __ge__, and __gt__ methods.
Some >> experiments show that xrange in 2.x does indeed implement those operations, >> but in 3.x range does not (TypeError: unorderable types: range() > range()). >> >> Was this intentional, or should I file a bug report? (I was unable to find >> anything in the What's New documents; also, I did not test in 3.0, just in >> 2.7, 3.1, 3.2.) > > That's simply a consequence of everything having comparisons defined > in 2.x. The comparison is essentially meaningless. > > > -- > Regards, > Benjamin > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (python.org/~guido) From ethan at stoneleaf.us Fri Sep 23 20:25:35 2011 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 23 Sep 2011 11:25:35 -0700 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> Message-ID: <4E7CCF1F.1070906@stoneleaf.us> Guido van Rossum wrote: > Also, Ethan, I hope you're familiar with the reason why there is no > range() support for floats currently? (Briefly, things like range(0.0, > 0.8, step=0.1) could include or exclude the end point depending on > rounding, which makes for troublesome semantics.) Good point, thanks for the reminder. ~Ethan~ From ethan at stoneleaf.us Fri Sep 23 22:23:26 2011 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 23 Sep 2011 13:23:26 -0700 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E7CD070.9020507@stoneleaf.us> Message-ID: <4E7CEABE.8090001@stoneleaf.us> Benjamin Peterson wrote: > 2011/9/23 Ethan Furman : >> >> Follow-up question: since the original range returned lists, and comparisons >> do make sense for lists, should the new range also implement them? > > What would be the use-case?
The only reason I'm aware of at the moment is to prevent loss of functionality from 2.x range to 3.x range. I'm -0 with a decision to not have range be orderable; but I understand there are bigger fish to fry. :) My original concern was that the comparison methods were there at all, but looking around I see object has them, so it makes sense to me now. I had thought I would have to implement them if I went ahead with an frange (for floats). >> I note >> that it does implement __contains__, __getitem__, count, and index in the >> same way that list does. > > That's because it implements the Sequence ABC. So the question becomes, Why does it implement the Sequence ABC? Because the original range returned a list and those operations made sense? ~Ethan~ From catch-all at masklinn.net Fri Sep 23 22:04:11 2011 From: catch-all at masklinn.net (Xavier Morel) Date: Fri, 23 Sep 2011 22:04:11 +0200 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> Message-ID: <72911D45-C7BB-4EA2-88FF-2E7C6476A114@masklinn.net> On 2011-09-23, at 20:23 , Guido van Rossum wrote: > Also, Ethan, I hope you're familiar with the reason why there is no > range() support for floats currently? (Briefly, things like range(0.0, > 0.8, step=0.1) could include or exclude the end point depending on > rounding, which makes for troublesome semantics.) On the other hand, there could be a range for Decimal could there not? 
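Xavier's suggestion is plausible: decimal arithmetic is exact for decimal literals, so a Decimal-based range avoids the end-point ambiguity Guido describes for binary floats. A minimal sketch (the name `drange` and the positive-step-only restriction are mine, not anything actually proposed in the thread):

```python
from decimal import Decimal

def drange(start, stop, step):
    """Range-like generator over Decimal values (positive step only).

    Decimal('0.1') is stored exactly, so repeated addition cannot
    drift the way binary floats do, and the exclusive end point is
    honored deterministically.
    """
    value, stop, step = Decimal(start), Decimal(stop), Decimal(step)
    while value < stop:
        yield value
        value += step

values = list(drange('0.0', '0.8', '0.1'))
# exactly eight values, 0.0 through 0.7; 0.8 is reliably excluded
```

Because the eighth addition lands exactly on Decimal('0.8'), the `value < stop` test fails cleanly, with none of the "include or exclude depending on rounding" behavior of the float case.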
From benjamin at python.org Fri Sep 23 22:26:47 2011 From: benjamin at python.org (Benjamin Peterson) Date: Fri, 23 Sep 2011 16:26:47 -0400 Subject: [Python-Dev] range objects in 3.x In-Reply-To: <4E7CEABE.8090001@stoneleaf.us> References: <4E7CCA42.2060100@stoneleaf.us> <4E7CD070.9020507@stoneleaf.us> <4E7CEABE.8090001@stoneleaf.us> Message-ID: 2011/9/23 Ethan Furman : > Benjamin Peterson wrote: >> >> 2011/9/23 Ethan Furman : > >>> >>> >>> Follow-up question: since the original range returned lists, and >>> comparisons >>> do make sense for lists, should the new range also implement them? >> >> What would be the use-case? > > The only reason I'm aware of at the moment is to prevent loss of > functionality from 2.x range to 3.x range. range comparisons in 2.x have no functionality. > >>> I note >>> that it does implement __contains__, __getitem__, count, and index in the >>> same way that list does. >> >> That's because it implements the Sequence ABC. > > So the question becomes, Why does it implement the Sequence ABC? Because the > original range returned a list and those operations made sense? I'm not sure what the history is. -- Regards, Benjamin From ethan at stoneleaf.us Fri Sep 23 22:38:08 2011 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 23 Sep 2011 13:38:08 -0700 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E7CD070.9020507@stoneleaf.us> <4E7CEABE.8090001@stoneleaf.us> Message-ID: <4E7CEE30.7090406@stoneleaf.us> Benjamin Peterson wrote: > 2011/9/23 Ethan Furman : >> Benjamin Peterson wrote: >>> 2011/9/23 Ethan Furman : >>>> >>>> Follow-up question: since the original range returned lists, and >>>> comparisons >>>> do make sense for lists, should the new range also implement them? >>> What would be the use-case? >> The only reason I'm aware of at the moment is to prevent loss of >> functionality from 2.x range to 3.x range. > > range comparisons in 2.x have no functionality. 
Python 2.7 (r27:82525, Jul 4 2010, 09:01:59) [MSC v.1500 32 bit (Intel)] on win32 Type "help", "copyright", "credits" or "license" for more information. --> r1 = range(10) --> r2 = range(0, 20, 2) --> r3 = range(10) --> r1 == r3 True --> r1 < r2 True --> r3 > r2 False Yes, I realize this is because range returned a list in 2.x. However, aren't __contains__, __getitem__, count, and index implemented in 3.x range because 2.x range returned lists? ~Ethan~ From guido at python.org Fri Sep 23 23:04:16 2011 From: guido at python.org (Guido van Rossum) Date: Fri, 23 Sep 2011 14:04:16 -0700 Subject: [Python-Dev] range objects in 3.x In-Reply-To: <4E7CEABE.8090001@stoneleaf.us> References: <4E7CCA42.2060100@stoneleaf.us> <4E7CD070.9020507@stoneleaf.us> <4E7CEABE.8090001@stoneleaf.us> Message-ID: On Fri, Sep 23, 2011 at 1:23 PM, Ethan Furman wrote: > The only reason I'm aware of at the moment is to prevent loss of > functionality from 2.x range to 3.x range. > > I'm -0 with a decision to not have range be orderable; but I understand > there are bigger fish to fry. :) I don't believe there's a valid use case for ordering ranges. As for backwards compatibility, apparently nobody cares or we would've heard about it. > My original concern was that the comparison methods were there at all, but > looking around I see object has them, so it makes sense to me now. I had > thought I would have to implement them if I went ahead with an frange (for > floats). [...]> So the question becomes, Why does it implement the Sequence ABC? Because the > original range returned a list and those operations made sense? Because all operations on Sequence make sense: you can iterate over a range, it has a definite number of items, and so on; all other sequence operations can be derived from that easily (and in fact they can almost all be done in O(1) time).
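Guido's claim is easy to check interactively. In Python 3 (3.3 or later for the `collections.abc` spelling), range is registered as a Sequence, and the sequence operations are computed arithmetically, so they stay fast even for astronomically large ranges. A small demonstration, not code from the thread:

```python
from collections.abc import Sequence

r = range(0, 10**12, 7)           # far too large to ever materialize
assert isinstance(r, Sequence)

# Each of these is pure arithmetic, so all complete instantly:
assert len(r) == (10**12 + 6) // 7
assert 7 * 10**10 in r            # membership: divisibility + bounds check
assert r[3] == 21                 # indexing: start + 3 * step
assert r.index(21) == 3
assert r.count(21) == 1
```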
-- --Guido van Rossum (python.org/~guido) From solipsis at pitrou.net Fri Sep 23 23:06:24 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 23 Sep 2011 23:06:24 +0200 Subject: [Python-Dev] range objects in 3.x References: <4E7CCA42.2060100@stoneleaf.us> <4E7CD070.9020507@stoneleaf.us> <4E7CEABE.8090001@stoneleaf.us> Message-ID: <20110923230624.7a5f0c03@msiwind> Le Fri, 23 Sep 2011 13:23:26 -0700, Ethan Furman a ?crit : > > So the question becomes, Why does it implement the Sequence ABC? Because these operations are trivial to implement and it would be suboptimal to have to instantiate the full list to run them? From martin at v.loewis.de Sat Sep 24 00:01:07 2011 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 24 Sep 2011 00:01:07 +0200 Subject: [Python-Dev] range objects in 3.x In-Reply-To: <4E7CEE30.7090406@stoneleaf.us> References: <4E7CCA42.2060100@stoneleaf.us> <4E7CD070.9020507@stoneleaf.us> <4E7CEABE.8090001@stoneleaf.us> <4E7CEE30.7090406@stoneleaf.us> Message-ID: <4E7D01A3.3010704@v.loewis.de> > Yes, I realize this is because range returned a list in 2.x. However, > aren't __contains__, __getitem__, count, and index implemented in 3.x > range because 2.x range returned lists? No, they are implemented because they are meaningful, and with an obvious meaning. "Is 30 in the range from 10 to 40?" is something that everybody will answer the same way. "What is the fifth element of the range from 10 to 40?" may not have such a universal meaning, but people familiar with the mathematical concept of an interval can readily guess the answer (except that they may wonder whether to start counting at 0 or 1). "Is the range from 5 to 100 larger than the range from 10 to 100?" 
is something that most people would answer as "yes" (I believe), yet py> range(5,100) > range(10,100) False Regards, Martin From ethan at stoneleaf.us Sat Sep 24 00:24:24 2011 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 23 Sep 2011 15:24:24 -0700 Subject: [Python-Dev] range objects in 3.x In-Reply-To: <4E7D01A3.3010704@v.loewis.de> References: <4E7CCA42.2060100@stoneleaf.us> <4E7CD070.9020507@stoneleaf.us> <4E7CEABE.8090001@stoneleaf.us> <4E7CEE30.7090406@stoneleaf.us> <4E7D01A3.3010704@v.loewis.de> Message-ID: <4E7D0718.1000605@stoneleaf.us> Martin v. Löwis wrote: >> Yes, I realize this is because range returned a list in 2.x. However, >> aren't __contains__, __getitem__, count, and index implemented in 3.x >> range because 2.x range returned lists? > > No, they are implemented because they are meaningful, and with an > obvious meaning. "Is 30 in the range from 10 to 40?" is something > that everybody will answer the same way. "What is the fifth element > of the range from 10 to 40?" may not have such a universal meaning, > but people familiar with the mathematical concept of an interval > can readily guess the answer (except that they may wonder whether > to start counting at 0 or 1). > > "Is the range from 5 to 100 larger than the range from 10 to 100?" > is something that most people would answer as "yes" (I believe), > yet > > py> range(5,100) > range(10,100) > False Thanks, Martin! I can see where there could be many interpretations about the meaning of less-than and greater-than with regards to range.
~Ethan~ From greg.ewing at canterbury.ac.nz Sat Sep 24 01:25:11 2011 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 24 Sep 2011 11:25:11 +1200 Subject: [Python-Dev] range objects in 3.x In-Reply-To: <4E7CEABE.8090001@stoneleaf.us> References: <4E7CCA42.2060100@stoneleaf.us> <4E7CD070.9020507@stoneleaf.us> <4E7CEABE.8090001@stoneleaf.us> Message-ID: <4E7D1557.5040107@canterbury.ac.nz> Ethan Furman wrote: > The only reason I'm aware of at the moment is to prevent loss of > functionality from 2.x range to 3.x range. Since 2.x range(...) is equivalent to 3.x list(range(...)), I don't see any loss of functionality there. Comparing range objects directly in 3.x is like comparing xrange objects in 2.x, and there the comparison was arbitrary -- it did *not* compare them like their corresponding lists: Python 2.7 (r27:82500, Oct 15 2010, 21:14:33) [GCC 4.2.1 (Apple Inc. build 5664)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> a = xrange(5) >>> b = xrange(5) >>> a > b True -- Greg From techtonik at gmail.com Sat Sep 24 01:25:52 2011 From: techtonik at gmail.com (anatoly techtonik) Date: Sat, 24 Sep 2011 02:25:52 +0300 Subject: [Python-Dev] Inconsistent script/console behaviour Message-ID: Currently if you work in console and define a function and then immediately call it - it will fail with SyntaxError. For example, copy paste this completely valid Python script into console: def some(): print "XXX" some() There is an issue for that that was just closed by Eric. However, I'd like to know if there are people here that agree that if you paste a valid Python script into console - it should work without changes. -- anatoly t. 
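The failure is easy to reproduce: the plain console needs a blank line to close the def block, so the some() call arriving on the very next line raises SyntaxError. One workaround is to hand the whole paste to exec() as a single string; a sketch, adapted here to Python 3 print syntax:

```python
pasted = """
def some():
    print("XXX")
some()
"""

# Typed line by line at the >>> prompt this fails, but handed to
# exec() as one unit it runs exactly like a script file.
exec(pasted)   # prints XXX
```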
From guido at python.org Sat Sep 24 01:32:30 2011 From: guido at python.org (Guido van Rossum) Date: Fri, 23 Sep 2011 16:32:30 -0700 Subject: [Python-Dev] Inconsistent script/console behaviour In-Reply-To: References: Message-ID: On Fri, Sep 23, 2011 at 4:25 PM, anatoly techtonik wrote: > Currently if you work in console and define a function and then > immediately call it - it will fail with SyntaxError. > For example, copy paste this completely valid Python script into console: > > def some(): > print "XXX" > some() > > There is an issue for that that was just closed by Eric. However, I'd > like to know if there are people here that agree that if you paste a > valid Python script into console - it should work without changes. You can't fix this without completely changing the way the interactive console treats blank lines. Note that it's not just that a blank line is required after a function definition -- you also *can't* have a blank line *inside* a function definition. The interactive console is optimized for people entering code by typing, not by copying and pasting large gobs of text. If you think you can have it both ways, show us the code. -- --Guido van Rossum (python.org/~guido) From ubershmekel at gmail.com Sat Sep 24 01:34:42 2011 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Fri, 23 Sep 2011 19:34:42 -0400 Subject: [Python-Dev] Inconsistent script/console behaviour In-Reply-To: References: Message-ID: I agree that it should and it doesn't. I also recall that not having empty lines between function/class definitions can cause indentation errors when pasting to the console on my windows machine. --Yuval On Sep 23, 2011 7:26 PM, "anatoly techtonik" wrote: > Currently if you work in console and define a function and then > immediately call it - it will fail with SyntaxError. > For example, copy paste this completely valid Python script into console: > > def some(): > print "XXX" > some() > > There is an issue for that that was just closed by Eric.
However, I'd > like to know if there are people here that agree that if you paste a > valid Python script into console - it should work without changes. > -- > anatoly t. > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/ubershmekel%40gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Sat Sep 24 01:49:53 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 23 Sep 2011 19:49:53 -0400 Subject: [Python-Dev] Inconsistent script/console behaviour In-Reply-To: References: Message-ID: On 9/23/2011 7:25 PM, anatoly techtonik wrote: > Currently if you work in console and define a function and then > immediately call it - it will fail with SyntaxError. > For example, copy paste this completely valid Python script into console: > > def some(): > print "XXX" > some() > > There is an issue for that that was just closed by Eric. However, I'd > like to know if there are people here that agree that if you paste a > valid Python script into console - it should work without changes. For this kind of multi-line, multi-statement pasting, open an IDLE edit window for tem.py (my name) or such, paste, run with F5. I have found that this works better for me than direct pasting. An interactive lisp interpreter can detect end-of-statement without a blank line by matching a closing paren to the open paren that starts every expression.
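Terry's point about Lisp fits in a few lines: whether input is complete is decidable from bracket depth alone, with no reliance on blank lines. A toy checker (it ignores strings and comments, so it is only a sketch):

```python
def input_complete(text):
    """Return True once every '(' has been closed.

    A Lisp-style REPL can execute as soon as this returns True;
    it never needs a blank line to know that a form has ended.
    """
    depth = 0
    for ch in text:
        if ch == '(':
            depth += 1
        elif ch == ')':
            if depth == 0:
                raise SyntaxError("unmatched ')'")
            depth -= 1
    return depth == 0

assert input_complete("(define (square x) (* x x))")
assert not input_complete("(define (square x)")  # still open: keep reading
```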
-- Terry Jan Reedy From brian.curtin at gmail.com Sat Sep 24 02:03:29 2011 From: brian.curtin at gmail.com (Brian Curtin) Date: Fri, 23 Sep 2011 19:03:29 -0500 Subject: [Python-Dev] Inconsistent script/console behaviour In-Reply-To: References: Message-ID: On Fri, Sep 23, 2011 at 18:49, Terry Reedy wrote: > An interactive lisp interpreter can detect end-of-statement without a blank > line by matching a closing paren to the open paren that starts every > expression. Braces-loving programmers around the world are feverishly writing a PEP as we speak. From steve at pearwood.info Sat Sep 24 03:36:07 2011 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 24 Sep 2011 11:36:07 +1000 Subject: [Python-Dev] range objects in 3.x In-Reply-To: <4E7CCA42.2060100@stoneleaf.us> References: <4E7CCA42.2060100@stoneleaf.us> Message-ID: <4E7D3407.5000207@pearwood.info> Ethan Furman wrote: > A question came up on StackOverflow about range objects and floating > point numbers. I thought about writing an frange that did for floats > what range does for ints, For what it's worth, here's mine: http://code.activestate.com/recipes/577068-floating-point-range/ -- Steven From guido at python.org Sat Sep 24 03:49:10 2011 From: guido at python.org (Guido van Rossum) Date: Fri, 23 Sep 2011 18:49:10 -0700 Subject: [Python-Dev] range objects in 3.x In-Reply-To: <4E7D3407.5000207@pearwood.info> References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> Message-ID: On Fri, Sep 23, 2011 at 6:36 PM, Steven D'Aprano wrote: > Ethan Furman wrote: >> >> A question came up on StackOverflow about range objects and floating point >> numbers. I thought about writing an frange that did for floats what range >> does for ints, > > > For what it's worth, here's mine: > > http://code.activestate.com/recipes/577068-floating-point-range/ I notice that your examples carefully skirt around the rounding issues. Check out frange(0.0, 2.1, 0.7).
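Guido's example rewards actually running it: mathematically 3 * 0.7 == 2.1, so an exclusive stop of 2.1 ought to yield exactly three values, but binary rounding lets a fourth slip in. A deliberately naive frange, written here only to show the failure mode:

```python
def naive_frange(start, stop, step):
    """Deliberately naive float range, for illustration only."""
    values = []
    value = start
    while value < stop:
        values.append(value)
        value += step
    return values

vals = naive_frange(0.0, 2.1, 0.7)
# The expected result is [0.0, 0.7, 1.4] -- but the third addition
# rounds to 2.0999999999999996, which is still below 2.1, so a
# surprise fourth point appears in the output.
```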
-- --Guido van Rossum (python.org/~guido) From steve at pearwood.info Sat Sep 24 04:13:12 2011 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 24 Sep 2011 12:13:12 +1000 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> Message-ID: <4E7D3CB8.5050904@pearwood.info> Guido van Rossum wrote: > On Fri, Sep 23, 2011 at 6:36 PM, Steven D'Aprano wrote: >> Ethan Furman wrote: >>> A question came up on StackOverflow about range objects and floating point >>> numbers. I thought about writing an frange that did for floats what range >>> does for ints, >> >> For what it's worth, here's mine: >> >> http://code.activestate.com/recipes/577068-floating-point-range/ > > I notice that your examples carefully skirt around the rounding issues. I also carefully *didn't* claim that it made rounding issues disappear completely. I'll add a note clarifying that rounding still occurs and as a consequence results can be unexpected. Thanks for taking the time to comment. -- Steven From guido at python.org Sat Sep 24 04:40:43 2011 From: guido at python.org (Guido van Rossum) Date: Fri, 23 Sep 2011 19:40:43 -0700 Subject: [Python-Dev] range objects in 3.x In-Reply-To: <4E7D3CB8.5050904@pearwood.info> References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> Message-ID: On Fri, Sep 23, 2011 at 7:13 PM, Steven D'Aprano wrote: >>> http://code.activestate.com/recipes/577068-floating-point-range/ >> >> I notice that your examples carefully skirt around the rounding issues. > > I also carefully *didn't* claim that it made rounding issues disappear > completely. I'll add a note clarifying that rounding still occurs and as a > consequence results can be unexpected. I believe this API is fundamentally wrong for float ranges, even if it's great for int ranges, and I will fight against adding it to the stdlib in that form. 
Maybe we can come up with a better API, and e.g. specify begin and end points and the number of subdivisions? E.g. frange(0.0, 2.1, 3) would generate [0.0, 0.7, 1.4]. Or maybe it would even be better to use inclusive end points? OTOH if you consider extending the API to complex numbers, it might be better to specify an initial value, a step, and a count. So frange(0.0, 0.7, 3) to generate [0.0, 0.7, 1.4]. Probably it shouldn't be called frange then. -- --Guido van Rossum (python.org/~guido) From g.brandl at gmx.net Sat Sep 24 08:55:19 2011 From: g.brandl at gmx.net (Georg Brandl) Date: Sat, 24 Sep 2011 08:55:19 +0200 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> Message-ID: Am 24.09.2011 04:40, schrieb Guido van Rossum: > On Fri, Sep 23, 2011 at 7:13 PM, Steven D'Aprano wrote: >>>> http://code.activestate.com/recipes/577068-floating-point-range/ >>> >>> I notice that your examples carefully skirt around the rounding issues. >> >> I also carefully *didn't* claim that it made rounding issues disappear >> completely. I'll add a note clarifying that rounding still occurs and as a >> consequence results can be unexpected. > > I believe this API is fundamentally wrong for float ranges, even if > it's great for int ranges, and I will fight against adding it to the > stdlib in that form. > > Maybe we can come up with a better API, and e.g. specify begin and end > points and the number of subdivisions? E.g. frange(0.0, 2.1, 3) would > generate [0.0, 0.7, 1.4]. This is what numpy calls linspace: http://docs.scipy.org/doc/numpy/reference/generated/numpy.linspace.html numpy also has an "arange" that works with floats, but: """When using a non-integer step, such as 0.1, the results will often not be consistent. 
It is better to use linspace for these cases.""" Georg From g.brandl at gmx.net Sat Sep 24 10:27:32 2011 From: g.brandl at gmx.net (Georg Brandl) Date: Sat, 24 Sep 2011 10:27:32 +0200 Subject: [Python-Dev] Inconsistent script/console behaviour In-Reply-To: References: Message-ID: Am 24.09.2011 01:32, schrieb Guido van Rossum: > On Fri, Sep 23, 2011 at 4:25 PM, anatoly techtonik wrote: >> Currently if you work in console and define a function and then >> immediately call it - it will fail with SyntaxError. >> For example, copy paste this completely valid Python script into console: >> >> def some(): >> print "XXX" >> some() >> >> There is an issue for that that was just closed by Eric. However, I'd >> like to know if there are people here that agree that if you paste a >> valid Python script into console - it should work without changes. > > You can't fix this without completely changing the way the interactive > console treats blank lines. None that it's not just that a blank line > is required after a function definition -- you also *can't* have a > blank line *inside* a function definition. While the former could be changed (I think), the latter certainly cannot. So it's probably not worth changing established behavior. Georg From ubershmekel at gmail.com Sat Sep 24 11:53:27 2011 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Sat, 24 Sep 2011 05:53:27 -0400 Subject: [Python-Dev] Inconsistent script/console behaviour In-Reply-To: References: Message-ID: Could you elaborate on what would be wrong if function definitions ended only after an explicitly less indented line? The only problem that comes to mind is global scope "if" statements that wouldn't execute when expected (we actually might need to terminate them with a dedented "pass"). 
On Sep 24, 2011 4:26 AM, "Georg Brandl" wrote: > Am 24.09.2011 01:32, schrieb Guido van Rossum: >> On Fri, Sep 23, 2011 at 4:25 PM, anatoly techtonik wrote: >>> Currently if you work in console and define a function and then >>> immediately call it - it will fail with SyntaxError. >>> For example, copy paste this completely valid Python script into console: >>> >>> def some(): >>> print "XXX" >>> some() >>> >>> There is an issue for that that was just closed by Eric. However, I'd >>> like to know if there are people here that agree that if you paste a >>> valid Python script into console - it should work without changes. >> >> You can't fix this without completely changing the way the interactive >> console treats blank lines. None that it's not just that a blank line >> is required after a function definition -- you also *can't* have a >> blank line *inside* a function definition. > > While the former could be changed (I think), the latter certainly cannot. > So it's probably not worth changing established behavior. > > Georg > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/ubershmekel%40gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From g.brandl at gmx.net Sat Sep 24 12:05:21 2011 From: g.brandl at gmx.net (Georg Brandl) Date: Sat, 24 Sep 2011 12:05:21 +0200 Subject: [Python-Dev] Inconsistent script/console behaviour In-Reply-To: References: Message-ID: You're right that in principle for function definitions there is no ambiguity. But you also presented the downfall of that proposal: all multi-clause statements will still need an explicit way of termination, and of course the "pass" would be exceedingly ugly, not to mention much more confusing than the current way. 
Georg Am 24.09.2011 11:53, schrieb Yuval Greenfield: > Could you elaborate on what would be wrong if function definitions ended only > after an explicitly less indented line? The only problem that comes to mind is > global scope "if" statements that wouldn't execute when expected (we actually > might need to terminate them with a dedented "pass"). > > On Sep 24, 2011 4:26 AM, "Georg Brandl" > wrote: >> Am 24.09.2011 01:32, schrieb Guido van Rossum: >>> On Fri, Sep 23, 2011 at 4:25 PM, anatoly techtonik > wrote: >>>> Currently if you work in console and define a function and then >>>> immediately call it - it will fail with SyntaxError. >>>> For example, copy paste this completely valid Python script into console: >>>> >>>> def some(): >>>> print "XXX" >>>> some() >>>> >>>> There is an issue for that that was just closed by Eric. However, I'd >>>> like to know if there are people here that agree that if you paste a >>>> valid Python script into console - it should work without changes. >>> >>> You can't fix this without completely changing the way the interactive >>> console treats blank lines. None that it's not just that a blank line >>> is required after a function definition -- you also *can't* have a >>> blank line *inside* a function definition. >> >> While the former could be changed (I think), the latter certainly cannot. >> So it's probably not worth changing established behavior. From guido at python.org Sat Sep 24 16:59:28 2011 From: guido at python.org (Guido van Rossum) Date: Sat, 24 Sep 2011 07:59:28 -0700 Subject: [Python-Dev] Inconsistent script/console behaviour In-Reply-To: References: Message-ID: I see a lot of flawed "proposals". This is clearly a python-ideas discussion. (Anatoly, take note -- please post your new gripe there.) In the mean time, there's a reasonable work-around if you have to copy/paste a large block of formatted code: >>> exec(''' . . . . . . 
''') >>> The only thing that you can't put in there is a triple-quoted string using single quotes. -- --Guido van Rossum (python.org/~guido) From guido at python.org Sat Sep 24 17:13:11 2011 From: guido at python.org (Guido van Rossum) Date: Sat, 24 Sep 2011 08:13:11 -0700 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> Message-ID: On Fri, Sep 23, 2011 at 11:55 PM, Georg Brandl wrote: > Am 24.09.2011 04:40, schrieb Guido van Rossum: >> On Fri, Sep 23, 2011 at 7:13 PM, Steven D'Aprano wrote: >>>>> http://code.activestate.com/recipes/577068-floating-point-range/ >>>> >>>> I notice that your examples carefully skirt around the rounding issues. >>> >>> I also carefully *didn't* claim that it made rounding issues disappear >>> completely. I'll add a note clarifying that rounding still occurs and as a >>> consequence results can be unexpected. >> >> I believe this API is fundamentally wrong for float ranges, even if >> it's great for int ranges, and I will fight against adding it to the >> stdlib in that form. >> >> Maybe we can come up with a better API, and e.g. specify begin and end >> points and the number of subdivisions? E.g. frange(0.0, 2.1, 3) would >> generate [0.0, 0.7, 1.4]. > > This is what numpy calls linspace: > http://docs.scipy.org/doc/numpy/reference/generated/numpy.linspace.html > > numpy also has an "arange" that works with floats, but: > """When using a non-integer step, such as 0.1, the results will often not be > consistent. It is better to use linspace for these cases.""" Aha, I like linspace(). I started a G+ thread (https://plus.google.com/u/0/115212051037621986145/posts/ZnrWDiHHiaW) but it mostly served to demonstrate that few people understand floating point, and that those that do don't understand how hard it is for the others. 
Jeffrey Yaskin's analysis (starting with "To anyone who thinks they can recover inside frange():") is the best of the bunch. But I still believe that it's best *not* to have frange(), and to warn about the flaws in the existing implementations floating around (like Steven's), referring them to linspace() instead. It looks easy enough to implement a basic linspace() that doesn't have the problems of frange(), and having a recipe handy (for those who don't want or need NumPy) would be a great start. I expect that to implement a version worthy of the stdlib math module, i.e. that computes values that are correct within 0.5ULP under all circumstances (e.g. lots of steps, or an end point close to the end of the floating point range) we'd need a numerical wizard like Mark Dickinson or Tim Peters (retired). Or maybe we could just borrow numpy's code. -- --Guido van Rossum (python.org/~guido) From tjreedy at udel.edu Sat Sep 24 23:12:30 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 24 Sep 2011 17:12:30 -0400 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> Message-ID: On 9/23/2011 10:40 PM, Guido van Rossum wrote: > On Fri, Sep 23, 2011 at 7:13 PM, Steven D'Aprano wrote: >> I also carefully *didn't* claim that it made rounding issues disappear >> completely. I'll add a note clarifying that rounding still occurs and as a >> consequence results can be unexpected. To avoid inclusion/exclusion errors, you should be testing values against a stop value that is (except for rounding errors) half a step above the last value you want to yield. In other words, subtract or add step/2.0 to the stop value according to whether or not you want it excluded or included. > I believe this API is fundamentally wrong for float ranges, even if > it's great for int ranges, and I will fight against adding it to the > stdlib in that form. I completely agree. 
For range(n), n is both the stop value and number of ints generated. It is otherwise stop-start, which is to say, stop = start + n, which is why there is no need for an n-based api (all this is by design). > Maybe we can come up with a better API, and e.g. specify begin and end > points and the number of subdivisions? E.g. frange(0.0, 2.1, 3) would > generate [0.0, 0.7, 1.4]. Or maybe it would even be better to use > inclusive end points? OTOH if you consider extending the API to > complex numbers, it might be better to specify an initial value, a > step, and a count. So frange(0.0, 0.7, 3) to generate [0.0, 0.7, 1.4]. > Probably it shouldn't be called frange then. In float use cases I can think of, one wants either both or neither end point. If neither, one probably wants points at .5*step, 1.5*step, etc., where step calculated as (right-left)/n. -- Terry Jan Reedy From steve at pearwood.info Sun Sep 25 07:21:06 2011 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 25 Sep 2011 15:21:06 +1000 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> Message-ID: <4E7EBA42.4060707@pearwood.info> Guido van Rossum wrote: > On Fri, Sep 23, 2011 at 7:13 PM, Steven D'Aprano wrote: >>>> http://code.activestate.com/recipes/577068-floating-point-range/ >>> I notice that your examples carefully skirt around the rounding issues. >> I also carefully *didn't* claim that it made rounding issues disappear >> completely. I'll add a note clarifying that rounding still occurs and as a >> consequence results can be unexpected. > > I believe this API is fundamentally wrong for float ranges, even if > it's great for int ranges, and I will fight against adding it to the > stdlib in that form. I wasn't proposing it to be in the standard lib, it was just an idle comment triggered by the OP's question. But I'm gratified it has started an interesting discussion. 
Whether the most float-friendly or not, the start/stop/step API is the most obvious and user-friendly for at least one use-case: graphing of functions. It is natural to say something like "draw a graph starting at 0, sampling every 0.1 unit, and stop when you get past 3". My HP-48 graphing calculator does exactly that: you must specify the start and stop coordinates, and an optional step size. By default, the step size is calculated for you assuming you want one point plotted per pixel. Given that the calculator display is both low-resolution and fixed size, that makes sense as the default, but you can set the step size manually if desired. start/stop/step is also familiar for users of Excel and other spreadsheets' Fill>Series command. Numeric integration is an interesting case, because generally you want multiple iterations, interpolating between the points previously seen until you reach some desired level of accuracy. E.g.: #1: 0.0, 0.5, 1.0 #2: 0.25, 0.75 #3: 0.125, 0.375, 0.625, 0.875 For integration, I would probably want both APIs. > Maybe we can come up with a better API, and e.g. specify begin and end > points and the number of subdivisions? Thanks to Mark Dickinson for suggesting using Fraction, I have this: http://code.activestate.com/recipes/577878-generate-equally-spaced-floats/ -- Steven From guido at python.org Sun Sep 25 16:38:55 2011 From: guido at python.org (Guido van Rossum) Date: Sun, 25 Sep 2011 07:38:55 -0700 Subject: [Python-Dev] range objects in 3.x In-Reply-To: <4E7EBA42.4060707@pearwood.info> References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> <4E7EBA42.4060707@pearwood.info> Message-ID: On Sat, Sep 24, 2011 at 10:21 PM, Steven D'Aprano wrote: > Guido van Rossum wrote: >> I believe this API is fundamentally wrong for float ranges, even if >> it's great for int ranges, and I will fight against adding it to the >> stdlib in that form. 
> > I wasn't proposing it to be in the standard lib, it was just an idle comment > triggered by the OP's question. But I'm gratified it has started an > interesting discussion. > > Whether the most float-friendly or not, the start/stop/step API is the most > obvious and user-friendly for at least one use-case: graphing of functions. It *appears* that way. But the flaws make for hard-to-debug edge cases (when an extra point unexpectedly appears). I've debugged a few bits of charting code, and there are enough other causes for confusing output that we don't need this problem. > It is natural to say something like "draw a graph starting at 0, sampling > every 0.1 unit, and stop when you get past 3". My HP-48 graphing calculator > does exactly that: you must specify the start and stop coordinates, and an > optional step size. By default, the step size is calculated for you assuming > you want one point plotted per pixel. Given that the calculator display is > both low-resolution and fixed size, that makes sense as the default, but you > can set the step size manually if desired. Yeah, but the HP uses decimal internally. It's just as easy for the user to specify the number of steps, and it has the advantage of not having the edge case problems. And you know how many points you'll get. > start/stop/step is also familiar for users of Excel and other spreadsheets' > Fill>Series command. Not sure I want to follow Excel's example for *anything*. > Numeric integration is an interesting case, because generally you want > multiple iterations, interpolating between the points previously seen until > you reach some desired level of accuracy. E.g.: > > #1: 0.0, 0.5, 1.0 > #2: 0.25, 0.75 > #3: 0.125, 0.375, 0.625, 0.875 So double the number of steps each time. Seems simpler to me (manipulating ints instead of floats). > For integration, I would probably want both APIs. > > >> Maybe we can come up with a better API, and e.g.
specify begin and end >> points and the number of subdivisions? > > Thanks to Mark Dickinson for suggesting using Fraction, I have this: > > http://code.activestate.com/recipes/577878-generate-equally-spaced-floats/ Nice one! -- --Guido van Rossum (python.org/~guido) From ncoghlan at gmail.com Mon Sep 26 05:47:40 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 26 Sep 2011 13:47:40 +1000 Subject: [Python-Dev] [Python-checkins] cpython: Issue #12981: rewrite multiprocessing_{sendfd, recvfd} in Python. In-Reply-To: References: Message-ID: On Sun, Sep 25, 2011 at 4:04 AM, charles-francois.natali wrote: > +if not(sys.platform == 'win32' or (hasattr(socket, 'CMSG_LEN') and > +                                   hasattr(socket, 'SCM_RIGHTS'))): >     raise ImportError('pickling of connections not supported') I'm pretty sure the functionality checks for CMSG_LEN and SCM_RIGHTS mean the platform check for Windows is now redundant. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From neologix at free.fr Mon Sep 26 08:48:06 2011 From: neologix at free.fr (Charles-François Natali) Date: Mon, 26 Sep 2011 08:48:06 +0200 Subject: [Python-Dev] [Python-checkins] cpython: Issue #12981: rewrite multiprocessing_{sendfd, recvfd} in Python. In-Reply-To: References: Message-ID: > On Sun, Sep 25, 2011 at 4:04 AM, charles-francois.natali > wrote: >> +if not(sys.platform == 'win32' or (hasattr(socket, 'CMSG_LEN') and >> +                                   hasattr(socket, 'SCM_RIGHTS'))): >>     raise ImportError('pickling of connections not supported') > > I'm pretty sure the functionality checks for CMSG_LEN and SCM_RIGHTS > mean the platform check for Windows is now redundant. > I'm not sure I understand what you mean. FD passing is supported on Unix with sendmsg/SCM_RIGHTS, and on Windows using whatever Windows uses for that purpose (see http://hg.python.org/cpython/file/2b47f0146639/Lib/multiprocessing/reduction.py#l63).
If we remove the check for Windows, an ImportError will be raised systematically, unless you suggest that Windows does support sendmsg/SCM_RIGHTS (I somehow doubt Windows supports Unix domain sockets, but I don't know Windows at all). cf From ncoghlan at gmail.com Mon Sep 26 15:21:08 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 26 Sep 2011 09:21:08 -0400 Subject: [Python-Dev] [Python-checkins] cpython: Issue #12981: rewrite multiprocessing_{sendfd, recvfd} in Python. In-Reply-To: References: Message-ID: 2011/9/26 Charles-François Natali : > I'm not sure I understand what you mean. You actually understood what I meant, I was just wrong because I misread the conditional. Nothing to see here, please move along :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From guido at python.org Mon Sep 26 23:00:06 2011 From: guido at python.org (Guido van Rossum) Date: Mon, 26 Sep 2011 14:00:06 -0700 Subject: [Python-Dev] PEP 393 close to pronouncement Message-ID: Martin has asked me to pronounce on PEP 393, after he's updated it in response to various feedback (including mine :-). I'm currently looking very favorably on it, but I thought I'd give folks here one more chance to bring up showstoppers. So, if you have the time, please review PEP 393 and/or play with the code (the repo is linked from the PEP's References section now). Please limit your feedback to show-stopping issues; we're past the stage of bikeshedding here. It's Good Enough (TM) and we'll have the rest of the 3.3 release cycle to improve incrementally. But we need to get to the point where the code can be committed to the 3.3 branch. In a few days I'll pronounce.
-- --Guido van Rossum (python.org/~guido) From fperez.net at gmail.com Mon Sep 26 23:06:44 2011 From: fperez.net at gmail.com (Fernando Perez) Date: Mon, 26 Sep 2011 21:06:44 +0000 (UTC) Subject: [Python-Dev] range objects in 3.x References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> Message-ID: On Sat, 24 Sep 2011 08:13:11 -0700, Guido van Rossum wrote: > I expect that to implement a version worthy of the stdlib math module, > i.e. that computes values that are correct within 0.5ULP under all > circumstances (e.g. lots of steps, or an end point close to the end of > the floating point range) we'd need a numerical wizard like Mark > Dickinson or Tim Peters (retired). Or maybe we could just borrow numpy's > code. +1 to using the numpy api, having continuity of API between the two would be great (people work interactively with 'from numpy import *', so having the linspace() call continue to work identically would be a bonus). License-wise there shouldn't be major issues in using the numpy code, as numpy is all BSD. Hopefully if there are any, the numpy community can help out. And now that Mark Dickinson is at Enthought (http://enthought.com/company/developers.php) where Travis Oliphant --numpy author-- works, I'm sure the process of ironing out any implementation/api quirks could be handled easily. Cheers, f From victor.stinner at haypocalc.com Tue Sep 27 00:19:02 2011 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Tue, 27 Sep 2011 00:19:02 +0200 Subject: [Python-Dev] PEP 393 close to pronouncement In-Reply-To: References: Message-ID: <201109270019.02442.victor.stinner@haypocalc.com> Hi, Le lundi 26 septembre 2011 23:00:06, Guido van Rossum a écrit : > So, if you have the time, please review PEP 393 and/or play with the > code (the repo is linked from the PEP's References section now). I played with the code. The full test suite passes on Linux, FreeBSD and Windows.
On Windows, there is just one failure in test_configparser, I didn't investigate it yet. I like the new API: a classic loop on the string length, and a macro to read the nth character. The backward compatibility is fully transparent and is already well tested because some modules still use the legacy API. It's quite easy to move from the legacy API to the new API. It's just boring, but it's almost done in the core (unicodeobject.c, but also some modules like _io). Since the introduction of PyASCIIObject, the PEP 393 is really good in memory footprint, especially for ASCII-only strings. In Python, you manipulate a lot of ASCII strings. PEP === It's not clear what is deprecated. It would help to have a full list of the deprecated functions/macros. Sometimes Martin wrote PyUnicode_Ready, sometimes PyUnicode_READY. It's confusing. Typo: PyUnicode_FAST_READY => PyUnicode_READY. "PyUnicode_WRITE_CHAR" is not listed in the New API section. Typo in "PyUnicode_CONVERT_BYTES(from_type, tp_type, begin, end, to)": tp_type => to_type. "PyUnicode_Chr(ch)": Why introducing a new function? PyUnicode_FromOrdinal was not enough? "GDB Debugging Hooks" It's not done yet. "None of the functions in this PEP become part of the stable ABI (PEP 384)." Why? Some functions don't depend on the internal representation, like PyUnicode_Substring or PyUnicode_FindChar. Typo: "In order to port modules to the new API, try to eliminate the use of these API elements: ... PyUnicode_GET_LENGTH ..." PyUnicode_GET_LENGTH is part of the new API. I suppose that you mean PyUnicode_GET_SIZE. 
Victor From dmalcolm at redhat.com Tue Sep 27 02:03:49 2011 From: dmalcolm at redhat.com (David Malcolm) Date: Mon, 26 Sep 2011 20:03:49 -0400 Subject: [Python-Dev] PEP 393 close to pronouncement In-Reply-To: <201109270019.02442.victor.stinner@haypocalc.com> References: <201109270019.02442.victor.stinner@haypocalc.com> Message-ID: <1317081830.23847.6.camel@surprise> On Tue, 2011-09-27 at 00:19 +0200, Victor Stinner wrote: > Hi, > > Le lundi 26 septembre 2011 23:00:06, Guido van Rossum a écrit : > > So, if you have the time, please review PEP 393 and/or play with the > > code (the repo is linked from the PEP's References section now). > > PEP > === > "GDB Debugging Hooks" It's not done yet. I can do these if need be, but IIRC you (Victor) said on #python-dev that you were already working on them. From steve at pearwood.info Tue Sep 27 03:25:48 2011 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 27 Sep 2011 11:25:48 +1000 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> Message-ID: <4E81261C.6040200@pearwood.info> Fernando Perez wrote: > On Sat, 24 Sep 2011 08:13:11 -0700, Guido van Rossum wrote: > >> I expect that to implement a version worthy of the stdlib math module, >> i.e. that computes values that are correct within 0.5ULP under all >> circumstances (e.g. lots of steps, or an end point close to the end of >> the floating point range) we'd need a numerical wizard like Mark >> Dickinson or Tim Peters (retired). Or maybe we could just borrow numpy's >> code. > > +1 to using the numpy api, having continuity of API between the two would > be great (people work interactively with 'from numpy import *', so having > the linspace() call continue to work identically would be a bonus). The audience for numpy is a small minority of Python users, and they tend to be more sophisticated.
I'm sure they can cope with two functions with different APIs. While continuity of API might be a good thing, we shouldn't accept a poor API just for the sake of continuity. I have some criticisms of the linspace API. numpy.linspace(start, stop, num=50, endpoint=True, retstep=False) http://docs.scipy.org/doc/numpy/reference/generated/numpy.linspace.html * It returns a sequence, which is appropriate for numpy but in standard Python it should return an iterator or something like a range object. * Why does num have a default of 50? That seems to be an arbitrary choice. * It arbitrarily singles out the end point for special treatment. When integrating, it is just as common for the first point to be singular as the end point, and therefore needing to be excluded. * If you exclude the end point, the stepsize, and hence the values returned, change: >>> linspace(1, 2, 4) array([ 1. , 1.33333333, 1.66666667, 2. ]) >>> linspace(1, 2, 4, endpoint=False) array([ 1. , 1.25, 1.5 , 1.75]) This surprises me. I expect that excluding the end point will just exclude the end point, i.e. return one fewer point. That is, I expect num to count the number of subdivisions, not the number of points. * The retstep argument changes the return signature from => array to => (array, number). I think that's a pretty ugly thing to do. If linspace returned a special iterator object, the step size could be exposed as an attribute. * I'm not sure that start/end/count is a better API than start/step/count. * This one is pure bike-shedding: I don't like the name linspace. We've gone 20 years without a floating point range in Python. I think we should give people a bit of time to play around with alternative APIs rather than just grab the first one that comes along.
-- Steven From guido at python.org Tue Sep 27 03:44:07 2011 From: guido at python.org (Guido van Rossum) Date: Mon, 26 Sep 2011 18:44:07 -0700 Subject: [Python-Dev] range objects in 3.x In-Reply-To: <4E81261C.6040200@pearwood.info> References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> Message-ID: On Mon, Sep 26, 2011 at 6:25 PM, Steven D'Aprano wrote: > While continuity of API might be a good thing, we shouldn't accept a poor > API just for the sake of continuity. I have some criticisms of the linspace > API. [...] > * I'm not sure that start/end/count is a better API than start/step/count. On this particular one, I think start/end/count *is* better, because in the most common use case the start and end points are given, and the step is somewhat of an afterthought (e.g. how many integration steps, or how many points in the chart). I also keep thinking that numerically, if start and end are given exactly, we should be able to compute the intermediate points within 0.5ULP, whereas it would seem that given start and step our computation for end may be considerably off, if the count is high. Or, maybe what I'm trying to say is, if the user has start/end/count but the API wants start/step/count, after computing step = (end-start) / count, the value of start + count*step might not quite equal to end; whereas if the user has start/step/count but the API wants start/end/count I think there's nothing wrong with computing end = start + step*count. 
-- --Guido van Rossum (python.org/~guido) From greg.ewing at canterbury.ac.nz Tue Sep 27 08:23:41 2011 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 27 Sep 2011 19:23:41 +1300 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> Message-ID: <4E816BED.4000103@canterbury.ac.nz> Guido van Rossum wrote: > Or, maybe what I'm trying to say is, if the > user has start/end/count but the API wants start/step/count, after > computing step = (end-start) / count, the value of start + count*step > might not quite equal to end; whereas if the user has start/step/count > but the API wants start/end/count I think there's nothing wrong with > computing end = start + step*count. +1, that makes sense to me. And I don't like "linspace" either. Something more self explanatory such as "subdivide" or "interpolate" might be better. -- Greg From martin at v.loewis.de Tue Sep 27 08:40:16 2011 From: martin at v.loewis.de (Martin v. Löwis) Date: Tue, 27 Sep 2011 08:40:16 +0200 Subject: [Python-Dev] PEP 393 close to pronouncement In-Reply-To: <1317081830.23847.6.camel@surprise> References: <201109270019.02442.victor.stinner@haypocalc.com> <1317081830.23847.6.camel@surprise> Message-ID: <4E816FD0.7040309@v.loewis.de> >> "GDB Debugging Hooks" It's not done yet. > I can do these if need be, but IIRC you (Victor) said on #python-dev > that you were already working on them. I already changed it for an earlier version of the PEP. It still needs to sort out the various compact representations. I could do them as well, so don't worry.
Regards, Martin From victor.stinner at haypocalc.com Tue Sep 27 15:50:27 2011 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Tue, 27 Sep 2011 15:50:27 +0200 Subject: [Python-Dev] PEP 393 close to pronouncement In-Reply-To: <201109270019.02442.victor.stinner@haypocalc.com> References: <201109270019.02442.victor.stinner@haypocalc.com> Message-ID: <201109271550.27837.victor.stinner@haypocalc.com> Le mardi 27 septembre 2011 00:19:02, Victor Stinner a écrit : > On Windows, there is just one failure in test_configparser, I > didn't investigate it yet Oh, it was a real bug in io.IncrementalNewlineDecoder. It is now fixed. Victor From alexander.belopolsky at gmail.com Tue Sep 27 16:52:55 2011 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 27 Sep 2011 10:52:55 -0400 Subject: [Python-Dev] range objects in 3.x In-Reply-To: <4E81261C.6040200@pearwood.info> References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> Message-ID: On Mon, Sep 26, 2011 at 9:25 PM, Steven D'Aprano wrote: .. > The audience for numpy is a small minority of Python users, and they tend to > be more sophisticated. I'm sure they can cope with two functions with > different APIs > > While continuity of API might be a good thing, we shouldn't accept a poor > API just for the sake of continuity. I have some criticisms of the linspace > API. +1 In addition to Steven's criticisms of numpy.linspace(), I would like a new function to work with types other than float. It certainly makes sense to have range-like functionality for fractions and decimal floats, but also I often find a need to generate a list of equally spaced dates or datetime points. It would be nice if a new function would allow start and stop to be any type that supports subtraction and whose differences support division by numbers.
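The kind of polymorphic helper Alexander describes can be sketched in a few lines. The name spread and the inclusive-endpoints choice are illustrative assumptions, not an agreed API; the date case relies on Python 3.2's timedelta division by an int:

```python
from datetime import date
from fractions import Fraction

def spread(start, stop, count):
    # Hypothetical generic range: works for any start/stop whose
    # difference supports division by an int (floats, Fractions,
    # dates via timedelta, ...). Returns count+1 points, including
    # both endpoints.
    step = (stop - start) / count
    return [start + i * step for i in range(count + 1)]

print(spread(Fraction(0), Fraction(1), 4))
# [Fraction(0, 1), Fraction(1, 4), Fraction(1, 2), Fraction(3, 4), Fraction(1, 1)]

print(spread(date(2011, 9, 1), date(2011, 9, 5), 4))
# one date per day, Sep 1 through Sep 5
```

The only protocol requirements are `stop - start`, division of that difference by an int, `int * step`, and `start + step` — exactly the duck typing Alexander asks for.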
Also, in terms of implementation, I don't think we'll gain anything by copying numpy code because linspace(start, stop, num) is effectively just arange(0, num) * step + start where step is (stop-start)/(num-1). This works because numpy arrays (produced by arange()) support linear algebra and we are not going to copy that. From alexander.belopolsky at gmail.com Tue Sep 27 17:05:15 2011 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 27 Sep 2011 11:05:15 -0400 Subject: [Python-Dev] range objects in 3.x In-Reply-To: <4E816BED.4000103@canterbury.ac.nz> References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> <4E816BED.4000103@canterbury.ac.nz> Message-ID: On Tue, Sep 27, 2011 at 2:23 AM, Greg Ewing wrote: .. > And I don't like "linspace" either. Something more self > explanatory such as "subdivide" or "interpolate" might > be better. "Grid" would be nice and short, but may suggest 2-dimensional result. Whatever word we choose, I think it should be a noun rather than a verb. ("Comb" (noun) brings up the right image, but is probably too informal and may be confused with a short for "combination.") From ethan at stoneleaf.us Tue Sep 27 17:24:16 2011 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 27 Sep 2011 08:24:16 -0700 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> <4E816BED.4000103@canterbury.ac.nz> Message-ID: <4E81EAA0.5080507@stoneleaf.us> Alexander Belopolsky wrote: > On Tue, Sep 27, 2011 at 2:23 AM, Greg Ewing wrote: > .. >> And I don't like "linspace" either. Something more self >> explanatory such as "subdivide" or "interpolate" might >> be better. > > "Grid" would be nice and short, but may suggest 2-dimensional result.
> Whatever word we choose, I think it should be a noun rather than a > verb. ("Comb" (noun) brings up the right image, but is probably too > informal and may be confused with a short for "combination.") segment? srange? ~Ethan~ From raymond.hettinger at gmail.com Tue Sep 27 17:44:56 2011 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Tue, 27 Sep 2011 11:44:56 -0400 Subject: [Python-Dev] range objects in 3.x In-Reply-To: <4E81EAA0.5080507@stoneleaf.us> References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> <4E816BED.4000103@canterbury.ac.nz> <4E81EAA0.5080507@stoneleaf.us> Message-ID: On Sep 27, 2011, at 11:24 AM, Ethan Furman wrote: > Alexander Belopolsky wrote: >> On Tue, Sep 27, 2011 at 2:23 AM, Greg Ewing wrote: >> .. >>> And I don't like "linspace" either. Something more self >>> explanatory such as "subdivide" or "interpolate" might >>> be better. >> "Grid" would be nice and short, but may suggest 2-dimensional result. >> Whatever word we choose, I think it should be a noun rather than a >> verb. ("Comb" (noun) brings up the right image, but is probably too >> informal and may be confused with a short for "combination.") > > segment? srange? In the math module, we used an f prefix to differentiate math.fsum() from the built-in sum() function. That suggests frange() as a possible name for a variant of range() that creates floats.
That works reasonably well if the default argument pattern is the same as range: frange(10.0, 20.0, 0.5) There could be an optional argument to compute the interval: frange(10.0, 20.0, numpoints=20) And possibly an option to include both endpoints: frange(10.0, 20.0, 0.5, inclusive=True) Raymond From steve at pearwood.info Tue Sep 27 18:00:15 2011 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 28 Sep 2011 02:00:15 +1000 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> <4E816BED.4000103@canterbury.ac.nz> Message-ID: <4E81F30F.6060704@pearwood.info> Alexander Belopolsky wrote: > On Tue, Sep 27, 2011 at 2:23 AM, Greg Ewing wrote: > .. >> And I don't like "linspace" either. Something more self >> explanatory such as "subdivide" or "interpolate" might >> be better. > > "Grid" would be nice and short, but may suggest 2-dimensional result. > Whatever word we choose, I think it should be a noun rather than a > verb. ("Comb" (noun) brings up the right image, but is probably too > informal and may be confused with a short for "combination.") I came up with "spread".
Here's my second attempt, which offers both count/start/end and count/start/step APIs: http://code.activestate.com/recipes/577881-equally-spaced-floats-part-2/ -- Steven From ethan at stoneleaf.us Tue Sep 27 18:06:55 2011 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 27 Sep 2011 09:06:55 -0700 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> <4E816BED.4000103@canterbury.ac.nz> <4E81EAA0.5080507@stoneleaf.us> Message-ID: <4E81F49F.9080309@stoneleaf.us> Raymond Hettinger wrote: > On Sep 27, 2011, at 11:24 AM, Ethan Furman wrote: > >> Alexander Belopolsky wrote: >>> On Tue, Sep 27, 2011 at 2:23 AM, Greg Ewing wrote: >>> .. >>>> And I don't like "linspace" either. Something more self >>>> explanatory such as "subdivide" or "interpolate" might >>>> be better. >>> "Grid" would be nice and short, but may suggest 2-dimensional result. >>> Whatever word we choose, I think it should be a noun rather than a >>> verb. ("Comb" (noun) brings up the right image, but is probably too >>> informal and may be confused with a short for "combination.") >> segment? srange? > > In the math module, we used an f prefix to differentiate math.fsum() from the built-in sum() function. That suggests frange() as a possible name for a variant of range() that creates floats. > > That works reasonably well if the default argument pattern is the same as range: frange(10.0, 20.0, 0.5) > > There could be an optional argument to compute the interval: frange(10.0, 20.0, numpoints=20) > > And possibly an option to include both endpoints: frange(10.0, 20.0, 0.5, inclusive=True) I like the numpoints option. I also like Alexander's idea of making this new range able to work with other types that support addition/division -- but in that case does the 'f' prefix still make sense?
~Ethan~ From alexander.belopolsky at gmail.com Tue Sep 27 18:16:17 2011 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 27 Sep 2011 12:16:17 -0400 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> <4E816BED.4000103@canterbury.ac.nz> <4E81EAA0.5080507@stoneleaf.us> Message-ID: On Tue, Sep 27, 2011 at 11:44 AM, Raymond Hettinger wrote: .. > In the math module, we used an f prefix to differentiate math.fsum() from the built-in sum() function. That suggests frange() as a possible name for a variant of range() that creates floats. > > That works reasonably well if the default argument pattern is the same as range: frange(10.0, 20.0, 0.5) +1 on adding frange() to math module or to the recently contemplated stats module. For something that aspires to becoming a builtin one day, I would like to see something not focused on floats exclusively and something with a proper English name. From steve at pearwood.info Tue Sep 27 18:20:27 2011 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 28 Sep 2011 02:20:27 +1000 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> Message-ID: <4E81F7CB.7060001@pearwood.info> Alexander Belopolsky wrote: > In addition to Steven's criticisms of numpy.linspace(), I would like a > new function to work with types other than float. It certainly makes > sense to have range-like functionality for fractions and decimal > floats, but also I often find a need to generate a list of equally > spaced dates or datetime points. It would be nice if a new function > would allow start and stop to be any type that supports subtraction > and whose differences support division by numbers.
I think a polymorphic numeric range function would be useful. If it happened to support dates, that would be great, but I think that a daterange() function in the datetime module would be more appropriate. Who is going to think to import math if you want a range of dates? -- Steven From ethan at stoneleaf.us Tue Sep 27 18:27:34 2011 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 27 Sep 2011 09:27:34 -0700 Subject: [Python-Dev] range objects in 3.x In-Reply-To: <4E81F7CB.7060001@pearwood.info> References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> <4E81F7CB.7060001@pearwood.info> Message-ID: <4E81F976.3070300@stoneleaf.us> Steven D'Aprano wrote: > Alexander Belopolsky wrote: > >> In addition to Steven's criticisms of numpy.linspace(), I would like a >> new function to work with types other than float. It certainly makes >> sense to have range-like functionality for fractions and decimal >> floats, but also I often find a need to generate a list of equally >> spaced dates or datetime points. It would be nice if a new function >> would allow start and stop to be any type that supports subtraction >> and whose differences support division by numbers. > > I think a polymorphic numeric range function would be useful. If it > happened to support dates, that would be great, but I think that a > daterange() function in the datetime module would be more appropriate. > Who is going to think to import math if you want a range of dates? If it's generic, why should it live in math?
~Ethan~ From alexander.belopolsky at gmail.com Tue Sep 27 18:36:52 2011 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 27 Sep 2011 12:36:52 -0400 Subject: [Python-Dev] range objects in 3.x In-Reply-To: <4E81F7CB.7060001@pearwood.info> References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> <4E81F7CB.7060001@pearwood.info> Message-ID: On Tue, Sep 27, 2011 at 12:20 PM, Steven D'Aprano wrote: > If it happened > to support dates, that would be great, but I think that a daterange() > function in the datetime module would be more appropriate. Or even more appropriately in the calendar module. The problem is that we may already have a similar function there and nobody knows about it. > Who is going to > think to import math if you want a range of dates? No one. That's why I said that if the new function ends up in math or stats, I am +1 on frange(). However, I did in the past try to give dates for start and stop and a timedelta for step expecting range() to work. This would be similar to the way sum works for non-numeric types when an appropriate start value is given. BTW, at the time when I worked on extending (x)range to long integers, I attempted to make it work on dates, but at that time timedelta did not support division by integer, so I refocused on that instead. From guido at python.org Tue Sep 27 19:03:38 2011 From: guido at python.org (Guido van Rossum) Date: Tue, 27 Sep 2011 10:03:38 -0700 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> <4E816BED.4000103@canterbury.ac.nz> <4E81EAA0.5080507@stoneleaf.us> Message-ID: On Tue, Sep 27, 2011 at 9:16 AM, Alexander Belopolsky wrote: > On Tue, Sep 27, 2011 at 11:44 AM, Raymond Hettinger > wrote: > .. 
>> In the math module, we used an f prefix to differentiate math.fsum() from the built-in sum() function. That suggests frange() as a possible name for a variant of range() that creates floats. >> >> That works reasonably well if the default argument pattern is the same as range: frange(10.0, 20.0, 0.5) > > +1 on adding frange() to math module or to the recently contemplated > stats module. For something that aspires to becoming a builtin one > day, I would like to see something not focused on floats exclusively > and something with a proper English name. Um, I think you better read the thread. :-) I successfully argued that mimicking the behavior of range() for floats is a bad idea, and that we need to come up with a name for an API that takes start/stop/count arguments instead of start/stop/step. -- --Guido van Rossum (python.org/~guido) From ubershmekel at gmail.com Tue Sep 27 19:07:44 2011 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Tue, 27 Sep 2011 13:07:44 -0400 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> <4E816BED.4000103@canterbury.ac.nz> <4E81EAA0.5080507@stoneleaf.us> Message-ID: I as well think the construct should support other types as it sounds an awful lot like the missing for(;;) loop construct. Concerning the api, if we use spread(start, step, count) we don't rely on a division method even though the caller probably does. Just mentioning another option. --Yuval Greenfield
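Guido's point about start/stop/count versus a step-based API can be seen numerically in a few lines. A self-contained illustration, not code from the thread:

```python
start, end, count = 0.0, 1.0, 10
step = (end - start) / count

# Step-based: accumulating the step drifts. Ten additions of 0.1
# land on 0.9999999999999999, not 1.0.
accumulated, x = [], start
for _ in range(count + 1):
    accumulated.append(x)
    x += step

# Endpoint-based: interpolating from both ends hits start and end exactly.
interpolated = [((count - i) * start + i * end) / count
                for i in range(count + 1)]

print(accumulated[-1] == end)   # False
print(interpolated[-1] == end)  # True
```

With start/stop/count the endpoints are given exactly and the interior points absorb the rounding; with start/step the computed end point is the one that drifts, which is exactly the hard-to-debug extra-point problem mentioned earlier in the thread.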
From alexander.belopolsky at gmail.com Tue Sep 27 19:11:10 2011 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 27 Sep 2011 13:11:10 -0400 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> <4E816BED.4000103@canterbury.ac.nz> <4E81EAA0.5080507@stoneleaf.us> Message-ID: On Tue, Sep 27, 2011 at 1:03 PM, Guido van Rossum wrote: .. > Um, I think you better read the thread. :-) I successfully argued that > mimicking the behavior of range() for floats is a bad idea, and that > we need to come up with a name for an API that takes start/stop/count > arguments instead of start/stop/step. The name "frange" does not necessarily imply that we have to mimic the API completely. As long as frange(10.0) and frange(1.0, 10.0) works as expected while addressing floating point subtleties through optional arguments and documentation, I don't see why it can't be called frange() *and* support count. From guido at python.org Tue Sep 27 19:21:31 2011 From: guido at python.org (Guido van Rossum) Date: Tue, 27 Sep 2011 10:21:31 -0700 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> <4E816BED.4000103@canterbury.ac.nz> <4E81EAA0.5080507@stoneleaf.us> Message-ID: On Tue, Sep 27, 2011 at 10:11 AM, Alexander Belopolsky wrote: > On Tue, Sep 27, 2011 at 1:03 PM, Guido van Rossum wrote: > .. >> Um, I think you better read the thread. :-) I successfully argued that >> mimicking the behavior of range() for floats is a bad idea, and that >> we need to come up with a name for an API that takes start/stop/count >> arguments instead of start/stop/step. > > The name "frange" does not necessarily imply that we have to mimic the > API completely.
As long as frange(10.0) and frange(1.0, 10.0) works > as expected while addressing floating point subtleties through > optional arguments and documentation, I don't see why it can't be > called frange() *and* support count. But I do. :-) Calling it frange() is pretty much *begging* people to assume that the 3rd parameter has the same meaning as for range(). Now, there are a few cases where that doesn't matter, e.g. frange(0, 100, 10) will do the expected thing under both interpretations, but frange(0, 100, 5) will not. -- --Guido van Rossum (python.org/~guido) From ethan at stoneleaf.us Tue Sep 27 19:32:13 2011 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 27 Sep 2011 10:32:13 -0700 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> <4E816BED.4000103@canterbury.ac.nz> <4E81EAA0.5080507@stoneleaf.us> Message-ID: <4E82089D.8070209@stoneleaf.us> Guido van Rossum wrote: > On Tue, Sep 27, 2011 at 10:11 AM, Alexander Belopolsky wrote: >> The name "frange" does not necessarily imply that we have to mimic the >> API completely. As long as frange(10.0) and frange(1.0, 10.0) works >> as expected while addressing floating point subtleties through >> optional arguments and documentation, I don't see why it can't be >> called frange() *and* support count. > > But I do. :-) Calling it frange() is pretty much *begging* people to > assume that the 3rd parameter has the same meaning as for range(). > Now, there are a few cases where that doesn't matter, e.g. frange(0, > 100, 10) will do the expected thing under both interpretations, but > frange(0, 100, 5) will not. What about the idea of this signature? frange([start], stop, step=None, count=None) Then when count is desired, it can be specified, and when step is sufficient, no change is necessary.
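The floating-point pitfall behind the start/stop/count argument can be made concrete. A minimal sketch (the function names are illustrative only, not any proposed API):

```python
def frange_step(start, stop, step):
    # step-based, like range(): repeated addition accumulates
    # rounding error, which can yield an unexpected extra point
    x = start
    while x < stop:
        yield x
        x += step

def frange_count(start, stop, count):
    # count-based: each point is computed independently, so the
    # result always has exactly `count` points
    for i in range(count):
        yield start + i * (stop - start) / count

len(list(frange_step(0.0, 1.0, 0.1)))   # 11 points -- one too many
len(list(frange_count(0.0, 1.0, 10)))   # exactly 10
```

With step=0.1 the tenth partial sum is 0.9999999999999999, still less than 1.0, so the step-based loop emits an eleventh point; the count-based version cannot make that mistake.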
~Ethan~ From steve at pearwood.info Tue Sep 27 19:55:15 2011 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 28 Sep 2011 03:55:15 +1000 Subject: [Python-Dev] range objects in 3.x In-Reply-To: <4E82089D.8070209@stoneleaf.us> References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> <4E816BED.4000103@canterbury.ac.nz> <4E81EAA0.5080507@stoneleaf.us> <4E82089D.8070209@stoneleaf.us> Message-ID: <4E820E03.6090100@pearwood.info> Ethan Furman wrote: > What about the idea of this signature? > > frange([start], stop, step=None, count=None) > > Then when count is desired, it can be specified, and when step is > sufficient, no change is necessary. A default of start=0 makes sense for integer range, because the most common use for range *by far* is for counting, and in Python we count 0, 1, 2, ... Similarly, we usually count every item, so a default step of 1 is useful. But for numeric work, neither of those defaults are useful. This proposed spread/frange/whatever function will be used for generating a sequence of equally spaced numbers, and not for counting. A starting value of 0.0 is generally no more special than any other starting value. There is no good reason to single out default start=0. Likewise a step-size of 1.0 is also arbitrary. It isn't useful to hammer the square peg of numeric ranges into the round hole of integer counts. We should not try to force this float range to use the same API as builtin range. (In hindsight, it is a shame that range is called "range" instead of "count". itertools got the name right.) 
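Steven's framing, a sequence of equally spaced numbers rather than a count, is essentially what numpy calls linspace. A sketch of that shape (`spaced` is an illustrative name, not a proposal):

```python
def spaced(start, stop, count):
    # `count` equal intervals, count + 1 points; each point is
    # computed directly from the endpoints, never by accumulation
    span = stop - start
    return [start + span * i / count for i in range(count + 1)]

spaced(37.75, 90.25, 4)
# [37.75, 50.875, 64.0, 77.125, 90.25]
```

Here every quantity is a binary fraction, so the points come out exact; in general the interior points can still round, but the number of points is always right.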
-- Steven From ethan at stoneleaf.us Tue Sep 27 20:20:25 2011 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 27 Sep 2011 11:20:25 -0700 Subject: [Python-Dev] range objects in 3.x In-Reply-To: <4E820E03.6090100@pearwood.info> References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> <4E816BED.4000103@canterbury.ac.nz> <4E81EAA0.5080507@stoneleaf.us> <4E82089D.8070209@stoneleaf.us> <4E820E03.6090100@pearwood.info> Message-ID: <4E8213E9.1060705@stoneleaf.us> Steven D'Aprano wrote: > Ethan Furman wrote: > >> What about the idea of this signature? >> >> frange([start], stop, step=None, count=None) >> >> Then when count is desired, it can be specified, and when step is >> sufficient, no change is necessary. > > A default of start=0 makes sense for integer range, because the most > common use for range *by far* is for counting, and in Python we count 0, > 1, 2, ... Similarly, we usually count every item, so a default step of 1 > is useful. > > But for numeric work, neither of those defaults are useful. This > proposed spread/frange/whatever function will be used for generating a > sequence of equally spaced numbers, and not for counting. A starting > value of 0.0 is generally no more special than any other starting value. > There is no good reason to single out default start=0. Likewise a > step-size of 1.0 is also arbitrary. > > It isn't useful to hammer the square peg of numeric ranges into the > round hole of integer counts. We should not try to force this float > range to use the same API as builtin range. > > (In hindsight, it is a shame that range is called "range" instead of > "count". itertools got the name right.) Good points. So how about: some_name_here(start, stop, *, step=None, count=None) I personally would use the step value far more often than the count value.
~Ethan~ From guido at python.org Tue Sep 27 20:36:08 2011 From: guido at python.org (Guido van Rossum) Date: Tue, 27 Sep 2011 11:36:08 -0700 Subject: [Python-Dev] range objects in 3.x In-Reply-To: <4E8213E9.1060705@stoneleaf.us> References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> <4E816BED.4000103@canterbury.ac.nz> <4E81EAA0.5080507@stoneleaf.us> <4E82089D.8070209@stoneleaf.us> <4E820E03.6090100@pearwood.info> <4E8213E9.1060705@stoneleaf.us> Message-ID: On Tue, Sep 27, 2011 at 11:20 AM, Ethan Furman wrote: > I personally would use the step value far more often than the count > value. But that's exactly what we don't *want* you to do! Because (unless you are a numerical wizard) you probably aren't doing the error analysis needed to avoid the "unexpected extra point" problem due to floating point inaccuracies. For your own good, we want you to state the count and let us deliver the number of points you want. -- --Guido van Rossum (python.org/~guido) From alexander.belopolsky at gmail.com Tue Sep 27 20:38:48 2011 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 27 Sep 2011 14:38:48 -0400 Subject: [Python-Dev] range objects in 3.x In-Reply-To: <4E8213E9.1060705@stoneleaf.us> References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> <4E816BED.4000103@canterbury.ac.nz> <4E81EAA0.5080507@stoneleaf.us> <4E82089D.8070209@stoneleaf.us> <4E820E03.6090100@pearwood.info> <4E8213E9.1060705@stoneleaf.us> Message-ID: On Tue, Sep 27, 2011 at 2:20 PM, Ethan Furman wrote: .. > Good points. So how about: > > some_name_here(start, stop, *, step=None, count=None) > +1 The unusual optional first arguments is one of the things I dislike about range(). Shouldn't step default to 1.0? Also, when count is given, stop can be elided.
This will make for a nice symmetry: between stop, step and count any two can be provided but stop+step may be problematic and we can warn about this choice in the docs. From alexander.belopolsky at gmail.com Tue Sep 27 20:48:51 2011 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 27 Sep 2011 14:48:51 -0400 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> <4E816BED.4000103@canterbury.ac.nz> <4E81EAA0.5080507@stoneleaf.us> <4E82089D.8070209@stoneleaf.us> <4E820E03.6090100@pearwood.info> <4E8213E9.1060705@stoneleaf.us> Message-ID: On Tue, Sep 27, 2011 at 2:36 PM, Guido van Rossum wrote: .. > But that's exactly what we don't *want* you to do! Because (unless you > are a numerical wizard) you probably aren't doing the error analysis > needed to avoid the "unexpected extra point" problem due to floating > point inaccuracies. For your own good, we want you to state the count > and let us deliver the number of points you want. But the likely result will be that a non-wizard will find that range() does not work with floats, reach for some_name_here(), find the absence of step option, curse the developers, write count=int((stop-start)/step) and leave this with a nagging thought that (s)he forgot +/-1 somewhere. From guido at python.org Tue Sep 27 20:53:39 2011 From: guido at python.org (Guido van Rossum) Date: Tue, 27 Sep 2011 11:53:39 -0700 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> <4E816BED.4000103@canterbury.ac.nz> <4E81EAA0.5080507@stoneleaf.us> <4E82089D.8070209@stoneleaf.us> <4E820E03.6090100@pearwood.info> <4E8213E9.1060705@stoneleaf.us> Message-ID: On Tue, Sep 27, 2011 at 11:48 AM, Alexander Belopolsky wrote: > On Tue, Sep 27, 2011 at 2:36 PM, Guido van Rossum wrote: > .. 
>> But that's exactly what we don't *want* you to do! Because (unless you >> are a numerical wizard) you probably aren't doing the error analysis >> needed to avoid the "unexpected extra point" problem due to floating >> point inaccuracies. For your own good, we want you to state the count >> and let us deliver the number of points you want. > > But the likely result will be that a non-wizard will find that range() > does not work with floats, reach for some_name_here(), find the > absence of step option, curse the developers, write > count=int((stop-start)/step) and leave this with a nagging thought > that (s)he forgot +/-1 somewhere. But the *user* can just force this to round by using int((stop-start+0.5)/step) or by using int(round()); either of these is an easy pattern to teach and learn and useful in many other places. The problem is that frange() cannot do that rounding for you, since its contract (if it is to be analogous to range() at all) is that there is no assumption that stop is anywhere close to start + a multiple of step. -- --Guido van Rossum (python.org/~guido) From wilfred at potatolondon.com Tue Sep 27 20:46:52 2011 From: wilfred at potatolondon.com (Wilfred Hughes) Date: Tue, 27 Sep 2011 19:46:52 +0100 Subject: [Python-Dev] unittest missing assertNotRaises Message-ID: Hi folks I wasn't sure if this warranted a bug in the tracker, so I thought I'd raise it here first. unittest has assertIn, assertNotIn, assertEqual, assertNotEqual and so on. So, it seems odd to me that there isn't assertNotRaises. Is there any particular motivation for not putting it in? I've attached a simple patch against Python 3's trunk to give an idea of what I have in mind. Thanks Wilfred -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: assert_not_raises.diff Type: text/x-patch Size: 925 bytes Desc: not available URL: From _ at lvh.cc Tue Sep 27 20:59:37 2011 From: _ at lvh.cc (Laurens Van Houtven) Date: Tue, 27 Sep 2011 20:59:37 +0200 Subject: [Python-Dev] unittest missing assertNotRaises In-Reply-To: References: Message-ID: Sure, you just *do* it. The only advantage I see in assertNotRaises is that when that exception is raised, you should (and would) get a failure, not an error. -------------- next part -------------- An HTML attachment was scrubbed... URL: From phd at phdru.name Tue Sep 27 21:05:32 2011 From: phd at phdru.name (Oleg Broytman) Date: Tue, 27 Sep 2011 23:05:32 +0400 Subject: [Python-Dev] unittest missing assertNotRaises In-Reply-To: References: Message-ID: <20110927190532.GA32171@iskra.aviel.ru> On Tue, Sep 27, 2011 at 07:46:52PM +0100, Wilfred Hughes wrote:

> +    def assertNotRaises(self, excClass, callableObj=None, *args, **kwargs):
> +        """Fail if an exception of class excClass is thrown by
> +        callableObj when invoked with arguments args and keyword
> +        arguments kwargs.
> +
> +        """
> +        try:
> +            callableObj(*args, **kwargs)
> +        except excClass:
> +            raise self.failureException("%s was raised" % excClass)
> +
> +

What if I want to assert my test raises neither OSError nor IOError? Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN.
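Oleg's objection would be easy to accommodate, since an `except` clause (and hence the argument here) already accepts a tuple of exception classes. A sketch extending Wilfred's patch along those lines (this is illustrative, not part of the actual patch):

```python
import unittest

class Case(unittest.TestCase):

    def assertNotRaises(self, excClasses, callableObj, *args, **kwargs):
        # excClasses may be a single exception class or a tuple of
        # classes, exactly as `except` accepts -- so "neither OSError
        # nor IOError" is spelled (OSError, IOError)
        try:
            return callableObj(*args, **kwargs)
        except excClasses as e:
            raise self.failureException("%s was raised" % type(e).__name__)
```

Returning the callable's result also lets a test go on to make further assertions about it.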
From ethan at stoneleaf.us Tue Sep 27 21:22:20 2011 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 27 Sep 2011 12:22:20 -0700 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E81261C.6040200@pearwood.info> <4E816BED.4000103@canterbury.ac.nz> <4E81EAA0.5080507@stoneleaf.us> <4E82089D.8070209@stoneleaf.us> <4E820E03.6090100@pearwood.info> <4E8213E9.1060705@stoneleaf.us> Message-ID: <4E82226C.7080401@stoneleaf.us> Guido van Rossum wrote: > On Tue, Sep 27, 2011 at 11:20 AM, Ethan Furman wrote: >> I personally would use the step value far more often than the count >> value. > > But that's exactly what we don't *want* you to do! Because (unless you > are a numerical wizard) you probably aren't doing the error analysis > needed to avoid the "unexpected extra point" problem due to floating > point inaccuracies. For your own good, we want you to state the count > and let us deliver the number of points you want. Well, actually, I'd be using it with dates. ;) ~Ethan~ From tjreedy at udel.edu Tue Sep 27 21:43:26 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 27 Sep 2011 15:43:26 -0400 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> <4E816BED.4000103@canterbury.ac.nz> <4E81EAA0.5080507@stoneleaf.us> Message-ID: On 9/27/2011 1:03 PM, Guido van Rossum wrote: > mimicking the behavior of range() for floats is a bad idea, and that > we need to come up with a name for an API that takes start/stop/count > arguments instead of start/stop/step. [In the following, I use count as the number of intervals; the number of points is 1 more.] I agree with others that we should not just have a floatrange. 
An exact-as-possible floatrange is trivially based on exact computations with fractions:

    from math import gcd  # at the time this was written, fractions.gcd

    def floatrange(a, b, n):
        '''Yield floats a, b, and n-1 equally spaced floats in between.'''
        for num, dem in fracrange(a.as_integer_ratio(),
                                  b.as_integer_ratio(), n):
            yield num/dem

There are good reasons to expose the latter. If fracrange is done with the Fraction class, each ratio will be reduced to lowest terms, which means that the denominator will vary for each pair. In some situations, one might prefer a constant denominator across the series. Once a constant denominator is calculated (easy though not trivial), fracrange is trivially based on range. The following makes the denominator as small as possible if the inputs are in lowest terms:

    def fracrange(frac1, frac2, n):
        '''Yield fractions frac1, frac2 and n-1 equally spaced fractions
        in between.

        Fractions are represented as (numerator, denominator > 0) pairs.
        For output, use the smallest common denominator of the inputs
        that makes the numerator range an even multiple of n.
        '''
        n1, d1 = frac1
        n2, d2 = frac2
        dem = d1 * d2 // gcd(d1, d2)
        start = n1 * (dem // d1)
        stop = n2 * (dem // d2)
        rang = stop - start
        q, r = divmod(rang, n)
        if r:
            gcd_r_n = gcd(r, n)
            m = n // gcd_r_n
            dem *= m
            start *= m
            stop *= m
            step = rang // gcd_r_n  # rang * m // n
        else:
            step = q  # if r==0: gcd(r,n)==n, m==1, rang//n == q
        for num in range(start, stop+1, step):
            yield num, dem

Two example uses:

    for i, j in fracrange((1,10), (22,10), 7):
        print(i, j)
    print()
    for i, j in fracrange((1,5), (1,1), 6):
        print(i, j)

    ## prints
    1 10
    4 10
    7 10
    10 10
    13 10
    16 10
    19 10
    22 10

    3 15
    5 15
    7 15
    9 15
    11 15
    13 15
    15 15

If nothing else, the above is easy to check for correctness ;-). Note that for fraction output, one will normally want to be able to enter an explicit pair such as (1,5) or even (2,10). The decimal equivalent, .2, after conversion to float, gets converted by .as_integer_ratio() back to (3602879701896397, 18014398509481984).
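Terry's remark about the Fraction class can be seen directly: a Fraction-based version is only a few lines, but each result is reduced to lowest terms, so the denominators vary from point to point instead of staying constant as in fracrange(). A sketch (`fracrange_f` is a hypothetical name):

```python
from fractions import Fraction

def fracrange_f(a, b, n):
    # exact arithmetic throughout, but every Fraction is reduced
    # to lowest terms, so the denominator differs between points
    a, b = Fraction(a), Fraction(b)
    step = (b - a) / n
    return [a + i * step for i in range(n + 1)]

fracrange_f(Fraction(1, 5), 1, 6)
# [Fraction(1, 5), Fraction(1, 3), Fraction(7, 15), Fraction(3, 5),
#  Fraction(11, 15), Fraction(13, 15), Fraction(1, 1)]
```

Compare with fracrange((1,5), (1,1), 6) above, which keeps the constant denominator 15 throughout.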
-- Terry Jan Reedy From guido at python.org Tue Sep 27 21:52:01 2011 From: guido at python.org (Guido van Rossum) Date: Tue, 27 Sep 2011 12:52:01 -0700 Subject: [Python-Dev] PEP 393 close to pronouncement In-Reply-To: <201109271550.27837.victor.stinner@haypocalc.com> References: <201109270019.02442.victor.stinner@haypocalc.com> <201109271550.27837.victor.stinner@haypocalc.com> Message-ID: Given the feedback so far, I am happy to pronounce PEP 393 as accepted. Martin, congratulations! Go ahead and mark it as Accepted. (But please do fix up the small nits that Victor reported in his earlier message.) -- --Guido van Rossum (python.org/~guido) From alexander.belopolsky at gmail.com Tue Sep 27 22:05:01 2011 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 27 Sep 2011 16:05:01 -0400 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> <4E816BED.4000103@canterbury.ac.nz> <4E81EAA0.5080507@stoneleaf.us> <4E82089D.8070209@stoneleaf.us> <4E820E03.6090100@pearwood.info> <4E8213E9.1060705@stoneleaf.us> Message-ID: On Tue, Sep 27, 2011 at 2:53 PM, Guido van Rossum wrote: > On Tue, Sep 27, 2011 at 11:48 AM, Alexander Belopolsky > wrote: >> On Tue, Sep 27, 2011 at 2:36 PM, Guido van Rossum wrote: >> .. >>> But that's exactly what we don't *want* you to do! Because (unless you >>> are a numerical wizard) you probably aren't doing the error analysis >>> needed to avoid the "unexpected extra point" problem due to floating >>> point inaccuracies. For your own good, we want you to state the count >>> and let us deliver the number of points you want. I don't disagree that the ability to provide count= option is useful. I am just saying that there are also cases where float step is known exactly and count (or stop) can be deduced from stop (or count) without any floating point issues.
Iteration over integers that happen to be represented by floats is one use case, but using integer range may be a better option in this case. In US it is still popular to measure things in power of two fractions. Simulating a carpenter's yard does not suffer from rounding when done in floats. Counting by .5 and .25 has its uses too. Maybe frange() should just signal the FP inexact exception if we expect users to need hand holding to such a degree. From tjreedy at udel.edu Tue Sep 27 22:06:22 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 27 Sep 2011 16:06:22 -0400 Subject: [Python-Dev] unittest missing assertNotRaises In-Reply-To: References: Message-ID: On 9/27/2011 2:46 PM, Wilfred Hughes wrote: > Hi folks > > I wasn't sure if this warranted a bug in the tracker, so I thought I'd > raise it here first. > > unittest has assertIn, assertNotIn, assertEqual, assertNotEqual and so These all test possible specification conditions and sensible test conditions. For instance -1 and particularly 3 should not be in range(3). Including 3 is a realistic possible error. If you partition a set into subsets < x and > x, x should not be in either, but an easy mistake would put it in either or both. > Is there any particular motivation for not putting it in? You have 'motivation' backwards. There are an infinity of things we could add. We need a positive, substantial reason with real use cases to add something. An expression should return a particular value or raise a particular exception. If it returns a value, testing that it is the correct value eliminates all exceptions. And testing for an expected exception eliminates all others. If there is an occasional need for the proposal, one can write the same code you did, but with the possibility of excluding more than one exception. So I do not see any need for the proposal.
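Terry's partition example is exactly the kind of check the existing assertNotIn already covers, for instance in a sketch like this:

```python
import unittest

class PartitionTest(unittest.TestCase):

    def test_pivot_in_neither_part(self):
        data, x = [1, 5, 3, 9, 7], 5
        lower = [v for v in data if v < x]
        upper = [v for v in data if v > x]
        # the easy mistake Terry describes -- the pivot leaking into
        # either part -- is caught directly by assertNotIn
        self.assertNotIn(x, lower)
        self.assertNotIn(x, upper)
```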
-- Terry Jan Reedy From ericsnowcurrently at gmail.com Tue Sep 27 22:12:52 2011 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Tue, 27 Sep 2011 14:12:52 -0600 Subject: [Python-Dev] range objects in 3.x In-Reply-To: <4E8213E9.1060705@stoneleaf.us> References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> <4E816BED.4000103@canterbury.ac.nz> <4E81EAA0.5080507@stoneleaf.us> <4E82089D.8070209@stoneleaf.us> <4E820E03.6090100@pearwood.info> <4E8213E9.1060705@stoneleaf.us> Message-ID: On Tue, Sep 27, 2011 at 12:20 PM, Ethan Furman wrote: > Good points. So how about: > > some_name_here(start, stop, *, step=None, count=None) > > I personally would use the step value far more often than the count > value. Let's call it xrange() or maybe range_ex(). But seriously, here's an approach that extends the generic replacement idea a bit. I like the idea of the "some_name_here" function as a builtin in conjunction with Alexander's idea of a generic function, a la len() or repr(). Like those other builtin generic functions, it would leverage special methods (whether new or existing) to use the "range protocol" of objects. The builtin would either replace range() (and assume its name) or be a new builtin with a parallel name to range(). Either way, it would return an object of the new/refactored range type, which would reflect the above signature. If the new builtin were to rely on a new range-related protocol (i.e. if it were needed), that protocol could distinguish support for stepping from support for counting. Then floats could simply not support the stepping portion. And the fate of range()? As far as the existing builtin range() goes, either we would leave it alone, we would make range() a wrapper function around a new range type, or the new range type would completely replace the old. If we were to leave it alone, the new builtin would have a name that parallels the old name.
Then we wouldn't have to worry about backward compatibility for performance, type, or signature. Going the wrapper function route would preserve backward compatibility for the function signature, but isinstance(obj, range) wouldn't work anymore. Whether leaving range() alone or making it a wrapper, we could replace it with the new builtin in Python 4, if it made sense (like happened with xrange). If we entirely replaced the current range() with the new (more generic) range type, the biggest concern is maintaining backward compatibility with the function signature, in both Python and the C-API. That would be tricky since the above signature seems incompatible with the current one. -eric From guido at python.org Tue Sep 27 22:13:41 2011 From: guido at python.org (Guido van Rossum) Date: Tue, 27 Sep 2011 13:13:41 -0700 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> <4E816BED.4000103@canterbury.ac.nz> <4E81EAA0.5080507@stoneleaf.us> <4E82089D.8070209@stoneleaf.us> <4E820E03.6090100@pearwood.info> <4E8213E9.1060705@stoneleaf.us> Message-ID: On Tue, Sep 27, 2011 at 1:05 PM, Alexander Belopolsky wrote: > On Tue, Sep 27, 2011 at 2:53 PM, Guido van Rossum wrote: >> On Tue, Sep 27, 2011 at 11:48 AM, Alexander Belopolsky >> wrote: >>> On Tue, Sep 27, 2011 at 2:36 PM, Guido van Rossum wrote: >>> .. >>>> But that's exactly what we don't *want* you to do! Because (unless you >>>> are a numerical wizard) you probably aren't doing the error analysis >>>> needed to avoid the "unexpected extra point" problem due to floating >>>> point inaccuracies.
For your own good, we want you to state the count >>>> and let us deliver the number of points you want. > > I don't disagree that the ability to provide count= option is useful. > I am just saying that there are also cases where float step is known > exactly and count (or stop) can be deduced from stop (or count) > without any floating point issues. ?Iteration over integers that > happen to be represented by floats is one use case, but using integer > range may be a better option in this case. ?In US it is still popular > to measure things in power of two fractions. ?Simulating a carpenter's > yard does not suffer from rounding when done in floats. ?Counting by > .5 and .25 has its uses too. ?Maybe frange() should just signal the FP > inexact exception if we expect users to need hand holding to such a > degree. But why offer an API that is an attractive nuisance? I don't think that it is a burden to the user to have to specify "from 0 to 2 inches in 8 steps" instead of "from 0 to 2 inches in 1/4 inch steps". (And what if they tried to say "from 0 to 3 1/4 inches in 1/2 inch steps" ?) -- --Guido van Rossum (python.org/~guido) From guido at python.org Tue Sep 27 22:16:06 2011 From: guido at python.org (Guido van Rossum) Date: Tue, 27 Sep 2011 13:16:06 -0700 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> <4E816BED.4000103@canterbury.ac.nz> <4E81EAA0.5080507@stoneleaf.us> <4E82089D.8070209@stoneleaf.us> <4E820E03.6090100@pearwood.info> <4E8213E9.1060705@stoneleaf.us> Message-ID: On Tue, Sep 27, 2011 at 1:12 PM, Eric Snow wrote: > On Tue, Sep 27, 2011 at 12:20 PM, Ethan Furman wrote: >> Good points. ?So how about: >> >> some_name_here(start, stop, *, step=None, count=None) >> >> ? ?I personally would use the step value far more often than the count >> value. > > Let's call it xrange() or maybe range_ex(). ? 
But seriously, > here's an approach that extends the generic replacement idea a bit. > > I like the idea of the "some_name_here" function as a builtin in > conjunction with Alexander's idea of a generic function, a la len() or > repr(). Like those other builtin generic functions, it would leverage > special methods (whether new or existing) to use the "range protocol" > of objects. > > The builtin would either replace range() (and assume its name) or be a > new builtin with a parallel name to range(). Either way, it would > return an object of the new/refactored range type, which would reflect > the above signature. > > If the new builtin were to rely on a new range-related protocol (i.e. > if it were needed), that protocol could distinguish support for > stepping from support for counting. Then floats could simply not > support the stepping portion. This sounds like a rather over-designed API. > And the fate of range()? > > As far as the existing builtin range() goes, either we would leave it > alone, we would make range() a wrapper function around a new range > type, or the new range type would completely replace the old. If we > were to leave it alone, the new builtin would have a name that > parallels the old name. Then we wouldn't have to worry about backward > compatibility for performance, type, or signature. > > Going the wrapper function route would preserve backward compatibility > for the function signature, but isinstance(obj, range) wouldn't work > anymore. Whether leaving range() alone or making it a wrapper, we > could replace it with the new builtin in Python 4, if it made sense > (like happened with xrange). > > If we entirely replaced the current range() with the new (more > generic) range type, the biggest concern is maintaining backward > compatibility with the function signature, in both Python and the > C-API. That would be tricky since the above signature seems > incompatible with the current one.
> > -eric -- --Guido van Rossum (python.org/~guido) From ethan at stoneleaf.us Tue Sep 27 22:21:52 2011 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 27 Sep 2011 13:21:52 -0700 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E816BED.4000103@canterbury.ac.nz> <4E81EAA0.5080507@stoneleaf.us> <4E82089D.8070209@stoneleaf.us> <4E820E03.6090100@pearwood.info> <4E8213E9.1060705@stoneleaf.us> Message-ID: <4E823060.3070805@stoneleaf.us> Guido van Rossum wrote: > But why offer an API that is an attractive nuisance? I don't think > that it is a burden to the user to have to specify "from 0 to 2 inches > in 8 steps" instead of "from 0 to 2 inches in 1/4 inch steps". (And > what if they tried to say "from 0 to 3 1/4 inches in 1/2 inch steps" > ?)
~Ethan~ From greg.ewing at canterbury.ac.nz Tue Sep 27 23:16:12 2011 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 28 Sep 2011 10:16:12 +1300 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> Message-ID: <4E823D1C.3080102@canterbury.ac.nz> Alexander Belopolsky wrote: > I don't think we'll gain anything by > copying numpy code because linspace(start, stop, num) is effectively > just > > arange(0, num) * step + start I don't think the intention was to literally copy the code, but to investigate borrowing the algorithm, in case it was using some special technique to maximise numerical accuracy. But from this it seems like it's just using the naive algorithm that we've already decided is not the best. -- Greg From guido at python.org Tue Sep 27 23:16:49 2011 From: guido at python.org (Guido van Rossum) Date: Tue, 27 Sep 2011 14:16:49 -0700 Subject: [Python-Dev] range objects in 3.x In-Reply-To: <4E823060.3070805@stoneleaf.us> References: <4E7CCA42.2060100@stoneleaf.us> <4E816BED.4000103@canterbury.ac.nz> <4E81EAA0.5080507@stoneleaf.us> <4E82089D.8070209@stoneleaf.us> <4E820E03.6090100@pearwood.info> <4E8213E9.1060705@stoneleaf.us> <4E823060.3070805@stoneleaf.us> Message-ID: On Tue, Sep 27, 2011 at 1:21 PM, Ethan Furman wrote: > Guido van Rossum wrote: >> >> But why offer an API that is an attractive nuisance? I don't think >> that it is a burden to the user to have to specify "from 0 to 2 inches >> in 8 steps" instead of "from 0 to 2 inches in 1/4 inch steps". (And >> what if they tried to say "from 0 to 3 1/4 inches in 1/2 inch steps" >> ?) > > And how many steps in "from 37 3/4 inches to 90 1/4 inches" ? ?I don't want > to have to calculate that. ?That's what computers are for. That's just silly. The number of steps is (stop - start) / step. 
> Your last example is no different than today's range(2, 10, 3) -- we don't > get 10 or 9. The difference is that most operations on integers, by their nature, give give exact results, except for division (which is defined as producing a float in Python 3). Whether float operations give exact results or not is a lot harder to know, and the various IEEE states are hard to access. Just because the US measurement system happens to use only values that are exactly representable as floats doesn't mean floats are great to represent measurements. (What if you have to cut a length of string in three equal pieces?) -- --Guido van Rossum (python.org/~guido) From greg.ewing at canterbury.ac.nz Tue Sep 27 23:19:15 2011 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 28 Sep 2011 10:19:15 +1300 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> <4E816BED.4000103@canterbury.ac.nz> Message-ID: <4E823DD3.1030000@canterbury.ac.nz> Alexander Belopolsky wrote: > ("Comb" (noun) brings up the right image, but is probably too > informal and may be confused with a short for "combination.") And also with "comb filter" for those who are into signal processing. -- Greg From greg.ewing at canterbury.ac.nz Tue Sep 27 23:39:18 2011 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 28 Sep 2011 10:39:18 +1300 Subject: [Python-Dev] range objects in 3.x In-Reply-To: <4E81F976.3070300@stoneleaf.us> References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> <4E81F7CB.7060001@pearwood.info> <4E81F976.3070300@stoneleaf.us> Message-ID: <4E824286.9060000@canterbury.ac.nz> Ethan Furman wrote: > If it's generic, why should it live in math? Generic? Maybe that's it: grange() It's also an English word, unfortunately one with a completely unrelated meaning. 
:-( -- Greg From ethan at stoneleaf.us Tue Sep 27 23:39:43 2011 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 27 Sep 2011 14:39:43 -0700 Subject: [Python-Dev] range objects in 3.x In-Reply-To: References: <4E7CCA42.2060100@stoneleaf.us> <4E81EAA0.5080507@stoneleaf.us> <4E82089D.8070209@stoneleaf.us> <4E820E03.6090100@pearwood.info> <4E8213E9.1060705@stoneleaf.us> <4E823060.3070805@stoneleaf.us> Message-ID: <4E82429F.50105@stoneleaf.us> Guido van Rossum wrote: > On Tue, Sep 27, 2011 at 1:21 PM, Ethan Furman wrote: >> Guido van Rossum wrote: >>> But why offer an API that is an attractive nuisance? I don't think >>> that it is a burden to the user to have to specify "from 0 to 2 inches >>> in 8 steps" instead of "from 0 to 2 inches in 1/4 inch steps". (And >>> what if they tried to say "from 0 to 3 1/4 inches in 1/2 inch steps" >>> ?) >> And how many steps in "from 37 3/4 inches to 90 1/4 inches" ? I don't want >> to have to calculate that. That's what computers are for. > > That's just silly. The number of steps is (stop - start) / step. Not silly at all -- it begs for an api of (start, stop, step), not (start, stop, count). Personally, I have no problems with typing either 'step=...' or 'stop=...', but I think losing step as an option is a *ahem* step backwards. ~Ethan~ From tseaver at palladion.com Tue Sep 27 23:50:59 2011 From: tseaver at palladion.com (Tres Seaver) Date: Tue, 27 Sep 2011 17:50:59 -0400 Subject: [Python-Dev] range objects in 3.x In-Reply-To: <4E824286.9060000@canterbury.ac.nz> References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> <4E81F7CB.7060001@pearwood.info> <4E81F976.3070300@stoneleaf.us> <4E824286.9060000@canterbury.ac.nz> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 09/27/2011 05:39 PM, Greg Ewing wrote: > Ethan Furman wrote: > >> If it's generic, why should it live in math? > > Generic? 
Maybe that's it: grange() > > It's also an English word, unfortunately one with a completely > unrelated meaning. :-( One could always think of the Midwest US farm country, cut into even one-mile sections by dirt roads, and think of 'grange'. :) Tres. - -- =================================================================== Tres Seaver +1 540-429-0999 tseaver at palladion.com Palladion Software "Excellence by Design" http://palladion.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk6CRUMACgkQ+gerLs4ltQ7EYgCgi/iJqg4Wq8LVF25kd6gS0yN/ MQ4An1kl/+8uBcFzAJPPNPL1iBqSNwJM =2IUq -----END PGP SIGNATURE----- From martin at v.loewis.de Wed Sep 28 00:56:58 2011 From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 28 Sep 2011 00:56:58 +0200 Subject: [Python-Dev] PEP 393 memory savings update Message-ID: <4E8254BA.6010705@v.loewis.de> I have redone my memory benchmark, and added a few new counters. The application is a very small Django application. The same source code of the app and Django itself is used on all Python versions. The full list of results is at http://www.dcl.hpi.uni-potsdam.de/home/loewis/djmemprof/ Here are some excerpts: A. 32-bit builds, storage for Unicode objects 3.x, 32-bit wchar_t: 6378540 3.x, 16-bit wchar_t: 3694694 PEP 393: 2216807 Compared to the previous results, there are now some significant savings even compared to a narrow unicode build. B. 3.x, number of strings by maxchar: ASCII: 35713 (1,300,000 chars) Latin-1: 235 (11,000 chars) BMP: 260 (700 chars) other: 0 total: 36,000 (1,310,000 chars) This explains why the savings for shortening ASCII objects are significant in this application. I have no good intuition how this effect would show for "real" applications. 
It may be that the percentage of ASCII strings (in number and chars) grows proportionally with the total number of strings; it may also be that the majority of these strings is a certain fixed overhead (resulting from Python identifiers and other interned strings). C. String-ish objects in 2.7 and 3.3-trunk: 2.x 3.x #unicode 370 36,000 #bytes 43,000 14,000 #total 43,400 50,000 len(unicode) 5,300 1,306,000 len(bytes) 2,040,000 860,000 len(total) 2,046,000 2,200,000 (Note: the computations in the results are slightly messed up: the number of bytes for bytes objects is actually the sum of the lengths, not the sum of the sizeofs; this gets added in the "total" lines to the sum of sizeofs of unicode strings, which is non-sensical. The table above corrects this) As you can see, Python 3 creates more string objects in total. D. Memory consumption for 2.x, 3.x, PEP 393, accounting both unicode and bytes objects, using 32-bit builds and 32-bit wchar_t: 2.x: 3,620,000 bytes 3.x: 7,750,000 bytes PEP 393: 3,340,000 bytes This suggests that PEP 393 actually reduces memory consumption below what 2.7 uses. This is offset though by "other" (non-string) objects, which take 300KB more in 3.x. Regards, Martin From brian.curtin at gmail.com Wed Sep 28 00:59:12 2011 From: brian.curtin at gmail.com (Brian Curtin) Date: Tue, 27 Sep 2011 17:59:12 -0500 Subject: [Python-Dev] PyCon 2012 Proposals Due October 12 Message-ID: The deadline for PyCon 2012 tutorial, talk, and poster proposals is under 15 days away, so be sure to get your submissions in by October 12, 2011. Whether you're a first-timer or an experienced veteran, PyCon depends on you, the community, coming together to build the best conference schedule possible. Our call for proposals (http://us.pycon.org/2012/cfp/) lays out the details it takes to be included in the lineup for the conference in Santa Clara, CA on March 7-15, 2012.
If you're unsure of what to write about, our recent survey yielded a large list of potential talk topics (http://pycon.blogspot.com/2011/09/need-talk-ideas.html), and plenty of ideas for tutorials (INSERT TUTORIAL POST). We've also come up with general tips on proposal writing at http://pycon.blogspot.com/2011/08/writing-good-proposal.html to ensure everyone has the most complete proposal when it comes time for review. As always, the program committee wants to put together an incredible conference, so they'll be working with submitters to fine tune proposal details and help you produce the best submissions. We've had plenty of great news to share since we first announced the call for proposals. Paul Graham of Y Combinator was recently announced as a keynote speaker (http://pycon.blogspot.com/2011/09/announcing-first-pycon-2012-keynote.html), making his return after a 2003 keynote. David Beazley, famous for his mind-blowing talks on CPython's Global Interpreter Lock, was added to the plenary talk series (http://pycon.blogspot.com/2011/09/announcing-first-pycon-2012-plenary.html). Sponsors can now list their job openings on the "Job Fair" section of the PyCon site (http://pycon.blogspot.com/2011/09/announcing-pycon-2012-fair-page-sponsor.html). We're hard at work to bring you the best conference yet, so stay tuned to PyCon news at http://pycon.blogspot.com/ and on Twitter at https://twitter.com/#!/pycon. We recently eclipsed last year's sponsorship count of 40 and are currently at a record 52 organizations supporting PyCon. If you or your organization are interested in sponsoring PyCon, we'd love to hear from you, so check out our sponsorship page (http://us.pycon.org/2012/sponsors/). A quick thanks to all of our awesome PyCon 2012 Sponsors: - Diamond Level: Google and Dropbox.
- Platinum Level: New Relic, SurveyMonkey, Microsoft, Eventbrite, Nasuni and Gondor.io - Gold Level: Walt Disney Animation Studios, CCP Games, Linode, Enthought, Canonical, Dotcloud, Loggly, Revsys, ZeOmega, Bitly, ActiveState, JetBrains, Caktus, Disqus, Spotify, Snoball, Evite, and PlaidCloud - Silver Level: Imaginary Landscape, WiserTogether, Net-ng, Olark, AG Interactive, Bitbucket, Open Bastion, 10Gen, gocept, Lex Machina, fwix, github, toast driven, Aarki, Threadless, Cox Media, myYearBook, Accense Technology, Wingware, FreshBooks, and BigDoor - Lanyard: Dreamhost - Sprints: Reddit - FLOSS: OSU/OSL, OpenHatch The PyCon Organizers - http://us.pycon.org/2012 Jesse Noller - Chairman - jnoller at python.org Brian Curtin - Publicity Coordinator - brian at python.org From guido at python.org Wed Sep 28 01:17:15 2011 From: guido at python.org (Guido van Rossum) Date: Tue, 27 Sep 2011 16:17:15 -0700 Subject: [Python-Dev] PEP 393 memory savings update In-Reply-To: <4E8254BA.6010705@v.loewis.de> References: <4E8254BA.6010705@v.loewis.de> Message-ID: Great news, Martin! On Tue, Sep 27, 2011 at 3:56 PM, "Martin v. Löwis" wrote: > I have redone my memory benchmark, and added a few new > counters. > > The application is a very small Django application. The same > source code of the app and Django itself is used on all Python > versions. The full list of results is at > > http://www.dcl.hpi.uni-potsdam.de/home/loewis/djmemprof/ > > Here are some excerpts: > > A. 32-bit builds, storage for Unicode objects > 3.x, 32-bit wchar_t: 6378540 > 3.x, 16-bit wchar_t: 3694694 > PEP 393: 2216807 > > Compared to the previous results, there are now some > significant savings even compared to a narrow unicode build. > > B. 3.x, number of strings by maxchar: > ASCII: 35713 (1,300,000 chars) > Latin-1: 235 (11,000 chars) > BMP: 260 (700 chars) > other: 0 > total:
36,000 (1,310,000 chars) > > This explains why the savings for shortening ASCII objects > are significant in this application. I have no good intuition > how this effect would show for "real" applications. It may be > that the percentage of ASCII strings (in number and chars) grows > proportionally with the total number of strings; it may also > be that the majority of these strings is a certain fixed overhead > (resulting from Python identifiers and other interned strings). > > C. String-ish objects in 2.7 and 3.3-trunk: > 2.x 3.x > #unicode 370 36,000 > #bytes 43,000 14,000 > #total 43,400 50,000 > > len(unicode) 5,300 1,306,000 > len(bytes) 2,040,000 860,000 > len(total) 2,046,000 2,200,000 > > (Note: the computations in the results are slightly messed up: > the number of bytes for bytes objects is actually the sum > of the lengths, not the sum of the sizeofs; this gets added > in the "total" lines to the sum of sizeofs of unicode strings, > which is non-sensical. The table above corrects this) > > As you can see, Python 3 creates more string objects in total. > > D. Memory consumption for 2.x, 3.x, PEP 393, accounting both > unicode and bytes objects, using 32-bit builds and 32-bit > wchar_t: > 2.x: 3,620,000 bytes > 3.x: 7,750,000 bytes > PEP 393: 3,340,000 bytes > > This suggests that PEP 393 actually reduces memory consumption > below what 2.7 uses. This is offset though by "other" (non-string) > objects, which take 300KB more in 3.x.
> > Regards, > Martin > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (python.org/~guido) From steve at pearwood.info Wed Sep 28 01:43:13 2011 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 28 Sep 2011 09:43:13 +1000 Subject: [Python-Dev] unittest missing assertNotRaises In-Reply-To: <20110927190532.GA32171@iskra.aviel.ru> References: <20110927190532.GA32171@iskra.aviel.ru> Message-ID: <4E825F91.9000701@pearwood.info> Oleg Broytman wrote: > On Tue, Sep 27, 2011 at 07:46:52PM +0100, Wilfred Hughes wrote: >> + def assertNotRaises(self, excClass, callableObj=None, *args, **kwargs): >> + """Fail if an exception of class excClass is thrown by >> + callableObj when invoked with arguments args and keyword >> + arguments kwargs. >> + >> + """ >> + try: >> + callableObj(*args, **kwargs) >> + except excClass: >> + raise self.failureException("%s was raised" % excClass) >> + >> + > > What if I want to assert my test raises neither OSError nor IOError? Passing (OSError, IOError) as excClass should do it. But I can't see this being a useful test. As written, exceptions are still treated as errors, except for excClass, which is treated as a test failure. I can't see the use-case for that. assertRaises is useful: "IOError is allowed, but any other exception is a bug." makes perfect sense. assertNotRaises doesn't seem sensible or useful to me: "IOError is a failed test, but any other exception is a bug." What's the point? When would you use that? 
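Both points in this exchange can be shown in running code. A minimal sketch (the test case and `flaky_read` are invented for illustration): a plain call already serves as the "assertNotRaises", and `assertRaises` accepts a tuple of exception classes, which covers the "neither OSError nor IOError" case:

```python
import unittest

def flaky_read():
    # Invented stand-in for the real code under test.
    return "data"

class Example(unittest.TestCase):
    def test_no_exception_expected(self):
        # The "assertNotRaises" case is just a plain call: any
        # exception that escapes is reported by the runner anyway.
        self.assertEqual(flaky_read(), "data")

    def test_either_exception_allowed(self):
        # assertRaises accepts a tuple of exception classes.
        with self.assertRaises((OSError, IOError)):
            raise IOError("disk trouble")

suite = unittest.defaultTestLoader.loadTestsFromTestCase(Example)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Both tests pass under any unittest runner; no new assertion method is needed for either case.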
-- Steven From ckaynor at zindagigames.com Wed Sep 28 01:58:47 2011 From: ckaynor at zindagigames.com (Chris Kaynor) Date: Tue, 27 Sep 2011 16:58:47 -0700 Subject: [Python-Dev] unittest missing assertNotRaises In-Reply-To: <4E825F91.9000701@pearwood.info> References: <20110927190532.GA32171@iskra.aviel.ru> <4E825F91.9000701@pearwood.info> Message-ID: On Tue, Sep 27, 2011 at 4:43 PM, Steven D'Aprano wrote: > But I can't see this being a useful test. As written, exceptions are still treated as errors, except for excClass, which is treated as a test failure. I can't see the use-case for that. assertRaises is useful: > > "IOError is allowed, but any other exception is a bug." > > makes perfect sense. assertNotRaises doesn't seem sensible or useful to me: > > "IOError is a failed test, but any other exception is a bug." > > What's the point? When would you use that? > I've run across a few cases where this is the correct behavior. The most recent one that comes to mind is while testing some code which has specific silencing options: specifically, writing a main file and a backup file, where failure to write the backup is not an error, but failure to write the main is. As such, the test suite should have the following tests: - Failure to write the main should assert that the code raises the failure error. No error is a failure, any other error is an error, that error is a success. (it may also check that the backup was written) - Failure to write the backup should assert that the code does not raise the failure error. No error is a success, that error is a failure, any other error is an error. (it may also check that the main was written) - Both succeeding should assert that the files were actually written, and that no error was raised. Any other result is an error.
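The three cases above map directly onto existing unittest idioms. A sketch, with a hypothetical `save()` standing in for the silencing behaviour described (all names invented for illustration):

```python
import unittest

def save(main_ok=True, backup_ok=True):
    # Invented stand-in: failure to write the backup is silenced,
    # failure to write the main file raises.
    if not main_ok:
        raise OSError("cannot write main file")
    written = ["main"]
    if backup_ok:
        written.append("backup")
    return written

class SaveTests(unittest.TestCase):
    def test_main_failure_raises(self):
        # Case 1: the failure error must be raised.
        with self.assertRaises(OSError):
            save(main_ok=False)

    def test_backup_failure_is_silenced(self):
        # Case 2: no assertNotRaises needed -- if OSError escaped,
        # the runner would report it; just check the resulting state.
        self.assertEqual(save(backup_ok=False), ["main"])

    def test_both_succeed(self):
        # Case 3: both files written, nothing raised.
        self.assertEqual(save(), ["main", "backup"])

suite = unittest.defaultTestLoader.loadTestsFromTestCase(SaveTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Case 2 is the one at issue: an unexpected exception there surfaces as an error without any dedicated helper.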
Now, the difference between a Failure and an Error is more or less a moot point, however I would expect an Error to be any unexpected result, while a Failure is a predicted (either via forethought or prior tests) but incorrect result. From raymond.hettinger at gmail.com Wed Sep 28 02:28:49 2011 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Tue, 27 Sep 2011 20:28:49 -0400 Subject: [Python-Dev] range objects in 3.x In-Reply-To: <4E82226C.7080401@stoneleaf.us> References: <4E7CCA42.2060100@stoneleaf.us> <4E81261C.6040200@pearwood.info> <4E816BED.4000103@canterbury.ac.nz> <4E81EAA0.5080507@stoneleaf.us> <4E82089D.8070209@stoneleaf.us> <4E820E03.6090100@pearwood.info> <4E8213E9.1060705@stoneleaf.us> <4E82226C.7080401@stoneleaf.us> Message-ID: <7E063D58-4591-42F3-A6B6-B977101D8241@gmail.com> On Sep 27, 2011, at 3:22 PM, Ethan Furman wrote: > Well, actually, I'd be using it with dates. ;) FWIW, an approach using itertools is pretty general but even it doesn't work for dates :-) >>> from itertools import count, takewhile >>> from decimal import Decimal >>> from fractions import Fraction >>> list(takewhile(lambda x: x<=10.0, count(0.0, 0.5))) [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, 10.0] >>> list(takewhile(lambda x: x<=Decimal(1), count(Decimal(0), Decimal('0.1')))) [Decimal('0'), Decimal('0.1'), Decimal('0.2'), Decimal('0.3'), Decimal('0.4'), Decimal('0.5'), Decimal('0.6'), Decimal('0.7'), Decimal('0.8'), Decimal('0.9'), Decimal('1.0')] >>> list(takewhile(lambda x: x<=Fraction(2), count(Fraction(0), Fraction(1,3)))) [Fraction(0, 1), Fraction(1, 3), Fraction(2, 3), Fraction(1, 1), Fraction(4, 3), Fraction(5, 3), Fraction(2, 1)] >>> from datetime import date, timedelta >>> list(takewhile(lambda x: x<=date(2011,12,31), count(date(2011,9,27), timedelta(days=7)))) Traceback (most recent call last): File "", line 1, in list(takewhile(lambda x: x<=date(2011,12,31), count(date(2011,9,27),
timedelta(days=7)))) TypeError: a number is required Raymond -------------- next part -------------- An HTML attachment was scrubbed... URL: From exarkun at twistedmatrix.com Wed Sep 28 02:36:10 2011 From: exarkun at twistedmatrix.com (exarkun at twistedmatrix.com) Date: Wed, 28 Sep 2011 00:36:10 -0000 Subject: [Python-Dev] unittest missing assertNotRaises In-Reply-To: References: <20110927190532.GA32171@iskra.aviel.ru> <4E825F91.9000701@pearwood.info> Message-ID: <20110928003610.2214.1934279766.divmod.xquotient.140@localhost.localdomain> On 27 Sep, 11:58 pm, ckaynor at zindagigames.com wrote: >On Tue, Sep 27, 2011 at 4:43 PM, Steven D'Aprano >wrote: >>But I can't see this being a useful test. As written, exceptions are >>still treated as errors, except for excClass, which is treated as a >>test failure. I can't see the use-case for that. assertRaises is >>useful: >> >>"IOError is allowed, but any other exception is a bug." >> >>makes perfect sense. assertNotRaises doesn't seem sensible or useful >>to me: >> >>"IOError is a failed test, but any other exception is a bug." >> >>What's the point? When would you use that? > >I've run across a few cases where this is the correct behavior. The >most recent one that comes to mind is while testing some code which >has specific silencing options: specifically, writing a main file and >a backup file, where failure to write the backup is not an error, but >failure to write the main is. As such, the test suite should have the >following tests: >- Failure to write the main should assert that the code raises the >failure error. No error is a failure, any other error is an error, >that error is a success. (it may also check that the backup was >written) This is assertRaises, not assertNotRaises. >- Failure to write the backup should assert that the code does not >raise the failure error. No error is a success, that error is a >failure, any other error is an error.
(it may also check that the main >was written) This is calling the function and asserting something about the result. >- Both succeeding should assert that the files were actually written, >and that no error was raised. Any other result is an error. > >Now, the difference between a Failure and an Error is more or less a >moot point, however I would expect an Error to be any unexpected >result, while a Failure is a predicted (either via forethought or >prior tests) but incorrect result. assertNotRaises doesn't make anything possible that isn't possible now. It probably doesn't even make anything easier - but if it does, it's so obscure (and I've read and written thousands of tests for all kinds of libraries over the years) that it doesn't merit a dedicated helper in the unittest library. Jean-Paul From turnbull at sk.tsukuba.ac.jp Wed Sep 28 04:11:51 2011 From: turnbull at sk.tsukuba.ac.jp (Stephen J. Turnbull) Date: Wed, 28 Sep 2011 11:11:51 +0900 Subject: [Python-Dev] range objects in 3.x In-Reply-To: <4E82226C.7080401@stoneleaf.us> References: <4E7CCA42.2060100@stoneleaf.us> <4E81261C.6040200@pearwood.info> <4E816BED.4000103@canterbury.ac.nz> <4E81EAA0.5080507@stoneleaf.us> <4E82089D.8070209@stoneleaf.us> <4E820E03.6090100@pearwood.info> <4E8213E9.1060705@stoneleaf.us> <4E82226C.7080401@stoneleaf.us> Message-ID: <87zkhph0uw.fsf@uwakimon.sk.tsukuba.ac.jp> Ethan Furman writes: > Well, actually, I'd be using it with dates. ;) Why are you representing dates with floats? (That's a rhetorical question, don't answer it.) This is the whole problem with this discussion. Guido is saying (and I think it's plausible though I don't have enough experience to be sure myself) that if you look at the various use cases for such functions, they're different enough that it's going to be hard to come up with a single API that is good, let alone optimal, for them all.
Then people keep coming back with "but look at X, where this API is clearly very useful", for values of X restricted to "stuff they do". That's good module design; it's not a good idea for the language (including builtins). Remember, something like range (Python 3) or xrange (Python 2) was *really necessary*[1] to express in Python the same algorithm that the C construct 'for' does. I agree with Steven d' that count would have been a somewhat better name (at least in my dialect it is possible, though somewhat unusual, to say "count up from 10 to 20 by 3s"), but that doesn't become clear until you want to talk about polymorphic versions of the concept. Also, in statistics "range" refers to a much smaller set (ie, {min, max}) than it does in Python, not that I really care. As far as a name for a more general concept, perhaps "interval" would be an interesting choice (although in analysis it has a connotation of continuity that would be inappropriate for a discrete set of floats). Footnotes: [1] FSVO "necessary" that includes "let's not do arithmetic on the index variable inside the loop". From ubershmekel at gmail.com Wed Sep 28 07:06:36 2011 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Wed, 28 Sep 2011 01:06:36 -0400 Subject: [Python-Dev] unittest missing assertNotRaises In-Reply-To: <20110928003610.2214.1934279766.divmod.xquotient.140@localhost.localdomain> References: <20110927190532.GA32171@iskra.aviel.ru> <4E825F91.9000701@pearwood.info> <20110928003610.2214.1934279766.divmod.xquotient.140@localhost.localdomain> Message-ID: On Sep 27, 2011 5:56 PM, wrote: > > > assertNotRaises doesn't make anything possible that isn't possible now. It probably doesn't even make anything easier - but if it does, it's so obscure (and I've read and written thousands of tests for all kinds of libraries over the years) that it doesn't merit a dedicated helper in the unittest library. > > Jean-Paul > +1 for keeping it simple. TOOWTDI.
-------------- next part -------------- An HTML attachment was scrubbed... URL: From g.brandl at gmx.net Wed Sep 28 08:51:52 2011 From: g.brandl at gmx.net (Georg Brandl) Date: Wed, 28 Sep 2011 08:51:52 +0200 Subject: [Python-Dev] cpython: Implement PEP 393. In-Reply-To: References: Message-ID: On 28.09.2011 08:35, martin.v.loewis wrote: > http://hg.python.org/cpython/rev/8beaa9a37387 > changeset: 72475:8beaa9a37387 > user: Martin v. Löwis > date: Wed Sep 28 07:41:54 2011 +0200 > summary: > Implement PEP 393. > [...] > > diff --git a/Doc/c-api/unicode.rst b/Doc/c-api/unicode.rst > --- a/Doc/c-api/unicode.rst > +++ b/Doc/c-api/unicode.rst > @@ -1072,6 +1072,15 @@ > occurred and an exception has been set. > > > +.. c:function:: Py_ssize_t PyUnicode_FindChar(PyObject *str, Py_UCS4 ch, Py_ssize_t start, Py_ssize_t end, int direction) > + > + Return the first position of the character *ch* in ``str[start:end]`` using > + the given *direction* (*direction* == 1 means to do a forward search, > + *direction* == -1 a backward search). The return value is the index of the > + first match; a value of ``-1`` indicates that no match was found, and ``-2`` > + indicates that an error occurred and an exception has been set. > + + .. c:function:: Py_ssize_t PyUnicode_Count(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end) > > Return the number of non-overlapping occurrences of *substr* in This is the only doc change for this change (and it doesn't have a versionadded). Surely there must be more new APIs and changes that need documenting? Georg From martin at v.loewis.de Wed Sep 28 09:46:34 2011 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 28 Sep 2011 09:46:34 +0200 Subject: [Python-Dev] cpython: Implement PEP 393. In-Reply-To: References: Message-ID: <4E82D0DA.50405@v.loewis.de> > Surely there must be more new APIs and changes that need documenting? Correct. All documentation still needs to be written.
Regards, Martin From martin at v.loewis.de Wed Sep 28 09:48:32 2011 From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 28 Sep 2011 09:48:32 +0200 Subject: [Python-Dev] PEP 393 merged Message-ID: <4E82D150.7050204@v.loewis.de> I have now merged the PEP 393 implementation into default. The main missing piece is the documentation; contributions are welcome. Regards, Martin From wilfred at potatolondon.com Wed Sep 28 12:20:54 2011 From: wilfred at potatolondon.com (Wilfred Hughes) Date: Wed, 28 Sep 2011 11:20:54 +0100 Subject: [Python-Dev] unittest missing assertNotRaises In-Reply-To: References: Message-ID: On 27 September 2011 19:59, Laurens Van Houtven <_ at lvh.cc> wrote: > Sure, you just *do* it. The only advantage I see in assertNotRaises is that when that exception is raised, you should (and would) get a failure, not an error. It's a useful distinction. I have found myself writing code of the form: def test_old_exception_no_longer_raised(self): try: do_something() except OldException: self.assertTrue(False) in order to distinguish between a regression and something new erroring. The limitation of this pattern is that the test failure message is not as good. From phd at phdru.name Wed Sep 28 12:51:00 2011 From: phd at phdru.name (Oleg Broytman) Date: Wed, 28 Sep 2011 14:51:00 +0400 Subject: [Python-Dev] unittest missing assertNotRaises In-Reply-To: <4E825F91.9000701@pearwood.info> References: <20110927190532.GA32171@iskra.aviel.ru> <4E825F91.9000701@pearwood.info> Message-ID: <20110928105100.GB22828@iskra.aviel.ru> On Wed, Sep 28, 2011 at 09:43:13AM +1000, Steven D'Aprano wrote: > Oleg Broytman wrote: > >On Tue, Sep 27, 2011 at 07:46:52PM +0100, Wilfred Hughes wrote: > >>+ def assertNotRaises(self, excClass, callableObj=None, *args, **kwargs): > >>+ """Fail if an exception of class excClass is thrown by > >>+ callableObj when invoked with arguments args and keyword > >>+ arguments kwargs.
> >>+ + """ > >>+ try: > >>+ callableObj(*args, **kwargs) > >>+ except excClass: > >>+ raise self.failureException("%s was raised" % excClass) > >>+ + > But I can't see this being a useful test. Me too. Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From fuzzyman at voidspace.org.uk Wed Sep 28 13:04:06 2011 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Wed, 28 Sep 2011 12:04:06 +0100 Subject: [Python-Dev] unittest missing assertNotRaises In-Reply-To: References: Message-ID: <4E82FF26.6060002@voidspace.org.uk> On 27/09/2011 19:46, Wilfred Hughes wrote: > Hi folks > > I wasn't sure if this warranted a bug in the tracker, so I thought I'd > raise it here first. > > unittest has assertIn, assertNotIn, assertEqual, assertNotEqual and so > on. So, it seems odd to me that there isn't assertNotRaises. Is there > any particular motivation for not putting it in? > > I've attached a simple patch against Python 3's trunk to give an idea > of what I have in mind. > As others have said, the opposite of assertRaises is just calling the code! I have several times needed regression tests that call code that *used* to raise an exception. It can look slightly odd to have a test without an assert, but the singular uselessness of assertNotRaises does not make it a better alternative. I usually add a comment: def test_something_that_used_to_not_work(self): # this used to raise an exception do_something() All the best, Michael Foord > Thanks > Wilfred > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. 
-- the sqlite blessing http://www.sqlite.org/different.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From fuzzyman at voidspace.org.uk Wed Sep 28 13:05:13 2011 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Wed, 28 Sep 2011 12:05:13 +0100 Subject: [Python-Dev] unittest missing assertNotRaises In-Reply-To: References: Message-ID: <4E82FF69.5010208@voidspace.org.uk> On 27/09/2011 19:59, Laurens Van Houtven wrote: > Sure, you just *do* it. The only advantage I see in assertNotRaises is > that when that exception is raised, you should (and would) get a > failure, not an error. There are some who don't see the distinction between a failure and an error as a useful distinction... I'm becoming more sympathetic to that view. All the best, Michael > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.ewing at canterbury.ac.nz Wed Sep 28 13:06:34 2011 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 29 Sep 2011 00:06:34 +1300 Subject: [Python-Dev] range objects in 3.x In-Reply-To: <4E82226C.7080401@stoneleaf.us> References: <4E7CCA42.2060100@stoneleaf.us> <4E81261C.6040200@pearwood.info> <4E816BED.4000103@canterbury.ac.nz> <4E81EAA0.5080507@stoneleaf.us> <4E82089D.8070209@stoneleaf.us> <4E820E03.6090100@pearwood.info> <4E8213E9.1060705@stoneleaf.us> <4E82226C.7080401@stoneleaf.us> Message-ID: <4E82FFBA.209@canterbury.ac.nz> Ethan Furman wrote: > Well, actually, I'd be using it with dates. 
;) Seems to me that one size isn't going to fit all. Maybe we really want two functions: interpolate(start, end, count) Requires a type supporting addition and division, designed to work predictably and accurately with floats extrapolate(start, step, end) Works for any type supporting addition, not recommended for floats -- Greg From martin at v.loewis.de Wed Sep 28 13:24:22 2011 From: martin at v.loewis.de (martin at v.loewis.de) Date: Wed, 28 Sep 2011 13:24:22 +0200 Subject: [Python-Dev] Heads up: Apple llvm gcc 4.2 miscompiles PEP 393 Message-ID: <20110928132422.Horde.OvQBCtjz9kROgwPm5ZwktiA@webmail.df.eu> The gcc that Apple ships with the Lion SDK (not sure what Xcode version that is) miscompiles Python now. I've reported this to Apple as bug 10143715; not sure whether there is a public link to this bug report. In essence, the code typedef struct { long length; long hash; int state; int *wstr; } PyASCIIObject; typedef struct { PyASCIIObject _base; long utf8_length; char *utf8; long wstr_length; } PyCompactUnicodeObject; void *_PyUnicode_compact_data(void *unicode) { return ((((PyASCIIObject*)unicode)->state & 0x20) ? ((void*)((PyASCIIObject*)(unicode) + 1)) : ((void*)((PyCompactUnicodeObject*)(unicode) + 1))); } miscompiles (with -O2 -fomit-frame-pointer) to __PyUnicode_compact_data: Leh_func_begin1: leaq 32(%rdi), %rax ret The compiler version is gcc version 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00) This unconditionally assumes that sizeof(PyASCIIObject) needs to be added to unicode, independent of whether the state bit is set or not. I'm not aware of a work-around in the code. 
My work-around is to use gcc-4.0, which is still available on my system from an earlier Xcode installation (in /Developer-3.2.6) Regards, Martin From catch-all at masklinn.net Wed Sep 28 13:45:14 2011 From: catch-all at masklinn.net (Xavier Morel) Date: Wed, 28 Sep 2011 13:45:14 +0200 Subject: [Python-Dev] Heads up: Apple llvm gcc 4.2 miscompiles PEP 393 In-Reply-To: <20110928132422.Horde.OvQBCtjz9kROgwPm5ZwktiA@webmail.df.eu> References: <20110928132422.Horde.OvQBCtjz9kROgwPm5ZwktiA@webmail.df.eu> Message-ID: On 2011-09-28, at 13:24 , martin at v.loewis.de wrote: > The gcc that Apple ships with the Lion SDK (not sure what Xcode version that is) Xcode 4.1 > I'm not aware of a work-around in the code. My work-around is to use gcc-4.0, > which is still available on my system from an earlier Xcode installation > (in /Developer-3.2.6) Does Clang also fail to compile this? Clang was updated from 1.6 to 2.0 with Xcode 4, worth a try. Also, from your version listing it seems to be llvm-gcc (gcc frontend with llvm backend I think), is there no more straight gcc (with gcc frontend and backend)? FWIW, on 10.6 the default gcc is a straight 4.2 > gcc --version i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5664) There is an llvm-gcc 4.2 but it uses a slightly different revision of llvm > llvm-gcc --version i686-apple-darwin10-llvm-gcc-4.2 (GCC) 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2333.4) From _ at lvh.cc Wed Sep 28 16:59:12 2011 From: _ at lvh.cc (Laurens Van Houtven) Date: Wed, 28 Sep 2011 16:59:12 +0200 Subject: [Python-Dev] unittest missing assertNotRaises In-Reply-To: <4E82FF69.5010208@voidspace.org.uk> References: <4E82FF69.5010208@voidspace.org.uk> Message-ID: Oops, I accidentally hit Reply instead of Reply to All... On Wed, Sep 28, 2011 at 1:05 PM, Michael Foord wrote: > On 27/09/2011 19:59, Laurens Van Houtven wrote: > > Sure, you just *do* it. 
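The layout in Martin's reduced example can be mirrored from Python with ctypes to see why a single unconditional `leaq 32(%rdi), %rax` is wrong: the two branches of `_PyUnicode_compact_data` must add two *different* offsets. This is a sketch of the reduced example above, not of CPython's real structs:

```python
import ctypes

class PyASCIIObject(ctypes.Structure):
    # Fields mirror Martin's reduced example, not CPython itself.
    _fields_ = [("length", ctypes.c_long),
                ("hash", ctypes.c_long),
                ("state", ctypes.c_int),
                ("wstr", ctypes.POINTER(ctypes.c_int))]

class PyCompactUnicodeObject(ctypes.Structure):
    _fields_ = [("_base", PyASCIIObject),
                ("utf8_length", ctypes.c_long),
                ("utf8", ctypes.c_char_p),
                ("wstr_length", ctypes.c_long)]

# The ASCII branch adds sizeof(PyASCIIObject); the other branch must
# add the strictly larger sizeof(PyCompactUnicodeObject).  Emitting
# one fixed offset for both branches drops the state-bit test.
ascii_offset = ctypes.sizeof(PyASCIIObject)
compact_offset = ctypes.sizeof(PyCompactUnicodeObject)
assert ascii_offset < compact_offset
```

On a typical 64-bit build the two offsets come out as 32 and 56 bytes, matching the unconditional 32-byte `leaq` the miscompiled build emitted for both branches.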
The only advantage I see in assertNotRaises is that
> when that exception is raised, you should (and would) get a failure, not an
> error.

> There are some who don't see the distinction between a failure and an error
> as a useful distinction... I'm becoming more sympathetic to that view.

I agree. Maybe if there were fewer failures posing as errors and errors posing as failures, I'd consider taking the distinction seriously.

The only use case I've personally encountered is with fuzzy tests. The example that comes to mind is one where we had a fairly complex iterative algorithm for learning things from huge amounts of test data and there were certain criteria (goodness of result, time taken) that had to be satisfied. In that case, "it blew up because someone messed up dependencies" and "it took 3% longer than is allowable" are pretty obviously different... Considering how exotic that use case is, like I said, I'm not really convinced how generally useful it is :) especially since this isn't even a unit test...

> All the best,
>
> Michael

cheers
lvh

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From guido at python.org Wed Sep 28 17:41:48 2011
From: guido at python.org (Guido van Rossum)
Date: Wed, 28 Sep 2011 08:41:48 -0700
Subject: [Python-Dev] PEP 393 merged
In-Reply-To: <4E82D150.7050204@v.loewis.de>
References: <4E82D150.7050204@v.loewis.de>
Message-ID:

Congrats! Python 3.3 will be better because of this.

On Wed, Sep 28, 2011 at 12:48 AM, "Martin v. Löwis" wrote:
> I have now merged the PEP 393 implementation into default.
> The main missing piece is the documentation; contributions are
> welcome.

-- 
--Guido van Rossum (python.org/~guido)

From mal at egenix.com Wed Sep 28 18:44:23 2011
From: mal at egenix.com (M.-A.
Lemburg)
Date: Wed, 28 Sep 2011 18:44:23 +0200
Subject: [Python-Dev] PEP 393 close to pronouncement
In-Reply-To:
References: <201109270019.02442.victor.stinner@haypocalc.com> <201109271550.27837.victor.stinner@haypocalc.com>
Message-ID: <4E834EE7.4050706@egenix.com>

Guido van Rossum wrote:
> Given the feedback so far, I am happy to pronounce PEP 393 as
> accepted. Martin, congratulations! Go ahead and mark it as Accepted.
> (But please do fix up the small nits that Victor reported in his
> earlier message.)

I've been working on feedback for the last few days, but I guess it's too late. Here goes anyway...

I've only read the PEP and not followed the discussion due to lack of time, so if any of this is no longer valid, that's probably because the PEP wasn't updated :-)

Resizing
--------

Codecs use resizing a lot. Given that PyCompactUnicodeObject does not support resizing, most decoders will have to use PyUnicodeObject and thus not benefit from the memory footprint advantages of e.g. PyASCIIObject.

Data structure
--------------

The data structure description in the PEP appears to be wrong:

PyASCIIObject has a wchar_t *wstr pointer - I guess this should be a char *str pointer, otherwise, where's the memory footprint advantage (esp. on Linux where sizeof(wchar_t) == 4)?

I also don't see a reason to limit the UCS1 storage version to ASCII. Accordingly, the object should be called PyLatin1Object or PyUCS1Object.

Here's the version from the PEP:

"""
typedef struct {
    PyObject_HEAD
    Py_ssize_t length;
    Py_hash_t hash;
    struct {
        unsigned int interned:2;
        unsigned int kind:2;
        unsigned int compact:1;
        unsigned int ascii:1;
        unsigned int ready:1;
    } state;
    wchar_t *wstr;
} PyASCIIObject;

typedef struct {
    PyASCIIObject _base;
    Py_ssize_t utf8_length;
    char *utf8;
    Py_ssize_t wstr_length;
} PyCompactUnicodeObject;
"""

Typedef'ing Py_UNICODE to wchar_t and using wchar_t in existing code will cause problems on some systems where wchar_t is a signed type.
Python assumes that Py_UNICODE is unsigned and thus doesn't check for negative values or take these into account when doing range checks or code point arithmetic. On such platforms where wchar_t is signed, it is safer to typedef Py_UNICODE to unsigned wchar_t. Accordingly and to prevent further breakage, Py_UNICODE should not be deprecated and should be used instead of wchar_t throughout the code.

Length information
------------------

Py_UNICODE access to the objects assumes that len(obj) == length of the Py_UNICODE buffer. The PEP suggests that length should not take surrogates into account on UCS2 platforms such as Windows. This causes len(obj) to not match len(wstr). As a result, Py_UNICODE access to the Unicode objects breaks when surrogate code points are present in the Unicode object on UCS2 platforms.

The PEP also does not explain how lone surrogates will be handled with respect to the length information. Furthermore, determining len(obj) will require a loop over the data, checking for surrogate code points. A simple memcpy() is no longer enough.

I suggest dropping the idea of having len(obj) not count wstr surrogate code points to maintain backwards compatibility and allow for working with lone surrogates.

Note that the whole surrogate debate does not have much to do with this PEP, since it's mainly about memory footprint savings. I'd also urge to do a reality check with respect to surrogates and non-BMP code points: in practice you only very rarely see any non-BMP code points in your data. Making all Python users pay for the needs of a tiny fraction is not really fair. Remember: practicality beats purity.

API
---

Victor already described the needed changes.

Performance
-----------

The PEP only lists a few low-level benchmarks as basis for the performance decrease. I'm missing some more adequate real-life tests, e.g.
using an application framework such as Django (to the extent this is possible with Python3) or a server like the Radicale calendar server (which is available for Python3).

I'd also like to see a performance comparison which specifically uses the existing Unicode APIs to create and work with Unicode objects. Most extensions will use this way of working with the Unicode API, either because they want to support Python 2 and 3, or because the effort it takes to port to the new APIs is too high. The PEP makes some statements that this is slower, but doesn't quantify those statements.

Memory savings
--------------

The table only lists string sizes up to 8 code points. The memory savings for these are really only significant for ASCII strings on 64-bit platforms, if you use the default UCS2 Python build as basis. For larger strings, I expect the savings to be more significant. OTOH, a single non-BMP code point in such a string would cause the savings to drop significantly again.

Complexity
----------

In order to benefit from the new API, any code that has to deal with low-level Py_UNICODE access to the Unicode objects will have to be adapted. For best performance, each algorithm will have to be implemented for all three storage types. Not doing so will result in a slow-down, if I read the PEP correctly. It's difficult to say of what scale, since that information is not given in the PEP, but the added loop over the complete data array in order to determine the maximum code point value suggests that it is significant.

Summary
-------

I am not convinced that the memory savings are big enough to warrant the performance penalty and added complexity suggested by the PEP. In times where even smartphones come with multiple GB of RAM, performance is more important than memory savings. In practice, using a UCS2 build of Python usually is a good compromise between memory savings, performance and standards compatibility.
For the few cases where you have to deal with UCS4 code points, we have already made good progress to make handling these much easier. IMHO, Python should be optimized for UCS2 usage, not the rare cases of UCS4 usage you find in practice. I do see the advantage for large strings, though.

My personal conclusion
----------------------

Given that I've been working on and maintaining the Python Unicode implementation actively or by providing assistance for almost 12 years now, I've also thought about whether it's still worth the effort. My interests have shifted somewhat into other directions and I feel that helping Python reach world domination in other ways makes me happier than fighting over Unicode standards, implementations, special cases that aren't special enough, and all those other nitty-gritty details that cause long discussions :-)

So I feel that the PEP 393 change is a good time to draw a line and leave Unicode maintenance to Ezio, Victor, Martin, and all the others that have helped over the years. I know it's in good hands. So here it is:

----------------------------------------------------------------

Hey, that was easy :-)

PS: I'll stick around a bit more for the platform module, pybench and whatever else comes along where you might be interested in my input.

Thanks and cheers,

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Sep 28 2011)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________
2011-10-04: PyCon DE 2011, Leipzig, Germany                 6 days to go

::: Try our new mxODBC.Connect Python Database Interface for free ! ::::

eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math.
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From benjamin at python.org Wed Sep 28 19:15:24 2011 From: benjamin at python.org (Benjamin Peterson) Date: Wed, 28 Sep 2011 13:15:24 -0400 Subject: [Python-Dev] PEP 393 close to pronouncement In-Reply-To: <4E834EE7.4050706@egenix.com> References: <201109270019.02442.victor.stinner@haypocalc.com> <201109271550.27837.victor.stinner@haypocalc.com> <4E834EE7.4050706@egenix.com> Message-ID: 2011/9/28 M.-A. Lemburg : > Guido van Rossum wrote: >> Given the feedback so far, I am happy to pronounce PEP 393 as >> accepted. Martin, congratulations! Go ahead and mark ity as Accepted. >> (But please do fix up the small nits that Victor reported in his >> earlier message.) > > I've been working on feedback for the last few days, but I guess it's > too late. Here goes anyway... > > I've only read the PEP and not followed the discussion due to lack of > time, so if any of this is no longer valid, that's probably because > the PEP wasn't updated :-) > > Resizing > -------- > > Codecs use resizing a lot. Given that PyCompactUnicodeObject > does not support resizing, most decoders will have to use > PyUnicodeObject and thus not benefit from the memory footprint > advantages of e.g. PyASCIIObject. > > > Data structure > -------------- > > The data structure description in the PEP appears to be wrong: > > PyASCIIObject has a wchar_t *wstr pointer - I guess this should > be a char *str pointer, otherwise, where's the memory footprint > advantage (esp. on Linux where sizeof(wchar_t) == 4) ? > > I also don't see a reason to limit the UCS1 storage version > to ASCII. Accordingly, the object should be called PyLatin1Object > or PyUCS1Object. I think the purpose is that if it's only ASCII, no work is need to encode to UTF-8. 
-- 
Regards, Benjamin

From martin at v.loewis.de Wed Sep 28 19:47:22 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 28 Sep 2011 19:47:22 +0200
Subject: [Python-Dev] PEP 393 close to pronouncement
In-Reply-To: <4E834EE7.4050706@egenix.com>
References: <201109270019.02442.victor.stinner@haypocalc.com> <201109271550.27837.victor.stinner@haypocalc.com> <4E834EE7.4050706@egenix.com>
Message-ID: <4E835DAA.8020308@v.loewis.de>

> Codecs use resizing a lot. Given that PyCompactUnicodeObject
> does not support resizing, most decoders will have to use
> PyUnicodeObject and thus not benefit from the memory footprint
> advantages of e.g. PyASCIIObject.

No, codecs have been rewritten to not use resizing.

> PyASCIIObject has a wchar_t *wstr pointer - I guess this should
> be a char *str pointer, otherwise, where's the memory footprint
> advantage (esp. on Linux where sizeof(wchar_t) == 4) ?

That's the Py_UNICODE representation for backwards compatibility. It's normally NULL.

> I also don't see a reason to limit the UCS1 storage version
> to ASCII. Accordingly, the object should be called PyLatin1Object
> or PyUCS1Object.

No, in the ASCII case, the UTF-8 length can be shared with the regular string length - not so for Latin-1 characters above 127.

> Typedef'ing Py_UNICODE to wchar_t and using wchar_t in existing
> code will cause problems on some systems where wchar_t is a
> signed type.
>
> Python assumes that Py_UNICODE is unsigned and thus doesn't
> check for negative values or takes these into account when
> doing range checks or code point arithmetic.
>
> On such platforms where wchar_t is signed, it is safer to
> typedef Py_UNICODE to unsigned wchar_t.

No. Py_UNICODE values *must* be in the range 0..17*2**16. Values larger than 17*2**16 are just as bad as negative values, so having Py_UNICODE unsigned doesn't improve anything.

> Py_UNICODE access to the objects assumes that len(obj) ==
> length of the Py_UNICODE buffer.
> The PEP suggests that length
> should not take surrogates into account on UCS2 platforms
> such as Windows. This causes len(obj) to not match len(wstr).

Correct.

> As a result, Py_UNICODE access to the Unicode objects breaks
> when surrogate code points are present in the Unicode object
> on UCS2 platforms.

Incorrect. What specifically do you think would break?

> The PEP also does not explain how lone surrogates will be
> handled with respect to the length information.

Just as any other code point. Python does not special-case surrogate code points anymore.

> Furthermore, determining len(obj) will require a loop over
> the data, checking for surrogate code points. A simple memcpy()
> is no longer enough.

No, it won't. The length of the Unicode object is stored in the length field.

> I suggest to drop the idea of having len(obj) not count
> wstr surrogate code points to maintain backwards compatibility
> and allow for working with lone surrogates.

Backwards-compatibility is fully preserved by PyUnicode_GET_SIZE returning the size of the Py_UNICODE buffer. PyUnicode_GET_LENGTH returns the true length of the Unicode object.

> Note that the whole surrogate debate does not have much to
> do with this PEP, since it's mainly about memory footprint
> savings. I'd also urge to do a reality check with respect
> to surrogates and non-BMP code points: in practice you only
> very rarely see any non-BMP code points in your data. Making
> all Python users pay for the needs of a tiny fraction is
> not really fair. Remember: practicality beats purity.

That's the whole point of the PEP. You only pay for what you actually need, and in most cases, it's ASCII.

> For best performance, each algorithm will have to be implemented
> for all three storage types.

This will be a trade-off. I think most developers will be happy with a single version covering all three cases, especially as it's much more maintainable.
Kind regards, Martin

From martin at v.loewis.de Wed Sep 28 19:49:16 2011
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 28 Sep 2011 19:49:16 +0200
Subject: [Python-Dev] Heads up: Apple llvm gcc 4.2 miscompiles PEP 393
In-Reply-To:
References: <20110928132422.Horde.OvQBCtjz9kROgwPm5ZwktiA@webmail.df.eu>
Message-ID: <4E835E1C.8090700@v.loewis.de>

> Does Clang also fail to compile this? Clang was updated from 1.6 to 2.0 with Xcode 4, worth a try.

clang indeed works fine.

> Also, from your version listing it seems to be llvm-gcc (gcc frontend with llvm backend I think),
> is there no more straight gcc (with gcc frontend and backend)?

/usr/bin/cc and /usr/bin/gcc both link to llvm-gcc-4.2. However, there still is /usr/bin/gcc-4.2. Using that, Python also compiles correctly - so I have changed the gcc link on my system.

Thanks for the advice - I didn't expect that Apple ships three compilers...

Regards, Martin

From catch-all at masklinn.net Wed Sep 28 19:56:45 2011
From: catch-all at masklinn.net (Xavier Morel)
Date: Wed, 28 Sep 2011 19:56:45 +0200
Subject: [Python-Dev] Heads up: Apple llvm gcc 4.2 miscompiles PEP 393
In-Reply-To: <4E835E1C.8090700@v.loewis.de>
References: <20110928132422.Horde.OvQBCtjz9kROgwPm5ZwktiA@webmail.df.eu> <4E835E1C.8090700@v.loewis.de>
Message-ID: <74F6ADFA-874D-4BAC-B304-CE8B12D80126@masklinn.net>

On 2011-09-28, at 19:49 , Martin v. Löwis wrote:
>
> Thanks for the advice - I didn't expect that Apple ships three compilers...

Yeah I can understand that, they're in the middle of the transition but Clang is not quite there yet so...

From yasar11732 at gmail.com Wed Sep 28 21:00:50 2011
From: yasar11732 at gmail.com (=?ISO-8859-9?Q?Ya=FEar_Arabac=FD?=)
Date: Wed, 28 Sep 2011 22:00:50 +0300
Subject: [Python-Dev] What it takes to change a single keyword.
Message-ID:

Hi,

First of all, I am sincerely sorry if this is the wrong mailing list to ask this question.
I checked out the descriptions of a couple of other mailing lists, and this one seemed the most suitable. Here is my question: Let's say I want to change a single keyword, let's say the import keyword, to be spelled as something else, like its translation to my language. I guess it would be more complicated than modifying Grammar/Grammar, but I can't be sure which files should get edited. I'm asking this because I am trying to figure out if I could translate keywords into another language without affecting the behaviour of the language.

-- 
http://yasar.serveblog.net/

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From fperez.net at gmail.com Wed Sep 28 22:55:27 2011
From: fperez.net at gmail.com (Fernando Perez)
Date: Wed, 28 Sep 2011 20:55:27 +0000 (UTC)
Subject: [Python-Dev] range objects in 3.x
References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info>
Message-ID:

On Tue, 27 Sep 2011 11:25:48 +1000, Steven D'Aprano wrote:

> The audience for numpy is a small minority of Python users, and they

Certainly, though I'd like to mention that scientific computing is a major success story for Python, so hopefully it's a minority with something to contribute.

> tend to be more sophisticated. I'm sure they can cope with two functions
> with different APIs

No problem with having different APIs, but in that case I'd hope the builtin wouldn't be named linspace, to avoid confusion. In numpy/scipy we try hard to avoid collisions with existing builtin names; hopefully in this case we can prevent the reverse by having a dialogue.

> While continuity of API might be a good thing, we shouldn't accept a
> poor API just for the sake of continuity. I have some criticisms of the
> linspace API.
>
> numpy.linspace(start, stop, num=50, endpoint=True, retstep=False)
>
> http://docs.scipy.org/doc/numpy/reference/generated/numpy.linspace.html
>
> * It returns a sequence, which is appropriate for numpy but in standard
> Python it should return an iterator or something like a range object.

Sure, no problem there.

> * Why does num have a default of 50? That seems to be an arbitrary
> choice.

Yup. linspace was modeled after matlab's identically named command: http://www.mathworks.com/help/techdoc/ref/linspace.html but I have no idea why the author went with 50 instead of 100 as the default (not that 100 is any better, just that it was matlab's choice). Given how linspace is often used for plotting, 100 is arguably a more sensible choice to get reasonable graphs on normal-resolution displays at typical sizes, absent adaptive plotting algorithms.

> * It arbitrarily singles out the end point for special treatment. When
> integrating, it is just as common for the first point to be singular as
> the end point, and therefore needing to be excluded.

Numerical integration is *not* the focus of linspace(): in numerical integration, if an end point is singular you have an improper integral and *must* approach the singularity much more carefully than by simply dropping the last point and hoping for the best. Whether you can get away by using (desired_end_point - very_small_number) --the dumb, naive approach-- or not depends a lot on the nature of the singularity. Since numerical integration is a complex and specialized domain and the subject of an entire subcomponent of the (much bigger than numpy) scipy library, there's no point in arguing the linspace API based on numerical integration considerations.
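[Editor's note: for readers following along without numpy, here is a minimal pure-Python sketch of the count-based API under discussion. It is only an illustration of the semantics, not numpy's implementation (which takes more care with floating-point endpoints):]

```python
def linspace(start, stop, num=50, endpoint=True):
    """Return `num` evenly spaced values from `start` toward `stop`."""
    if num <= 0:
        return []
    if num == 1:
        return [float(start)]
    # endpoint=True: closed interval, num points and num-1 steps.
    # endpoint=False: half-open interval, mirroring range()'s mental model.
    div = num - 1 if endpoint else num
    step = (stop - start) / div
    return [start + i * step for i in range(num)]
```

With this sketch, `linspace(1, 2, 4, endpoint=False)` gives `[1.0, 1.25, 1.5, 1.75]`, and the invariant Fernando describes holds: the number of points is fixed, while the step size falls out of the endpoint choice.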
Now, I *suspect* (but don't remember for sure) that the option to have it right-hand-open-ended was to match the mental model people have for range:

In [5]: linspace(0, 10, 10, endpoint=False)
Out[5]: array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])

In [6]: range(0, 10)
Out[6]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

I'm not arguing this was necessarily a good idea, just my theory on how it came to be. Perhaps R. Kern or one of the numpy lurkers in here will pitch in with a better recollection.

> * If you exclude the end point, the stepsize, and hence the values
> returned, change:
>
> >>> linspace(1, 2, 4)
> array([ 1. , 1.33333333, 1.66666667, 2. ])
> >>> linspace(1, 2, 4, endpoint=False)
> array([ 1. , 1.25, 1.5 , 1.75])
>
> This surprises me. I expect that excluding the end point will just
> exclude the end point, i.e. return one fewer point. That is, I expect
> num to count the number of subdivisions, not the number of points.

I find it very natural. It's important to remember that *the whole point* of linspace's existence is to provide arrays with a known, fixed number of points:

In [17]: npts = 10

In [18]: len(linspace(0, 5, npts))
Out[18]: 10

In [19]: len(linspace(0, 5, npts, endpoint=False))
Out[19]: 10

So the invariant to preserve is *precisely* the number of points, not the step size. As Guido has pointed out several times, the value of this function is precisely to steer people *away* from thinking of step sizes in a context where they are more likely than not going to get it wrong. So linspace focuses on a guaranteed number of points, and lets the step-size chips fall where they may.

> * The retstep argument changes the return signature from => array to =>
> (array, number). I think that's a pretty ugly thing to do. If linspace
> returned a special iterator object, the step size could be exposed as an
> attribute.
Yup, it's not pretty but understandable in numpy's context, a library that has a very strong design focus around arrays, and numpy arrays don't have writable attributes:

In [20]: a = linspace(0, 10)

In [21]: a.stepsize = 0.1
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/home/fperez/ in ()
----> 1 a.stepsize = 0.1

AttributeError: 'numpy.ndarray' object has no attribute 'stepsize'

So while not the most elegant solution (and I agree that with a different return object a different approach can be taken), I think it's a practical compromise that works well for numpy.

> * I'm not sure that start/end/count is a better API than
> start/step/count.

Guido has argued this point quite well, I think, but let me add that many years of experience and millions of lines of numerical code beg to differ. start/end/count is *precisely* the right api for this problem, and exposing step directly is very much the wrong thing to do here.

I should add that numpy does provide an 'arange' function that does match the built-in range() api, but returns an array instead of a list/iterator. This function does happen to allow for floating-point steps, but does come with the following warning about them in its docstring:

Docstring:
arange([start,] stop[, step,], dtype=None, maskna=False)

Return evenly spaced values within a given interval.

Values are generated within the half-open interval ``[start, stop)`` (in other words, the interval including `start` but excluding `stop`). For integer arguments the function is equivalent to the Python built-in `range `_ function, but returns a ndarray rather than a list.

When using a non-integer step, such as 0.1, the results will often not be consistent. It is better to use ``linspace`` for these cases.

# END docstring

> * This one is pure bike-shedding: I don't like the name linspace.
Sure, in numpy's case it was chosen purely to make existing matlab users more comfortable, I think. I don't particularly like it either (I don't come from a matlab background myself), FWIW.

I do hope, though, that the chosen name is *not*:

- 'interval'. An interval in mathematics has a strong notion of only endpoints, containing all elements between its endpoints in the underlying ordered set.

- 'interpolate' or similar: numerical interpolation is a whole 'nother topic and I think this name would be more likely to confuse people expecting function interpolation than anything.

But thanks for looking into this, and I do hope that feedback from the numpy/scipy users and accumulated experience is useful.

Cheers, f

From nad at acm.org Thu Sep 29 00:29:00 2011
From: nad at acm.org (Ned Deily)
Date: Wed, 28 Sep 2011 15:29:00 -0700
Subject: [Python-Dev] Heads up: Apple llvm gcc 4.2 miscompiles PEP 393
References: <20110928132422.Horde.OvQBCtjz9kROgwPm5ZwktiA@webmail.df.eu> <4E835E1C.8090700@v.loewis.de> <74F6ADFA-874D-4BAC-B304-CE8B12D80126@masklinn.net>
Message-ID:

In article <74F6ADFA-874D-4BAC-B304-CE8B12D80126 at masklinn.net>, Xavier Morel wrote:
> On 2011-09-28, at 19:49 , Martin v. Löwis wrote:
> > Thanks for the advice - I didn't expect that Apple ships three compilers...
> Yeah I can understand that, they're in the middle of the transition but Clang
> is not quite there yet so...

BTW, at the moment, we are still using gcc-4.2 (not gcc-llvm nor clang) from Xcode 3 on OS X 10.6 for the 64-bit/32-bit installer builds and gcc-4.0 on 10.5 for the 32-bit-only installer builds. We will probably revisit that as we get closer to 3.3 alphas and betas.
-- 
Ned Deily, nad at acm.org

From greg.ewing at canterbury.ac.nz Thu Sep 29 00:36:21 2011
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 29 Sep 2011 11:36:21 +1300
Subject: [Python-Dev] range objects in 3.x
In-Reply-To:
References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info>
Message-ID: <4E83A165.6080406@canterbury.ac.nz>

Fernando Perez wrote:
> Now, I *suspect* (but don't remember for sure) that the option to have it
> right-hand-open-ended was to match the mental model people have for range:
>
> In [5]: linspace(0, 10, 10, endpoint=False)
> Out[5]: array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
>
> In [6]: range(0, 10)
> Out[6]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

My guess would be it's so that you can concatenate two sequences created with linspace covering adjacent ranges and get the same result as a single linspace call covering the whole range.

> I do hope, though, that the chosen name is *not*:
>
> - 'interval'
>
> - 'interpolate' or similar

Would 'subdivide' be acceptable?

-- 
Greg

From eric at trueblade.com Thu Sep 29 01:21:48 2011
From: eric at trueblade.com (Eric V. Smith)
Date: Wed, 28 Sep 2011 19:21:48 -0400
Subject: [Python-Dev] [Python-checkins] cpython: Implement PEP 393.
In-Reply-To:
References:
Message-ID: <4E83AC0C.2010006@trueblade.com>

Is there some reason str.format had such major surgery done to it? It appears parts of it were removed from stringlib. I had not even thought to look at the code before it was merged, as it never occurred to me anyone would do that. I left it in stringlib even in 3.x because there's the occasional talk of adding bytes.bformat, and since all of the code works well with stringlib (since it was used by str and unicode in 2.x), it made sense to leave it there. In addition, there are outstanding patches that are now broken.
I'd prefer it return to how it used to be, and just the minimum changes required for PEP 393 be made to it.

Thanks.
Eric.

On 9/28/2011 2:35 AM, martin.v.loewis wrote:
> http://hg.python.org/cpython/rev/8beaa9a37387
> changeset: 72475:8beaa9a37387
> user: Martin v. Löwis
> date: Wed Sep 28 07:41:54 2011 +0200
> summary:
> Implement PEP 393.
>
> files:
> Doc/c-api/unicode.rst | 9 +
> Include/Python.h | 5 +
> Include/complexobject.h | 5 +-
> Include/floatobject.h | 5 +-
> Include/longobject.h | 6 +-
> Include/pyerrors.h | 6 +
> Include/pyport.h | 3 +
> Include/unicodeobject.h | 783 +-
> Lib/json/decoder.py | 3 +-
> Lib/test/json_tests/test_scanstring.py | 11 +-
> Lib/test/test_codeccallbacks.py | 7 +-
> Lib/test/test_codecs.py | 4 +
> Lib/test/test_peepholer.py | 4 -
> Lib/test/test_re.py | 7 +
> Lib/test/test_sys.py | 38 +-
> Lib/test/test_unicode.py | 41 +-
> Makefile.pre.in | 6 +-
> Misc/NEWS | 2 +
> Modules/_codecsmodule.c | 8 +-
> Modules/_csv.c | 2 +-
> Modules/_ctypes/_ctypes.c | 6 +-
> Modules/_ctypes/callproc.c | 8 -
> Modules/_ctypes/cfield.c | 64 +-
> Modules/_cursesmodule.c | 7 +-
> Modules/_datetimemodule.c | 13 +-
> Modules/_dbmmodule.c | 12 +-
> Modules/_elementtree.c | 31 +-
> Modules/_io/_iomodule.h | 2 +-
> Modules/_io/stringio.c | 69 +-
> Modules/_io/textio.c | 352 +-
> Modules/_json.c | 252 +-
> Modules/_pickle.c | 4 +-
> Modules/_sqlite/connection.c | 19 +-
> Modules/_sre.c | 382 +-
> Modules/_testcapimodule.c | 2 +-
> Modules/_tkinter.c | 70 +-
> Modules/arraymodule.c | 8 +-
> Modules/md5module.c | 10 +-
> Modules/operator.c | 27 +-
> Modules/pyexpat.c | 11 +-
> Modules/sha1module.c | 10 +-
> Modules/sha256module.c | 10 +-
> Modules/sha512module.c | 10 +-
> Modules/sre.h | 4 +-
> Modules/syslogmodule.c | 14 +-
> Modules/unicodedata.c | 28 +-
> Modules/zipimport.c | 141 +-
> Objects/abstract.c | 4 +-
> Objects/bytearrayobject.c | 147 +-
> Objects/bytesobject.c | 127 +-
> Objects/codeobject.c | 15 +-
> Objects/complexobject.c | 19 +-
> Objects/dictobject.c | 20 +-
> Objects/exceptions.c | 26 +-
> Objects/fileobject.c | 17 +-
> Objects/floatobject.c | 19 +-
> Objects/longobject.c | 84 +-
> Objects/moduleobject.c | 9 +-
> Objects/object.c | 10 +-
> Objects/setobject.c | 40 +-
> Objects/stringlib/count.h | 9 +-
> Objects/stringlib/eq.h | 23 +-
> Objects/stringlib/fastsearch.h | 4 +-
> Objects/stringlib/find.h | 31 +-
> Objects/stringlib/formatter.h | 1516 --
> Objects/stringlib/localeutil.h | 27 +-
> Objects/stringlib/partition.h | 12 +-
> Objects/stringlib/split.h | 26 +-
> Objects/stringlib/string_format.h | 1385 --
> Objects/stringlib/stringdefs.h | 2 +
> Objects/stringlib/ucs1lib.h | 35 +
> Objects/stringlib/ucs2lib.h | 34 +
> Objects/stringlib/ucs4lib.h | 34 +
> Objects/stringlib/undef.h | 10 +
> Objects/stringlib/unicode_format.h | 1416 ++
> Objects/stringlib/unicodedefs.h | 2 +
> Objects/typeobject.c | 18 +-
> Objects/unicodeobject.c | 6112 ++++++++---
> Objects/uniops.h | 91 +
> PC/_subprocess.c | 61 +-
> PC/import_nt.c | 2 +-
> PC/msvcrtmodule.c | 8 +-
> PC/pyconfig.h | 4 -
> PC/winreg.c | 8 +-
> Parser/tokenizer.c | 6 +-
> Python/_warnings.c | 16 +-
> Python/ast.c | 61 +-
> Python/bltinmodule.c | 26 +-
> Python/ceval.c | 17 +-
> Python/codecs.c | 44 +-
> Python/compile.c | 89 +-
> Python/errors.c | 4 +-
> Python/formatter_unicode.c | 1445 ++-
> Python/getargs.c | 46 +-
> Python/import.c | 347 +-
> Python/marshal.c | 4 +-
> Python/peephole.c | 18 -
> Python/symtable.c | 8 +-
> Python/traceback.c | 59 +-
> Tools/gdb/libpython.py | 27 +-
> configure | 65 +-
> configure.in | 46 +-
> pyconfig.h.in | 6 -

From benjamin at python.org Thu Sep 29 02:07:02 2011
From: benjamin at python.org (Benjamin Peterson)
Date: Wed, 28 Sep 2011 20:07:02 -0400
Subject: [Python-Dev] [Python-checkins] cpython: Enhance Py_ARRAY_LENGTH(): fail at build time if the argument is not an array
In-Reply-To:
References:
Message-ID:

2011/9/28 victor.stinner :
> http://hg.python.org/cpython/rev/36fc514de7f0
> changeset: 72512:36fc514de7f0
> user: Victor Stinner
> date: Thu Sep 29 01:12:24 2011 +0200
> summary:
> Enhance Py_ARRAY_LENGTH(): fail at build time if the argument is not an array
>
> Move other various macros to pymacro.h
>
> Thanks Rusty Russell for having written these amazing C macros!
>
> files:
> Include/Python.h | 19 +--------
> Include/pymacro.h | 57 +++++++++++++++++++++++++++

Do we really need a new file? Why not pyport.h where other compiler stuff goes?

-- 
Regards, Benjamin

From victor.stinner at haypocalc.com Thu Sep 29 02:27:48 2011
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Thu, 29 Sep 2011 02:27:48 +0200
Subject: [Python-Dev] PEP 393 close to pronouncement
In-Reply-To: <4E834EE7.4050706@egenix.com>
References: <4E834EE7.4050706@egenix.com>
Message-ID: <201109290227.48340.victor.stinner@haypocalc.com>

> Resizing
> --------
>
> Codecs use resizing a lot. Given that PyCompactUnicodeObject
> does not support resizing, most decoders will have to use
> PyUnicodeObject and thus not benefit from the memory footprint
> advantages of e.g. PyASCIIObject.

Wrong. Even if you create a string using the legacy API (e.g. PyUnicode_FromUnicode), the string will be quickly compacted to use the most efficient memory storage (depending on the maximum character). "quickly": at the first call to PyUnicode_READY. Python tries to make all strings ready as early as possible.

> PyASCIIObject has a wchar_t *wstr pointer - I guess this should
> be a char *str pointer, otherwise, where's the memory footprint
> advantage (esp. on Linux where sizeof(wchar_t) == 4) ?

For pure ASCII strings, you don't have to store a pointer to the UTF-8 string, nor the length of the UTF-8 string (in bytes), nor the length of the wchar_t string (in wide characters): the length is always the length of the "ASCII" string, and the UTF-8 string is shared with the ASCII string.
The structure is much smaller thanks to these optimizations, and so Python 3.3 uses less memory than 2.7 for ASCII strings, even for short strings.

> I also don't see a reason to limit the UCS1 storage version
> to ASCII. Accordingly, the object should be called PyLatin1Object
> or PyUCS1Object.

Latin1 is less interesting: you cannot share the length/data fields with utf8 or wstr. We didn't add a special case for Latin1 strings (except using Py_UCS1* strings to store their characters).

> Furthermore, determining len(obj) will require a loop over
> the data, checking for surrogate code points. A simple memcpy()
> is no longer enough.

Wrong. len(obj) gives the "right" result (see the long discussion about what is the length of a string in a previous thread...) in O(1) since it's computed when the string is created.

> ... in practice you only
> very rarely see any non-BMP code points in your data. Making
> all Python users pay for the needs of a tiny fraction is
> not really fair. Remember: practicality beats purity.

The creation of the string may be a little bit slower (especially when you have to scan the string twice to first get the maximum character), but I think that this slowdown is smaller than the speedup allowed by the PEP. Because ASCII strings are now char*, I think that processing ASCII strings is faster because the CPU can cache more data (close to the CPU). We can do better optimization on ASCII and Latin1 strings (it's faster to manipulate char* than uint16_t* or uint32_t*). For example, str.center(), str.ljust(), str.rjust() and str.zfill() now use the very fast memset() function to pad Latin1 strings. Another example: duplicating a string (or creating a substring) should be faster just because you have less data to copy (e.g. 10 bytes for a string of 10 Latin1 characters vs 20 or 40 bytes with Python 3.2).

The two most common encodings in the world are ASCII and UTF-8.
With the PEP 393, encoding to ASCII or UTF-8 is free: you don't have to encode anything, you directly have the encoded char* buffer (whereas you have to convert 16/32 bit wchar_t to char* in Python 3.2, even for pure ASCII). (It's also free to encode a "Latin1" Unicode string to Latin1.)

With the PEP 393, we never have to decode UTF-16 anymore when iterating on code points to correctly support non-BMP characters (which was required before in narrow builds, e.g. on Windows). Iterating on code points is just a simple loop, with no need to check whether each character is in the range U+D800-U+DFFF.

There are other funny tricks (optimizations). For example, text.replace(a, b) knows that there is nothing to do if maxchar(a) > maxchar(text), where maxchar(obj) just requires reading an attribute of the string. Think about ASCII and non-ASCII strings: pure_ascii.replace('\xe9', '') now just creates a new reference... I don't think that Martin wrote his PEP to be able to implement all these optimisations, but they are an interesting side effect of his PEP :-)

> The table only lists string sizes up 8 code points. The memory
> savings for these are really only significant for ASCII
> strings on 64-bit platforms, if you use the default UCS2
> Python build as basis.

In the 32 different cases, the PEP 393 is better in 29 cases and "just" as good as Python 3.2 in 3 corner cases:
- 1 ASCII, 16-bit wchar, 32-bit
- 1 Latin1, 32-bit wchar, 32-bit
- 2 Latin1, 32-bit wchar, 32-bit

Do you really care about these corner cases? See the more realistic benchmark in Martin's previous email ("PEP 393 memory savings update"): the PEP 393 not only uses 3x less memory than 3.2, it also uses *less* memory than Python 2.7, whereas Python 3 uses Unicode for everything!

> For larger strings, I expect the savings to be more significant.

Sure.

> OTOH, a single non-BMP code point in such a string would cause
> the savings to drop significantly again.
In this case, it's just as good as Python 3.2 in wide mode, but worse than 3.2 in narrow mode. But is it a real use case?

If you want really efficient storage for heterogeneous strings (mixing ASCII, Latin1, BMP and non-BMP), you can split the text into chunks. For example, I hope that a text processor like LibreOffice doesn't store all paragraphs in the same string, but creates at least one string per paragraph. If you use short chunks, you will not notice the difference in memory footprint when you insert a non-BMP character. The trick doesn't work on Python < 3.3.

> For best performance, each algorithm will have to be implemented
> for all three storage types. ...

Good performance can be achieved using PyUnicode macros like PyUnicode_READ and PyUnicode_WRITE. But yes, if you want a super-fast Unicode processor, you can special-case some kinds (UCS1, UCS2, UCS4), like the examples I described before (use memset for Latin1).

> ... Not doing so, will result in a slow-down, if I read the PEP
> correctly.

I don't think so. Browse the new unicodeobject.c: there are few switch/cases on the kind (if you ignore the low-level functions like _PyUnicode_Ready). For example, unicode_isalpha() has only one implementation, using PyUnicode_READ. PyUnicode_READ doesn't use a switch but classic (fast) pointer arithmetic.

> It's difficult to say, of what scale, since that
> information is not given in the PEP, but the added loop over
> the complete data array in order to determine the maximum
> code point value suggests that it is significant.

Feel free to run Antoine's benchmarks like stringbench and iobench yourself; they do micro-benchmarks. But you have to know that very few codecs use the new Unicode API (I think that only the UTF-8 encoder and decoder use the new API, maybe also the ASCII codec).
I didn't run any benchmark, but I don't think that the PEP 393 makes Python slower. I expect a minor speedup in some corner cases :-) I prefer to wait until all modules are converted to the new API before running benchmarks. TODO: unicodedata, _csv, all codecs (especially error handlers), ...

> In practice, using a UCS2 build of Python usually is a good
> compromise between memory savings, performance and standards
> compatibility

About "standards compatibility": the work to support non-BMP characters everywhere was not finished in Python 3.2, 11 years after the introduction of Unicode in Python (2.0). Using the new API, non-BMP characters will be supported for free, everywhere (especially in *Python*: "\U0010FFFF"[0] and len("\U0010FFFF") don't give surprising results anymore). With the addition of emoticons in a non-BMP range in Unicode 6, non-BMP characters will become more and more common. Who doesn't like emoticons? :-) o;-) >< (no, I will not add non-BMP characters in this email, I don't want to crash your SMTP server and mail client)

> IMHO, Python should be optimized for UCS2 usage

With the PEP 393, it's better: Python is optimized for any usage! (but I expect it to be faster in the Latin1 range, U+0000-U+00FF)

> I do see the advantage for large strings, though.

A friend reads Martin's last benchmark differently: Python 3.2 uses 3x more memory than Python 2! Can I say that the PEP 393 fixed a huge regression of Python 3?

> Given that I've been working on and maintaining the Python Unicode
> implementation actively or by providing assistance for almost
> 12 years now, I've also thought about whether it's still worth
> the effort.

Thanks for your huge work on Unicode, Marc-Andre!
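For the curious, the non-BMP behaviour mentioned above is easy to check on a 3.3 interpreter (on a 3.2 narrow build the same string had length 2, and indexing returned a lone surrogate):

```python
s = "\U0010FFFF"   # the highest Unicode code point, well outside the BMP

assert len(s) == 1           # one code point, not a surrogate pair
assert s[0] == s             # indexing can no longer split it in two
assert ord(s) == 0x10FFFF
# Codecs still produce proper surrogate pairs where the encoding needs them:
assert len(s.encode("utf-16-le")) == 4   # two UTF-16 code units
assert s.encode("utf-16-le").decode("utf-16-le") == s
```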
> My interests have shifted somewhat into other directions and
> I feel that helping Python reach world domination in other ways
> makes me happier than fighting over Unicode standards, implementations,
> special cases that aren't special enough, and all those other
> nitty-gritty details that cause long discussions :-)

Someone said that we still need to define what a character is! By the way, what is a code point?

> So I feel that the PEP 393 change is a good time to draw a line
> and leave Unicode maintenance to Ezio, Victor, Martin, and
> all the others that have helped over the years. I know it's
> in good hands.

I don't understand why you would like to stop contributing to Unicode, but well, as you wish. We will try to continue your work.

Victor

From victor.stinner at haypocalc.com Thu Sep 29 03:45:59 2011
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Thu, 29 Sep 2011 03:45:59 +0200
Subject: [Python-Dev] [Python-checkins] cpython: Enhance Py_ARRAY_LENGTH(): fail at build time if the argument is not an array
In-Reply-To: 
References: 
Message-ID: <201109290345.59665.victor.stinner@haypocalc.com>

Le jeudi 29 septembre 2011 02:07:02, Benjamin Peterson a écrit :
> 2011/9/28 victor.stinner :
> > http://hg.python.org/cpython/rev/36fc514de7f0
> > changeset: 72512:36fc514de7f0
> > user: Victor Stinner
> > date: Thu Sep 29 01:12:24 2011 +0200
> > summary:
> > Enhance Py_ARRAY_LENGTH(): fail at build time if the argument is not an
> > array
> >
> > Move other various macros to pymcacro.h
> >
> > Thanks Rusty Russell for having written these amazing C macros!
> >
> > files:
> > Include/Python.h | 19 +--------
> > Include/pymacro.h | 57 +++++++++++++++++++++++++++
>
> Do we really need a new file? Why not pyport.h where other compiler stuff
> goes?

I'm not sure that pyport.h is the right place to add Py_MIN, Py_MAX, Py_ARRAY_LENGTH. pyport.h looks to be related to all things specific to the platform, like INT_MAX, Py_VA_COPY, ...
pymacro.h contains platform independent macros. I would like to suggest the opposite: move platform independent macros from pyport.h to pymacro.h :-) Suggestions:
- Py_ARITHMETIC_RIGHT_SHIFT
- Py_FORCE_EXPANSION
- Py_SAFE_DOWNCAST

Victor

From fperez.net at gmail.com Thu Sep 29 06:42:10 2011
From: fperez.net at gmail.com (Fernando Perez)
Date: Thu, 29 Sep 2011 04:42:10 +0000 (UTC)
Subject: [Python-Dev] range objects in 3.x
References: <4E7CCA42.2060100@stoneleaf.us> <4E7D3407.5000207@pearwood.info> <4E7D3CB8.5050904@pearwood.info> <4E81261C.6040200@pearwood.info> <4E83A165.6080406@canterbury.ac.nz>
Message-ID: 

On Thu, 29 Sep 2011 11:36:21 +1300, Greg Ewing wrote:

>> I do hope, though, that the chosen name is *not*:
>>
>> - 'interval'
>>
>> - 'interpolate' or similar
>
> Would 'subdivide' be acceptable?

I'm not great at finding names, and I don't totally love it, but I certainly don't see any problems with it. It is, after all, a subdivision of an interval :)

I think 'grid' has been mentioned, and I think it's reasonable, even though most people probably associate the word with a two-dimensional object. But grids can have any desired dimensionality. Now, in fact, numpy has a slightly demented (but extremely useful) ogrid object:

In [7]: ogrid[0:10:3]
Out[7]: array([0, 3, 6, 9])

In [8]: ogrid[0:10:3j]
Out[8]: array([ 0., 5., 10.])

Yup, that's a complex slice :) So if python named the builtin 'grid', I think it would go well with existing numpy habits.

Cheers, f

From ezio.melotti at gmail.com Thu Sep 29 09:54:37 2011
From: ezio.melotti at gmail.com (Ezio Melotti)
Date: Thu, 29 Sep 2011 10:54:37 +0300
Subject: [Python-Dev] Hg tips (was Re: [Python-checkins] cpython (merge default -> default): Merge heads.)
Message-ID: 

Tip 1 -- merging heads:

A while ago Éric suggested a nice tip to make merges easier and since I haven't seen many people using it and now I got a chance to use it again, I think it might be worth showing it once more:

# so assume you just committed some changes:
$ hg ci Doc/whatsnew/3.3.rst -m 'Update and reorganize the whatsnew entry for PEP 393.'
# you push them, but someone else pushed something in the meanwhile, so the push fails
$ hg push
pushing to ssh://hg at hg.python.org/cpython
searching for changes
abort: push creates new remote heads on branch 'default'!
(you should pull and merge or use push -f to force)
# so you pull the other changes
$ hg pull -u
pulling from ssh://hg at hg.python.org/cpython
searching for changes
adding changesets
adding manifests
adding file changes
added 4 changesets with 5 changes to 5 files (+1 heads)
not updating, since new heads added
(run 'hg heads' to see heads, 'hg merge' to merge)
# and use "hg heads ." to see the two heads (yours and the one you pulled) in the current branch
$ hg heads .
changeset: 72521:e6a2b54c1d16
tag: tip
user: Victor Stinner
date: Thu Sep 29 04:02:13 2011 +0200
summary: Fix hex_digit_to_int() prototype: expect Py_UCS4, not Py_UNICODE

changeset: 72517:ba6ee5cc9ed6
user: Ezio Melotti
date: Thu Sep 29 08:34:36 2011 +0300
summary: Update and reorganize the whatsnew entry for PEP 393.
# here comes the tip: before merging you switch to the other head (i.e. the one pushed by Victor),
# if you don't switch, you'll be merging Victor's changeset and in case of conflicts you will have to review
# and modify his code (e.g. put a Misc/NEWS entry in the right section or something more complicated)
$ hg up e6a2b54c1d16
6 files updated, 0 files merged, 0 files removed, 0 files unresolved
# after the switch you will merge the changeset you just committed, so in case of conflicts
# reviewing and merging is much easier because you know the changes already
$ hg merge
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
(branch merge, don't forget to commit)
# here everything went fine and there were no conflicts, and in the diff I can see my last changeset
$ hg di
diff --git a/Doc/whatsnew/3.3.rst b/Doc/whatsnew/3.3.rst
[...]
# everything looks fine, so I can commit the merge and push
$ hg ci -m 'Merge heads.'
$ hg push
pushing to ssh://hg at hg.python.org/cpython
searching for changes
remote: adding changesets
remote: adding manifests
remote: adding file changes
remote: added 2 changesets with 1 changes to 1 files
remote: buildbot: 2 changes sent successfully
remote: notified python-checkins at python.org of incoming changeset ba6ee5cc9ed6
remote: notified python-checkins at python.org of incoming changeset e7672fe3cd35

This tip is not only useful while merging, but it's also useful for python-checkins reviews, because the "merge" mail has the same diff as the previous mail, rather than having 15 unrelated changesets from the last week because the committer didn't pull in a while.

Tip 2 -- extended diffs:

If you haven't already, enable git diffs by adding to your ~/.hgrc the following two lines:

> [diff]
> git = True

(this is already in the devguide, even if 'git = on' is used there. The mercurial website uses git = True too.)
More info: http://hgtip.com/tips/beginner/2009-10-22-always-use-git-diffs/

Tip 3 -- extensions:

I personally like the 'color' extension, it makes the output of commands like 'hg diff' and 'hg stat' more readable (e.g. it shows removed lines in red and added ones in green). If you want to give it a try, add to your ~/.hgrc the following two lines:

> [extensions]
> color =

If you find operations like pulling, updating or cloning too slow, you might also want to look at the 'progress' extension, which displays a progress bar during these operations:

> [extensions]
> progress =

Tip 4 -- porting from 2.7 to 3.2:

The devguide suggests:

> hg export a7df1a869e4a | hg import --no-commit -

but it's not always necessary to copy the changeset number manually. If you are porting your last commit you can just use 'hg export 2.7' (or any other branch name):

* using the one-dir-per-branch setup:
wolf at hp:~/dev/py/2.7$ hg ci -m 'Fix some bug.'
wolf at hp:~/dev/py/2.7$ cd ../3.2
wolf at hp:~/dev/py/3.2$ hg pull -u ../2.7
wolf at hp:~/dev/py/3.2$ hg export 2.7 | hg import --no-commit -

* using the single-dir setup:
wolf at hp:~/dev/python$ hg branch 2.7
wolf at hp:~/dev/python$ hg ci -m 'Fix some bug.'
wolf at hp:~/dev/python$ hg up 3.2
# here you might enjoy the progress extension
wolf at hp:~/dev/python$ hg export 2.7 | hg import --no-commit -

And then you can check that everything is fine, and commit on 3.2 too. Of course it works the other way around (from 3.2 to 2.7) too.

I hope you'll find these tips useful.

Best Regards,
Ezio Melotti

On Thu, Sep 29, 2011 at 8:36 AM, ezio.melotti wrote:
> http://hg.python.org/cpython/rev/e7672fe3cd35
> changeset: 72522:e7672fe3cd35
> parent: 72520:e6a2b54c1d16
> parent: 72521:ba6ee5cc9ed6
> user: Ezio Melotti
> date: Thu Sep 29 08:36:23 2011 +0300
> summary:
> Merge heads.
>
> files:
> Doc/whatsnew/3.3.rst | 63 +++++++++++++++++++++----------
> 1 files changed, 42 insertions(+), 21 deletions(-)
>
> -------------- next part --------------
An HTML attachment was scrubbed...
URL: From victor.stinner at haypocalc.com Thu Sep 29 12:07:14 2011 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Thu, 29 Sep 2011 12:07:14 +0200 Subject: [Python-Dev] Hg tips In-Reply-To: References: Message-ID: <4E844352.8040606@haypocalc.com> Le 29/09/2011 09:54, Ezio Melotti a ?crit : > Tip 1 -- merging heads: > > A while ago ?ric suggested a nice tip to make merges easier and since I > haven't seen many people using it and now I got a chance to use it again, I > think it might be worth showing it once more: > > # so assume you just committed some changes: > $ hg ci Doc/whatsnew/3.3.rst -m 'Update and reorganize the whatsnew entry > for PEP 393.' > # you push them, but someone else pushed something in the meanwhile, so the > push fails > $ hg push > pushing to ssh://hg at hg.python.org/cpython > searching for changes > abort: push creates new remote heads on branch 'default'! > (you should pull and merge or use push -f to force) > # so you pull the other changes > $ hg pull -u > pulling from ssh://hg at hg.python.org/cpython > searching for changes > adding changesets > adding manifests > adding file changes > added 4 changesets with 5 changes to 5 files (+1 heads) > not updating, since new heads added > (run 'hg heads' to see heads, 'hg merge' to merge) > # and use "hg heads ." to see the two heads (yours and the one you pulled) > in the current branch > $ hg heads . > changeset: 72521:e6a2b54c1d16 > tag: tip > user: Victor Stinner > date: Thu Sep 29 04:02:13 2011 +0200 > summary: Fix hex_digit_to_int() prototype: expect Py_UCS4, not > Py_UNICODE > > changeset: 72517:ba6ee5cc9ed6 > user: Ezio Melotti > date: Thu Sep 29 08:34:36 2011 +0300 > summary: Update and reorganize the whatsnew entry for PEP 393. > # here comes the tip: before merging you switch to the other head (i.e. the > one pushed by Victor), > # if you don't switch, you'll be merging Victor changeset and in case of > conflicts you will have to review > # and modify his code (e.g. 
put a Misc/NEWS entry in the right section or > something more complicated) > $ hg up e6a2b54c1d16 > 6 files updated, 0 files merged, 0 files removed, 0 files unresolved > # after the switch you will merge the changeset you just committed, so in > case of conflicts > # reviewing and merging is much easier because you know the changes already > $ hg merge > 1 files updated, 0 files merged, 0 files removed, 0 files unresolved > (branch merge, don't forget to commit) > # here everything went fine and there were no conflicts, and in the diff I > can see my last changeset > $ hg di > diff --git a/Doc/whatsnew/3.3.rst b/Doc/whatsnew/3.3.rst > [...] > # everything looks fine, so I can commit the merge and push > $ hg ci -m 'Merge heads.' > $ hg push > pushing to ssh://hg at hg.python.org/cpython > searching for changes > remote: adding > changesets > > remote: adding manifests > remote: adding file changes > remote: added 2 changesets with 1 changes to 1 files > remote: buildbot: 2 changes sent successfully > remote: notified python-checkins at python.org of incoming changeset > ba6ee5cc9ed6 > remote: notified python-checkins at python.org of incoming changeset > e7672fe3cd35 > > This tip is not only useful while merging, but it's also useful for > python-checkins reviews, because the "merge" mail has the same diff of the > previous mail rather than having 15 unrelated changesets from the last week > because the committer didn't pull in a while. I prefer "hg pull --rebase && hg push": it's just one command (ok, two), there is nothing to do (it's fast)... if the new changes are not in conflict with my local changes, and it keeps a nice linear history. hg rebase is more dangerous: you may lose work if you misuse it. hg rebase is maybe more complex when you have a conflict (I don't really know, I never use hg merge). hg rebase doesn't work at all if you have local changes in different branches. 
If hg push fails, I prefer to *remove* my changes using hg strip (!), update and redo the commits on the new tip. I should sometimes fix hg rebase instead :-)

> Tip 2 -- extended diffs:
>
> If you haven't already, enable git diffs, adding to your ~/.hgrc the
> following two lines:
>
>> [diff]
>> git = True
>
> (this is already in the devguide, even if 'git = on' is used there. The
> mercurial website uses git = True too.)
> More info: http://hgtip.com/tips/beginner/2009-10-22-always-use-git-diffs/

For diff, "showfunc = on" is also a cool feature. See my full ~/.hgrc:
https://bitbucket.org/haypo/misc/src/tip/conf/hgrc

* I disabled the merge GUI: I lose a lot of work because I'm unable to use a GUI to do merges, I don't understand what the 3 versions of the same file are (which one is the merged version!?)
* the pager extension is just a must have
* hgeditor is also a must have for writing the changelog: in vim, it opens a second buffer with the diff

I also use "hg record" (like "git add -i") to do partial commits: after hacking for 3 hours, I do atomic commits. Then I use hg histedit (like "git rebase -i") to merge and reorganize local commits. It's useful to hide "oops, typo in my last commit".
> If you find operations like pulling, updating or cloning too slow, you might
> also want to look at the 'progress' extension, which displays a progress bar
> during these operations:
>
>> [extensions]
>> progress =

Yeah, I like it too :-)

Victor

From catch-all at masklinn.net Thu Sep 29 12:34:34 2011
From: catch-all at masklinn.net (Xavier Morel)
Date: Thu, 29 Sep 2011 12:34:34 +0200
Subject: [Python-Dev] Hg tips
In-Reply-To: <4E844352.8040606@haypocalc.com>
References: <4E844352.8040606@haypocalc.com>
Message-ID: <165FF37D-8EE8-49CD-817C-600022942086@masklinn.net>

On 2011-09-29, at 12:07 , Victor Stinner wrote:
>
> * I disabled the merge GUI: I lose a lot of work because I'm unable to use a GUI to do merge, I don't understand what are the 3 versions of the same file (which one is the merged version!?)

Generally none. By default, mercurial (and most similar tools) sets up LOCAL, BASE and OTHER. BASE is the last "common" state, LOCAL is the locally modified file and OTHER is the remotely modified file (which you're trying to merge). The behavior after that depends: mercurial has an OUTPUT pointer (for the result file), and many tools just write the non-postfixed file with the merge result. And depending on your precise tool, it can attempt to perform its own merge resolution before showing you the files, or just show you the three files provided and you set up your changes into BASE from LOCAL and OTHER.

If you reach that state, it's because mercurial could not automatically process the merge, so there's no merged version to display. Maybe thinking of it as a file with conflict markers split into three (one without the conflicting sections, one with only the first part of the sections and one with only the second part) would make it clearer?
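Xavier's three-file picture corresponds to the classic diff3 rule: a hunk where only one side differs from BASE is taken automatically, and a hunk where LOCAL and OTHER both differ from BASE in different ways becomes a conflict. A deliberately naive line-wise sketch of that rule (it assumes the three versions have the same number of lines, which real merge tools do not):

```python
def merge3(base, local, other):
    """Naive line-wise three-way merge over equal-length line lists."""
    merged, conflicts = [], []
    for b, l, o in zip(base, local, other):
        if l == o or l == b:        # both sides agree, or only OTHER changed
            merged.append(o)
        elif o == b:                # only LOCAL changed
            merged.append(l)
        else:                       # both changed the same line differently
            merged.append(None)     # placeholder for a conflict marker
            conflicts.append((b, l, o))
    return merged, conflicts

# Only one side touched each line: merges cleanly.
assert merge3(["a", "b", "c"],
              ["a", "B", "c"],
              ["a", "b", "C"]) == (["a", "B", "C"], [])
# Both sides touched the same line: conflict, and BASE records what they diverged from.
assert merge3(["x"], ["y"], ["z"]) == ([None], [("x", "y", "z")])
```

The conflict tuple also shows why keeping BASE around helps: with only the two-version conflict markers (LOCAL vs OTHER), you can no longer tell which side changed what.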
From victor.stinner at haypocalc.com Thu Sep 29 12:50:19 2011
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Thu, 29 Sep 2011 12:50:19 +0200
Subject: [Python-Dev] Hg tips
In-Reply-To: <165FF37D-8EE8-49CD-817C-600022942086@masklinn.net>
References: <4E844352.8040606@haypocalc.com> <165FF37D-8EE8-49CD-817C-600022942086@masklinn.net>
Message-ID: <4E844D6B.7090304@haypocalc.com>

Le 29/09/2011 12:34, Xavier Morel a écrit :
> Generally none. By default, mercurial (and most similar tools) sets up LOCAL, BASE and OTHER. BASE is the...

Sorry, but I'm unable to remember the meaning of LOCAL, BASE and OTHER. In meld, I have to scroll to the end of the filename to see the filename suffix. Anyway, my real problem was different: hg opened meld with the 3 versions, but the BASE was already merged. I mean that hg chose for me what is the right version, without letting me choose myself what is the good version, because if I just close meld, I lose my local changes.

Because a merge is a new commit, I suppose that I can do something to get my local changes back. But, well, I just prefer the "legacy" (?) merge flavor:

<<<< local
...
===
...
>>> other

It's easier for my brain because I just have 2 versions of the same code, not 3!

But it looks like some people are more comfortable with 3 versions in a GUI, because it is the default Mercurial behaviour (to open a GUI to solve conflicts).

Victor

From stefan at bytereef.org Thu Sep 29 12:58:10 2011
From: stefan at bytereef.org (Stefan Krah)
Date: Thu, 29 Sep 2011 12:58:10 +0200
Subject: [Python-Dev] Hg tips
In-Reply-To: <4E844D6B.7090304@haypocalc.com>
References: <4E844352.8040606@haypocalc.com> <165FF37D-8EE8-49CD-817C-600022942086@masklinn.net> <4E844D6B.7090304@haypocalc.com>
Message-ID: <20110929105810.GA20947@sleipnir.bytereef.org>

Victor Stinner wrote:
> Because a merge is a new commit, I suppose that I can do something to
> get my local changes back. But, well, I just prefer the "legacy" (?)
> merge flavor: > > <<<< local > ... > === > ... > >>> other > > It's easier for my brain because I just have 2 versions of the same > code, not 3! I also prefer /usr/bin/merge and I've never quite figured out the GUI. Not that I spent a lot of time on it, since the "legacy" merge works well (and is self-explanatory). Stefan Krah From catch-all at masklinn.net Thu Sep 29 13:20:39 2011 From: catch-all at masklinn.net (Xavier Morel) Date: Thu, 29 Sep 2011 13:20:39 +0200 Subject: [Python-Dev] Hg tips In-Reply-To: <4E844D6B.7090304@haypocalc.com> References: <4E844352.8040606@haypocalc.com> <165FF37D-8EE8-49CD-817C-600022942086@masklinn.net> <4E844D6B.7090304@haypocalc.com> Message-ID: <305907B5-C766-4A03-9851-3ACF97107B52@masklinn.net> On 2011-09-29, at 12:50 , Victor Stinner wrote: > Le 29/09/2011 12:34, Xavier Morel a ?crit : >> Generally none. By default, mercurial (and most similar tools) sets up LOCAL, BASE and OTHER. BASE is the... > > Sorry, but I'm unable to remember the meaning of LOCAL, BASE and OTHER. In meld, I have to scroll to the end of the filename so see the filename suffix. Anyway, my real problem was different: hg opened meld with the 3 versions, but the BASE was already merged. I mean that hg chose for me what is the right version, without letting me choose myself what is the good version, because if I just close meld, I lose my local changes. I'd bet it's Meld doing that, though I have not checked (Araxis Merge does something similar, it has its own merge-algorithm which it tries to apply in case of 3-ways merge, trying to merge LOCAL and OTHER into base on its own). Look into Meld's configuration, it might be possible to disable that. (an other possibility would be that the wrong file pointers are send to Meld, so it gets e.g. twice the same file) > Because a merge is a new commit, I suppose that I can do something to get my local changes back. But, well, I just prefer the "legacy" (?) merge flavor: > > <<<< local > ... > === > ... 
> >>> other > > It's easier for my brain because I just have 2 versions of the same code, not 3! > > But it looks like some people are more confortable with 3 versions in a GUI, because it is the default Mercurial behaviour (to open a GUI to solve conflicts). > I'd be part of that camp, yes (though I'll use either depending on the exact situation, there are cases where seeing what both branches diverged from is very useful). I find having all three version makes it easier to correctly mix the two diverging versions, with /usr/bin/merge-style conflict markers it's harder to understand what both branches diverged from and hence how their changes fit into one another. From jimjjewett at gmail.com Thu Sep 29 15:22:00 2011 From: jimjjewett at gmail.com (Jim Jewett) Date: Thu, 29 Sep 2011 09:22:00 -0400 Subject: [Python-Dev] [Python-checkins] cpython: Enhance Py_ARRAY_LENGTH(): fail at build time if the argument is not an array In-Reply-To: References: Message-ID: On Wed, Sep 28, 2011 at 8:07 PM, Benjamin Peterson wrote: > 2011/9/28 victor.stinner : >> http://hg.python.org/cpython/rev/36fc514de7f0 >> changeset: ? 72512:36fc514de7f0 ... >> Thanks Rusty Russell for having written these amazing C macros! > Do we really need a new file? Why not pyport.h where other compiler stuff goes? I would expect pyport to contain only system-specific macros. These seem more universal. -jJ From barry at python.org Thu Sep 29 17:11:50 2011 From: barry at python.org (Barry Warsaw) Date: Thu, 29 Sep 2011 11:11:50 -0400 Subject: [Python-Dev] Hg tips In-Reply-To: <4E844352.8040606@haypocalc.com> References: <4E844352.8040606@haypocalc.com> Message-ID: <20110929111150.352b7be5@resist.wooz.org> On Sep 29, 2011, at 12:07 PM, Victor Stinner wrote: > I disabled the merge GUI: I lose a lot of work because I'm unable to use a > GUI to do merge, I don't understand what are the 3 versions of the same file > (which one is the merged version!?) Emacs users should look at smerge-mode. 
It has some nice keybindings and colorizing that usually makes resolving conflicts fairly straightforward. It also will automatically `$vcs resolve` the file when you've handled all the conflicts. Caveat: I use it primarily for bzr, but I think it works with most vcs's. -Barry From g.brandl at gmx.net Thu Sep 29 18:20:43 2011 From: g.brandl at gmx.net (Georg Brandl) Date: Thu, 29 Sep 2011 18:20:43 +0200 Subject: [Python-Dev] Hg tips In-Reply-To: <4E844D6B.7090304@haypocalc.com> References: <4E844352.8040606@haypocalc.com> <165FF37D-8EE8-49CD-817C-600022942086@masklinn.net> <4E844D6B.7090304@haypocalc.com> Message-ID: Am 29.09.2011 12:50, schrieb Victor Stinner: > Le 29/09/2011 12:34, Xavier Morel a ?crit : >> Generally none. By default, mercurial (and most similar tools) sets up LOCAL, BASE and OTHER. BASE is the... > > Sorry, but I'm unable to remember the meaning of LOCAL, BASE and OTHER. > In meld, I have to scroll to the end of the filename so see the filename > suffix. Anyway, my real problem was different: hg opened meld with the 3 > versions, but the BASE was already merged. I mean that hg chose for me > what is the right version, without letting me choose myself what is the > good version, because if I just close meld, I lose my local changes. > > Because a merge is a new commit, I suppose that I can do something to > get my local changes back. But, well, I just prefer the "legacy" (?) > merge flavor: > > <<<< local > ... > === > ... > >>> other > > It's easier for my brain because I just have 2 versions of the same > code, not 3! I prefer this as well, since I also find most merge tools unbearable. (At some point I should probably learn emacs' ediff.) But in some cases, you really lose information when you don't see the base version, since in the case of contradicting changes it is very useful to see where both came from. 
Georg From g.brandl at gmx.net Thu Sep 29 18:21:53 2011 From: g.brandl at gmx.net (Georg Brandl) Date: Thu, 29 Sep 2011 18:21:53 +0200 Subject: [Python-Dev] Hg tips In-Reply-To: <20110929111150.352b7be5@resist.wooz.org> References: <4E844352.8040606@haypocalc.com> <20110929111150.352b7be5@resist.wooz.org> Message-ID: Am 29.09.2011 17:11, schrieb Barry Warsaw: > On Sep 29, 2011, at 12:07 PM, Victor Stinner wrote: > >> I disabled the merge GUI: I lose a lot of work because I'm unable to use a >> GUI to do merge, I don't understand what are the 3 versions of the same file >> (which one is the merged version!?) > > Emacs users should look at smerge-mode. It has some nice keybindings and > colorizing that usually makes resolving conflicts fairly straightforward. It > also will automatically `$vcs resolve` the file when you've handled all the > conflicts. > > Caveat: I use it primarily for bzr, but I think it works with most vcs's. Yes, this is what I do as well for hg. (I had to write the "hg resolve -m" support myself, but that was a year or two ago. I assume it's out-of-the-box now.) Georg From dickinsm at gmail.com Thu Sep 29 19:04:38 2011 From: dickinsm at gmail.com (Mark Dickinson) Date: Thu, 29 Sep 2011 18:04:38 +0100 Subject: [Python-Dev] [Python-checkins] cpython: Enhance Py_ARRAY_LENGTH(): fail at build time if the argument is not an array In-Reply-To: <201109290345.59665.victor.stinner@haypocalc.com> References: <201109290345.59665.victor.stinner@haypocalc.com> Message-ID: On Thu, Sep 29, 2011 at 2:45 AM, Victor Stinner wrote: > I would like to suggest the opposite: move platform independdant macros from > pyport.h to pymacro.h :-) Suggestions: > ?- Py_ARITHMETIC_RIGHT_SHIFT > ?- Py_FORCE_EXPANSION > ?- Py_SAFE_DOWNCAST Not sure about the other two, but Py_ARITHMETIC_RIGHT_SHIFT is definitely platform dependent, which is why it's in pyport.h in the first place. 
Mark From status at bugs.python.org Fri Sep 30 18:07:28 2011 From: status at bugs.python.org (Python tracker) Date: Fri, 30 Sep 2011 18:07:28 +0200 (CEST) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20110930160728.D1FA41CA8F@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2011-09-23 - 2011-09-30) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message. Issues counts and deltas: open 3046 (+16) closed 21813 (+25) total 24859 (+41) Open issues with patches: 1301 Issues opened (31) ================== #13038: distutils windows installer STATUS_INVALID_CRUNTIME_PARAMETER http://bugs.python.org/issue13038 opened by mitchfrazier #13039: IDLE editor: shell-like behaviour on line starting with ">>>" http://bugs.python.org/issue13039 opened by etuardu #13040: call to tkinter.messagebox.showinfo hangs the script on timer http://bugs.python.org/issue13040 opened by Richard86 #13041: argparse: terminal width is not detected properly http://bugs.python.org/issue13041 opened by zbysz #13044: pdb throws AttributeError at end of debugging session http://bugs.python.org/issue13044 opened by akl #13045: socket.getsockopt may require custom buffer contents http://bugs.python.org/issue13045 opened by Artyom.Gavrichenkov #13047: imp.find_module("") and imp.find_module(".") http://bugs.python.org/issue13047 opened by Arfrever #13048: Handling of paths in first argument of imp.find_module() http://bugs.python.org/issue13048 opened by Arfrever #13049: distutils2 should not allow a distribution to install under an http://bugs.python.org/issue13049 opened by carljm #13050: RLock support the context manager protocol but this is not doc http://bugs.python.org/issue13050 opened by r.david.murray #13051: Infinite recursion in curses.textpad.Textbox http://bugs.python.org/issue13051 opened by tycho #13052: IDLE: replace ending with '\' causes crash http://bugs.python.org/issue13052 
opened by terry.reedy #13053: Add Capsule migration documentation to "cporting" http://bugs.python.org/issue13053 opened by larry #13054: sys.maxunicode value after PEP-393 http://bugs.python.org/issue13054 opened by ezio.melotti #13055: Distutils tries to handle null versions but fails http://bugs.python.org/issue13055 opened by bgamari #13056: test_multibytecodec.py:TestStreamWriter is skipped after PEP39 http://bugs.python.org/issue13056 opened by ezio.melotti #13057: Thread not working for python 2.7.1 built with HP Compiler on http://bugs.python.org/issue13057 opened by wah meng #13059: Sporadic test_multiprocessing failure: IOError("bad message le http://bugs.python.org/issue13059 opened by haypo #13060: allow other rounding modes in round() http://bugs.python.org/issue13060 opened by ArneBab #13061: Decimal module yields incorrect results when Python compiled w http://bugs.python.org/issue13061 opened by josharian #13062: Introspection generator and function closure state http://bugs.python.org/issue13062 opened by ncoghlan #13063: test_concurrent_futures failures on Windows: IOError('[Errno 2 http://bugs.python.org/issue13063 opened by haypo #13064: Port codecs and error handlers to the new Unicode API http://bugs.python.org/issue13064 opened by haypo #13070: segmentation fault in pure-python multi-threaded server http://bugs.python.org/issue13070 opened by vsemionov #13071: IDLE refuses to open on windows 7 http://bugs.python.org/issue13071 opened by jfalskfjdsl;akfdjsa;l.laksfj;aslkfdj;sal #13072: Getting a buffer from a Unicode array uses invalid format http://bugs.python.org/issue13072 opened by haypo #13073: message_body argument of HTTPConnection.endheaders is undocume http://bugs.python.org/issue13073 opened by petri.lehtinen #13074: Improve documentation of locale encoding functions http://bugs.python.org/issue13074 opened by gjb1002 #13075: PEP-0001 contains dead links http://bugs.python.org/issue13075 opened by ezander #13076: Bad links to 'time' 
in datetime documentation http://bugs.python.org/issue13076 opened by gjb1002 #13077: Unclear behavior of daemon threads on main thread exit http://bugs.python.org/issue13077 opened by etuardu Most recent 15 issues with no replies (15) ========================================== #13076: Bad links to 'time' in datetime documentation http://bugs.python.org/issue13076 #13075: PEP-0001 contains dead links http://bugs.python.org/issue13075 #13074: Improve documentation of locale encoding functions http://bugs.python.org/issue13074 #13073: message_body argument of HTTPConnection.endheaders is undocume http://bugs.python.org/issue13073 #13072: Getting a buffer from a Unicode array uses invalid format http://bugs.python.org/issue13072 #13070: segmentation fault in pure-python multi-threaded server http://bugs.python.org/issue13070 #13064: Port codecs and error handlers to the new Unicode API http://bugs.python.org/issue13064 #13056: test_multibytecodec.py:TestStreamWriter is skipped after PEP39 http://bugs.python.org/issue13056 #13055: Distutils tries to handle null versions but fails http://bugs.python.org/issue13055 #13051: Infinite recursion in curses.textpad.Textbox http://bugs.python.org/issue13051 #13050: RLock support the context manager protocol but this is not doc http://bugs.python.org/issue13050 #13045: socket.getsockopt may require custom buffer contents http://bugs.python.org/issue13045 #13038: distutils windows installer STATUS_INVALID_CRUNTIME_PARAMETER http://bugs.python.org/issue13038 #13032: h2py.py can fail with UnicodeDecodeError http://bugs.python.org/issue13032 #13024: cgitb uses stdout encoding http://bugs.python.org/issue13024 Most recent 15 issues waiting for review (15) ============================================= #13077: Unclear behavior of daemon threads on main thread exit http://bugs.python.org/issue13077 #13061: Decimal module yields incorrect results when Python compiled w http://bugs.python.org/issue13061 #13057: Thread not working for 
python 2.7.1 built with HP Compiler on http://bugs.python.org/issue13057 #13055: Distutils tries to handle null versions but fails http://bugs.python.org/issue13055 #13054: sys.maxunicode value after PEP-393 http://bugs.python.org/issue13054 #13051: Infinite recursion in curses.textpad.Textbox http://bugs.python.org/issue13051 #13045: socket.getsockopt may require custom buffer contents http://bugs.python.org/issue13045 #13041: argparse: terminal width is not detected properly http://bugs.python.org/issue13041 #13032: h2py.py can fail with UnicodeDecodeError http://bugs.python.org/issue13032 #13031: small speed-up for tarfile.py when unzipping tarballs http://bugs.python.org/issue13031 #13025: mimetypes should read the rule file using UTF-8, not the local http://bugs.python.org/issue13025 #13024: cgitb uses stdout encoding http://bugs.python.org/issue13024 #13018: dictobject.c: refleak http://bugs.python.org/issue13018 #13017: pyexpat.c: refleak http://bugs.python.org/issue13017 #13016: selectmodule.c: refleak http://bugs.python.org/issue13016 Top 10 most discussed issues (10) ================================= #13057: Thread not working for python 2.7.1 built with HP Compiler on http://bugs.python.org/issue13057 18 msgs #13060: allow other rounding modes in round() http://bugs.python.org/issue13060 10 msgs #1621: Do not assume signed integer overflow behavior http://bugs.python.org/issue1621 7 msgs #13054: sys.maxunicode value after PEP-393 http://bugs.python.org/issue13054 7 msgs #12242: distutils2 environment marker for current compiler http://bugs.python.org/issue12242 5 msgs #12806: argparse: Hybrid help text formatter http://bugs.python.org/issue12806 5 msgs #12966: cookielib.LWPCookieJar breaks on cookie values with a newline http://bugs.python.org/issue12966 5 msgs #11751: Increase distutils.filelist / packaging.manifest test coverage http://bugs.python.org/issue11751 4 msgs #12737: str.title() is overzealous by upcasing combining marks inappro 
http://bugs.python.org/issue12737 4 msgs #13033: Add shutil.chowntree http://bugs.python.org/issue13033 4 msgs Issues closed (23) ================== #1092365: Distutils needs a way *not* to install files http://bugs.python.org/issue1092365 closed by eric.araujo #3130: In some UCS4 builds, sizeof(Py_UNICODE) could end up being mor http://bugs.python.org/issue3130 closed by haypo #8654: Improve ABI compatibility between UCS2 and UCS4 builds http://bugs.python.org/issue8654 closed by stutzbach #8927: Handle version incompatibilities in dependencies http://bugs.python.org/issue8927 closed by eric.araujo #9306: distutils: raise informative error message when cmd_class is N http://bugs.python.org/issue9306 closed by eric.araujo #9395: clean does not remove all temp files http://bugs.python.org/issue9395 closed by eric.araujo #12746: normalization is affected by unicode width http://bugs.python.org/issue12746 closed by benjamin.peterson #12819: PEP 393 - Flexible Unicode String Representation http://bugs.python.org/issue12819 closed by haypo #12981: rewrite multiprocessing (senfd|recvfd) in Python http://bugs.python.org/issue12981 closed by neologix #13008: syntax error when pasting valid snippet into console without e http://bugs.python.org/issue13008 closed by eric.araujo #13012: Allow keyword argument in str.splitlines() http://bugs.python.org/issue13012 closed by ezio.melotti #13013: _ctypes.c: refleak http://bugs.python.org/issue13013 closed by meador.inge #13035: "maintainer" value clear the "author" value when registering http://bugs.python.org/issue13035 closed by eric.araujo #13037: [Regression] socket.error does not inherit from IOError as doc http://bugs.python.org/issue13037 closed by Christopher.Egner #13042: argparse: terminal width is not detected properly http://bugs.python.org/issue13042 closed by ezio.melotti #13043: Unexpected behavior of imp.find_module(".") with a package pre http://bugs.python.org/issue13043 closed by Arfrever #13046: 
imp.find_module() should not find unimportable modules http://bugs.python.org/issue13046 closed by brett.cannon #13058: Fix file descriptor leak on error http://bugs.python.org/issue13058 closed by neologix #13065: test http://bugs.python.org/issue13065 closed by vsemionov #13066: test http://bugs.python.org/issue13066 closed by vsemionov #13067: test http://bugs.python.org/issue13067 closed by vsemionov #13068: test http://bugs.python.org/issue13068 closed by vsemionov #13069: test http://bugs.python.org/issue13069 closed by ezio.melotti From chris at simplistix.co.uk Fri Sep 30 19:46:08 2011 From: chris at simplistix.co.uk (Chris Withers) Date: Fri, 30 Sep 2011 18:46:08 +0100 Subject: [Python-Dev] Inconsistent script/console behaviour In-Reply-To: References: Message-ID: <4E860060.1040505@simplistix.co.uk> On 24/09/2011 00:32, Guido van Rossum wrote: > The interactive console is optimized for people entering code by > typing, not by copying and pasting large gobs of text. > > If you think you can have it both, show us the code. Anatoly wants ipython's new qtconsole. This "does the right thing" because it's a GUI app and so can manipulate the content on paste... Not sure if you can do that in a console app... cheers, Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk
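One way to get "paste a large gob of code" behaviour without a GUI console is to compile the pasted text as a single block, the way a script is executed, instead of feeding it to the REPL line by line. A minimal sketch (the function name is made up for illustration):

```python
def run_pasted(text):
    """Execute a multi-line snippet as one block, as a script would,
    instead of line-by-line like the interactive console."""
    namespace = {}
    exec(compile(text, "<pasted>", "exec"), namespace)
    return namespace

# A snippet like this trips up the plain REPL when pasted, because
# the blank line inside the if-block ends the statement early; in
# exec mode the whole block compiles fine.
snippet = "if True:\n    x = 1\n\n    y = x + 1\n"
ns = run_pasted(snippet)
print(ns["y"])  # -> 2
```

This is essentially what GUI frontends like the qtconsole do on paste: they see the whole clipboard content at once and can hand it to the compiler as a unit, which a character-stream console cannot easily do.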